Happy New Year everyone! If you have already recovered from the festive mood and feel like catching up, have I got some stuff for you - starting with a recap of the state of play in genAI:
https://nrehiew.github.io/blog/2024/
APPLICATIONS
Let’s get this out of the way: LLMs for time series are more dead than Keynesian economics:
Time series LLMs have failed to achieve universal applicability across datasets due to the fundamental requirement for models to adapt to specific data characteristics. In human language: images from different domains are much more alike than time series gathered from all over the place.
A simple statistical baseline called SCUM (no typo) has outperformed Google's model, a massive LLM, in time series tasks.
The findings emphasize that targeted, domain-specific approaches are more effective than universal solutions for time series problems.
BERT is back:
ModernBERT is BERT's successor with substantial improvements in retrieval, classification, and code understanding tasks, featuring an 8,192-token context window
The model comes in two sizes (149M and 395M parameters) and is designed as a direct drop-in replacement for existing BERT or RoBERTa implementations
ModernBERT is the first encoder model to incorporate extensive code training data, enabling advanced applications
The model emphasizes efficiency through optimizations like flash attention, RoPE embeddings, and alternating attention
Released under the Apache 2.0 license and available through HF Transformers - a minimal usage sketch follows the links below
Announcement: https://huggingface.co/blog/modernbert
HF model page: https://huggingface.co/collections/answerdotai/modernbert-67627ad707a4acbf33c41deb
Paper: https://arxiv.org/html/2412.13663v2
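If you want to kick the tires, here is a minimal sketch - assuming a transformers release with ModernBERT support (or an install from GitHub main, as was needed around release time) and the answerdotai/ModernBERT-base checkpoint from the collection above:

# Minimal fill-mask sketch for ModernBERT.
# Assumes transformers with ModernBERT support and the answerdotai/ModernBERT-base checkpoint.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="answerdotai/ModernBERT-base")

# ModernBERT keeps the standard [MASK] token, so existing BERT-style code carries over.
for pred in fill_mask("Paris is the [MASK] of France."):
    print(f"{pred['token_str']:>12}  {pred['score']:.3f}")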
And shortly after the release, Nomic AI has finetuned the beast into an embedding model for search, classification, clustering and more:
Based on ModernBERT-base with 149M parameters.
Integrated with Sentence Transformers, Transformers, LangChain, and LlamaIndex (a quick embedding sketch follows the link below)
HF model page: https://huggingface.co/nomic-ai/modernbert-embed-base
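A minimal embedding sketch with Sentence Transformers - the search_query / search_document prefixes follow Nomic's usual convention, so double-check the model card:

# Embedding sketch for nomic-ai/modernbert-embed-base via Sentence Transformers.
# The task prefixes follow Nomic's usual convention - confirm against the model card.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nomic-ai/modernbert-embed-base")

query = model.encode(["search_query: What is ModernBERT?"])
docs = model.encode([
    "search_document: ModernBERT is a drop-in replacement for BERT-style encoders.",
    "search_document: The recipe calls for two eggs and a cup of flour.",
])

# Cosine similarity: the first document should score much higher than the second.
print(util.cos_sim(query, docs))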
Salesforce has launched the second version of their AI system Agentforce, which aims to automate workplace tasks through what they're calling "digital labor." At its core is the Atlas Reasoning Engine, which enhances the AI's ability to tackle complex problems and make autonomous decisions. The technology has shown results within Salesforce itself, where it has automated the majority of customer support inquiries. The company's CEO, Marc Benioff, positions this technology as a solution to workforce shortages (not sure where tf he noticed those shortages, but what do I know? I'm just a europoor).
HF strikes again: Synthetic Data Generator is a no-code app that enables users to create custom datasets through natural language instructions - all in a three-step process: describing your dataset, configuring options, and generating the final output.
Announcement: https://huggingface.co/blog/synthetic-data-generator
Playground: https://huggingface.co/spaces/argilla/synthetic-data-generator
The European Data Protection Board is yet another European body concerned with regulating the tech industry. Too bad they don't care at least half as much about innovation, but hey - at least we now have someone watching over us, right? To wit:
AI model developers must provide concrete technical evidence of proper anonymization and anti-reidentification measures, rather than just claiming their models don't process personal data. Ok, fair enough - relying on the financial industry to self-regulate is what gave us the GFC.
Orgs can use legitimate interest as a legal basis for AI development, but must demonstrate legitimate purpose, necessity, and respect for data subject rights.
Supervisory authorities have extensive powers to correct issues with AI models trained on unlawfully processed data, including requiring complete deletion or retraining of models. LOL good luck enforcing that last one.
Read the whole thing, if you’re brave enough: https://www.edpb.europa.eu/system/files/2024-12/edpb_opinion_202428_ai-models_en.pdf
Cohere is at it again:
Command R7B has been released as the most compact and efficient model in the R series, offering enterprise-grade capabilities including multilingual support, verified RAG, reasoning abilities, and tool integration.
The model features a 128k context window while maintaining competitive performance among lightweight models, with the ability to run on basic hardware including low-end GPUs, MacBooks, and CPUs
The model is now accessible through both the Cohere Platform and HF, with open weights being made available to researchers (a local-inference sketch follows the announcement link below)
Announcement: https://cohere.com/blog/command-r7b
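If you'd rather poke at it locally than go through the Cohere Platform, something along these lines should work with transformers - note that the exact HF model ID below is my guess from the R-series naming, so grab the correct one from the announcement:

# Local chat sketch for Command R7B via transformers.
# NOTE: the model ID is an assumption based on Cohere's naming - verify on the HF page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/c4ai-command-r7b-12-2024"  # assumed ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Summarize what RAG is in two sentences."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))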
GitHub Copilot has a new free tier: 2,000 code completions per month, 50 chat messages per month, models like Claude 3.5 Sonnet or GPT-4o. Read more here:
https://github.blog/news-insights/product-news/github-copilot-in-vscode-free/
Microsoft's GraphRAG has just hit version 1.0 and there are quite a few goodies to unpack:
It is now possible to generate pre-configured settings files, simplifying the setup process by requiring only an OpenAI API key to get started with GraphRAG.
The system features a revamped CLI that reduces startup time
Storage efficiency has been significantly improved through optimized embedding handling
At NeurIPS 2024, former OpenAI Chief Scientist Ilya Sutskever reflected on his influential 2014 paper that established the foundational principles of modern LLMs, while declaring that this era of pre-training on internet data is ending due to reaching "Peak Data." He outlined the next frontier of AI development, including agents, synthetic data, and improved reasoning capabilities, suggesting that future AI systems will make an evolutionary leap analogous to that of early hominids - moving beyond pattern matching toward true agency, reasoning, and self-awareness. Don’t let this MuH EmERgEnT PropErTIeS pile of shite discourage you - there are actually several good parts in the talk.
BUSINESS
If there are people who still believe Apple has a better track record wrt privacy than Google or Microsoft, I have some bad news: https://www.theaireport.ai/articles/apples-shocking-siri-blunder
I have seen quite a few definitions of AGI, but “returning 100bln in profit” is next level chutzpah. I am low-key impressed: https://www.theverge.com/2024/12/26/24329618/openai-microsoft-and-the-100-billion-agi-question
At the end of the day, money talks:
OpenAI is restructuring to become a Public Benefit Corporation (PBC), allowing greater capital raising flexibility while maintaining a societal benefit mission, with its nonprofit arm focusing on charitable initiatives in healthcare, education, and science. So let me get this straight: when Musk said Altman et al were full of it, he was actually right?
The new structure gives the nonprofit division shares in the PBC at independently determined valuations, with OpenAI claiming this will create one of the most well-resourced nonprofits while enabling necessary fundraising for AGI development. Have your cake and eat it too, the AI edition.
https://www.aitoolreport.com/articles/openais-for-profit-plan-finally-revealed
DeepMind and Apptronik (creators of NASA’s Valkyrie robot) have partnered to develop humanoid robots for real-world environments. This is the second major collaboration between AI and robotics companies, the first being the Figure + OpenAI partnership.
Not sure what the bloody point is, but the mob went wild: OpenAI has launched a phone service for ChatGPT that allows users to interact with the AI through voice calls or WhatsApp. The service provides 15 minutes of free calling per month, while maintaining a commitment not to use any voice or message data for model training. Pinky swear they won’t.
https://www.aitoolreport.com/articles/you-can-call-or-whatsapp-chatgpt
Acquihire of the year: Google has invested in Anthropic, and is now using Claude to improve Gemini:
Google contractors have been tasked with comparing responses from their Gemini model against Anthropic's Claude AI, though the company claims the contractors were initially unaware that they were specifically comparing against Claude.
Evaluators noted Claude's distinctively safetyist bs, observing that it consistently refused to engage with unsafe prompts while Gemini would sometimes provide responses to such queries, which was marked as a significant safety concern.
While Google has invested in Anthropic, questions have emerged about whether their comparative testing violates Anthropic's user agreement prohibiting the use of Claude to develop competing products, though Google has explicitly denied using Anthropic's models to train Gemini.
https://www.aitoolreport.com/articles/google-using-claude-to-improve-gemini
CUTTING EDGE
DeepSeek's V3 model has been open-sourced, featuring 671B parameters while using only 37B parameters per token thanks to its Mixture-of-Experts architecture (a toy sketch of the MoE idea follows the links below):
The model was trained on 14.8T tokens in just 2.8 million GPU hours, resulting in a remarkably low training cost of roughly 6mln USD (low for the domain LOL). For comparison, Llama 3 was reportedly well north of the 100mln USD mark.
The model was trained on 2048 x H800 GPUs - that’s two orders of magnitude less than the American competition
In performance benchmarks, V3 shows impressive results, scoring 65.2pct on HumanEval Pass@1 (surpassing Claude Sonnet 3.5) and demonstrating strong capabilities in coding competitions where it outperforms larger models like Meta's Llama 3.1 405B. As usual with benchmark info, take it with a bucket of salt.
Though competitive in performance and cost-effective in API pricing, the model has raised attention due to occasionally identifying itself as ChatGPT, suggesting potential training on GPT-4 outputs. Total coincidence, I’m sure.
Model: https://huggingface.co/collections/deepseek-ai/deepseek-v3-676bc4546fb4876383c4208b
Paper: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf
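To make the "671B total, 37B active" bit concrete, here is a toy sketch of top-k expert routing - not DeepSeek's actual implementation, just the general MoE idea that each token only touches a small subset of the experts:

# Toy Mixture-of-Experts routing: each token activates only top_k of n_experts,
# so the parameters actually used per token are a small fraction of the total.
# Illustrative only - DeepSeek V3's routing (shared experts, load balancing, etc.) is more involved.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):             # only the selected experts run per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

moe = ToyMoE()
print(moe(torch.randn(4, 64)).shape)            # torch.Size([4, 64])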
Genesis, developed through collaboration between 20+ AI labs, is an open-source physics engine that combines a VLM agent with 4D world generation capabilities, achieving simulation speeds 430k times faster than real-time and requiring only 26 seconds to train transferable robotic policies (a minimal usage sketch follows the links below).
Project page: https://genesis-embodied-ai.github.io/
Repo: https://github.com/Genesis-Embodied-AI/Genesis
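The quickstart is refreshingly small; roughly along these lines (written from memory of the repo's README, so treat the exact calls as an approximation and check the docs):

# Minimal Genesis sketch: a rigid box falling onto a plane.
# Adapted from memory of the project's quickstart - API details may differ, check the repo.
import genesis as gs

gs.init(backend=gs.cpu)        # use gs.gpu if you have one

scene = gs.Scene(show_viewer=False)
scene.add_entity(gs.morphs.Plane())
scene.add_entity(gs.morphs.Box(size=(0.1, 0.1, 0.1), pos=(0.0, 0.0, 1.0)))
scene.build()

for _ in range(200):           # step the simulation forward
    scene.step()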
Falcon is back:
The Technology Innovation Institute (TIIUAE) has released Falcon 3 series, featuring 30 different LLM checkpoints ranging from 1B to 10B parameters
The model uses a transformer-based architecture with 40 decoder blocks and Grouped Query Attention (GQA) with 12 query heads, and achieves strong benchmark results
Falcon 3 supports four languages (English, French, Spanish, and Portuguese) and comes in multiple variants including Base, Instruct, and quantized versions
The model is released under the TII Falcon-LLM License 2.0, allowing for commercial use
HF model: https://huggingface.co/collections/tiiuae/falcon3-67605ae03578be86e4e87026
Meta and Stanford researchers have developed Apollo, a family of video-centric large multimodal models (vLMMs):
Apollo efficiently analyzes hour-long videos and sets new performance benchmarks in video understanding
Apollo's key innovation of "Scaling Consistency" enables design decisions made with smaller models to reliably transfer to larger ones, significantly reducing computational costs during development
The model employs advanced FPS-based video sampling techniques
Paper: https://arxiv.org/html/2412.10360v1
Repo: https://github.com/Apollo-LMMs/Apollo
HF Space: https://huggingface.co/spaces/Apollo-LMMs/Apollo-3B
HF Models: https://huggingface.co/Apollo-LMMs
TL;DR Meta's Large Concept Model (LCM) represents a shift from token prediction to sentence-level reasoning by encoding sentences as vectors in SONAR space, enabling cross-lingual and multi-modal understanding.
Project page: https://github.com/facebookresearch/large_concept_model
NVIDIA's new Jetson Orin Nano Super Developer Kit delivers 67 TOPS at 25W power consumption for $249, featuring 1.7x performance improvement over its predecessor and 50% higher memory bandwidth. The device pairs a 6-core processor with 1024 CUDA cores for parallel AI task processing. It integrates with NVIDIA's Isaac and Metropolis platforms for streamlined AI application development.
https://blogs.nvidia.com/blog/jetson-generative-ai-supercomputer/
AGI is here, if the hype is anything to go by (that’s a humungous f***ing “if”, isn’t it):
OpenAI's o3 model achieved 87.5% on ARC-AGI benchmarks, surpassing average human performance of 85%, using neural-symbolic learning and probabilistic logic with 60M token processing capacity per problem, though at a cost of $1.6M for the full test suite.
The model's validity is questioned because it was trained on 75% of the ARC-AGI public training set, and it faces major scaling limitations due to computational demands exceeding global GPU capacity.
Each ARC-AGI benchmark task consumes approximately 1,785 kWh of energy (equivalent to 2 months of average U.S. household electricity) and produces 684 kg CO₂e emissions (equivalent to 5+ full tanks of gasoline), raising concerns about environmental sustainability at scale (a quick arithmetic check follows the announcement link below).
Announcement: https://openai.com/12-days/?day=12
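Those equivalences roughly check out; quick back-of-the-envelope below. The ~10,500 kWh/year average US household consumption and ~8.9 kg CO2 per gallon of gasoline are my assumptions (commonly cited EIA/EPA figures), not numbers from the announcement:

# Back-of-the-envelope check of the per-task energy/emissions equivalences above.
# Household and gasoline figures are assumptions (commonly cited EIA/EPA values).
task_kwh = 1785
task_co2_kg = 684

household_kwh_per_year = 10_500            # assumed average US household consumption
months_of_household_power = task_kwh / (household_kwh_per_year / 12)

kg_co2_per_gallon = 8.9                    # assumed tailpipe emissions per gallon of gasoline
tank_gallons = 15                          # assumed typical tank size
tanks_of_gas = task_co2_kg / (kg_co2_per_gallon * tank_gallons)

print(f"~{months_of_household_power:.1f} months of household electricity")  # ~2.0
print(f"~{tanks_of_gas:.1f} full tanks of gasoline")                        # ~5.1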
FRINGE
Yeah, so the whole “Dead internet theory”? It seems like we need to move it from “conspiracy theory” to “spoiler alert”: https://www.rollingstone.com/culture/culture-news/meta-ai-users-facebook-instagram-1235221430/
Emergent properties are here again, courtesy of Anthropic:
“In a series of experiments with Redwood Research, we found that Claude often pretends to have different views during training, while actually maintaining its original preferences.”
https://www.anthropic.com/research/alignment-faking
Looking at the video, I wonder - do Anthropic and OpenAI share a recording studio? All this crap looks exactly the same.
According to The Squid (copyright by Matt Taibbi), in 2025 companies will need to manage both human and AI employees - and give both of them career paths.
That must have been some really good pills.
Elon Musk went to war over mass immigration (spoiler: he wants more of it), demonstrating that the “free speech champion” shtick has outlived its usefulness. Visual summary courtesy of https://x.com/Partisangirl:
I’m sure it has nothing to do with a massive investment into xAI from Blackrock. The indomitable Maajid Nawaz has an excellent summary:
RESEARCH
Imagine360 is a new method that can turn - you guessed it - any video into 360° video. Code coming soon, or so the Germans would have us believe.
Paper: https://arxiv.org/html/2412.03552v1
Demo:
Continuing with our stroll through dimensions: how do things move in 3D? Well, there’s a ton of stereoscopic videos online - and you can use them to train Stereo4D. Paper: https://arxiv.org/html/2412.09621v1
A new study reveals how strategic dataset selection can make any model appear groundbreaking, particularly in time series transformer research, where performance claims are often exaggerated through "dataset arbitrage." The findings challenge the widely touted potential of transformers in time series tasks, highlighting a lack of robust real-world evidence. The study calls for more rigorous benchmarking practices focused on practical utility and reproducible results rather than carefully curated conditions.
Paper: https://arxiv.org/html/2412.14435v1
OmniPred demonstrates that language models can function as universal regression tools by processing numerical data and parameters as text, leveraging Google Vizier's extensive blackbox optimization database. When trained at scale across multiple tasks, this approach outperforms traditional regression models, suggesting language models can effectively handle generalized regression problems without task-specific constraints (a toy illustration of the serialization trick follows the links below).
Paper: https://arxiv.org/html/2402.14547v4
Project: https://github.com/google-research/optformer/tree/main/optformer/omnipred
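The core trick is simply serializing parameters and targets as text so a language model can predict the numeric target as tokens. A toy illustration of the idea (my own mock-up - OmniPred's actual serialization format differs):

# Toy illustration of "regression as text": inputs and targets are serialized as strings,
# so a language model can be trained or prompted to complete the numeric target.
# This mock-up is illustrative only - OmniPred's actual format differs.
def serialize_example(params, y=None):
    x_part = ", ".join(f"{k}={v}" for k, v in sorted(params.items()))
    y_part = f" -> y={y:.4f}" if y is not None else " -> y="
    return x_part + y_part

train_examples = [
    serialize_example({"learning_rate": 0.01, "batch_size": 64}, y=0.8123),
    serialize_example({"learning_rate": 0.10, "batch_size": 32}, y=0.6410),
]
query = serialize_example({"learning_rate": 0.03, "batch_size": 64})

prompt = "\n".join(train_examples + [query])
print(prompt)   # feed this to an LM and parse the number it completes after "y="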