APPLICATIONS
Google has launched the Kaggle Game Arena: a new benchmark that evaluates strategic reasoning by having AI models compete against each other in games like chess, played via text-based moves (a toy sketch of the move loop follows the link). This dynamic competition provides a more transparent and evolving measure of AI progress than static benchmarks, which can be memorized or leak into training data.
https://blog.google/technology/ai/kaggle-game-arena/
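Just to make the "text-based moves" format concrete, here is a toy sketch of a move loop (not Kaggle's actual harness; the dummy_engine stand-in and the use of python-chess are my own assumptions): each player is a function that returns a move as text, the board validates it, and an illegal move forfeits the game.

```python
# Toy arena loop: text moves in, validation by python-chess, illegal text loses.
import random
import chess

def dummy_engine(board: chess.Board) -> str:
    # Stand-in for an LLM player: return a random legal move as SAN text.
    return board.san(random.choice(list(board.legal_moves)))

board = chess.Board()
while not board.is_game_over():
    move_text = dummy_engine(board)
    try:
        board.push_san(move_text)   # malformed or illegal text raises ValueError
    except ValueError:
        print("Illegal move, game forfeited:", move_text)
        break
print(board.result())
```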
Hertz deployed an AI system for detecting car damage, and the most polite way to describe the resulting clusterfuck is this: it demonstrates why it is so important to always have a human in the loop.
https://futurism.com/hertz-ai-damage-scanner
The Anti-AI Act, a.k.a. the core EU contribution to AI development, went into effect on the 2nd of August. For those of you who still haven’t given up on the continent, here is a handy guide explaining the obligations for models released under free / open-source licenses:
https://huggingface.co/blog/yjernite/eu-act-os-guideai
BUSINESS
Don’t trust info unless it’s been officially denied?
Google claims that despite its AI Overviews, click volume to websites has remained stable and the quality of those clicks has actually improved.
Multiple studies / publisher reports show a significant decline in website traffic and a rise in "no-click" searches since the introduction of AI summaries.
Google dismisses these external reports as methodologically flawed and seems to be reframing the debate around click quality rather than quantity.
https://blog.google/products/search/ai-search-driving-more-queries-higher-quality-clicks/
CUTTING EDGE
Google DeepMind has introduced a new world model: Genie 3
Generates interactive, explorable 3D worlds in real time from user prompts.
Longer interaction times and improved memory
It serves as a powerful virtual sandbox for training and testing next-generation robotics and AI systems by simulating complex, real-world scenarios
Huge potential in gaming, education, and virtual prototyping → for now access is limited to a small group.
https://deepmind.google/discover/blog/genie-3-a-new-frontier-for-world-models/
Qwen Image is here:
Apache-2.0-licensed image generation model based on the Multimodal Diffusion Transformer (MMDiT) architecture, using a single Qwen 2.5 VL text encoder.
SOTA ability to natively render complex and multi-line text directly within images, excelling in both English and Chinese.
High-quality images in various artistic styles + precise editing tasks like style transfer and object manipulation.
Fully supported in the diffusers library → easy access to LoRA fine-tuning, quantization, and caching (a minimal loading sketch follows the links below).
Technical report: https://arxiv.org/abs/2508.02324
Repo: https://github.com/QwenLM/Qwen-Image
HF model page: https://huggingface.co/Qwen/Qwen-Image
Diffusers pipeline: https://huggingface.co/docs/diffusers/main/en/api/pipelines/qwenimage
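For the impatient, a minimal loading sketch via diffusers (assumptions: the generic DiffusionPipeline loader resolves the Qwen/Qwen-Image checkpoint as described on the pipeline page above, and you have a GPU with enough VRAM for bf16 weights):

```python
# Minimal text-to-image sketch with the generic diffusers loader.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = 'A storefront sign that reads "Open 24 Hours", photorealistic, evening light'
image = pipe(prompt=prompt, num_inference_steps=50).images[0]
image.save("qwen_image_demo.png")
```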
OpenAI has released GPT-OSS:
Its first open-weights model since GPT-2: 120B and 20B parameter sizes, Apache 2.0 license.
The models utilize a Mixture-of-Experts (MoE) architecture with a 128K context window
They are designed as "tool-first" models: unless it’s an epic fuckup, they seem engineered to have a high hallucination rate to force reliance on external tools (APIs, web browsing, code execution).
When using tools, the models achieve state-of-the-art performance, with the 120B model scoring 90% on MMLU, rivaling top proprietary models.
Built for agentic workflows and advanced reasoning, the models are accessible through Transformers and Ollama (a minimal loading sketch follows the links below).
Announcement: https://openai.com/index/introducing-gpt-oss/
Training and inference recipes: https://github.com/huggingface/gpt-oss-recipes
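For local experimentation, a minimal loading sketch (assumptions: the Hub id openai/gpt-oss-20b from the announcement, and a recent transformers release whose text-generation pipeline accepts chat-style message lists; the 20B variant is used here simply because it is the one most people can actually fit):

```python
# Load the smaller GPT-OSS checkpoint through the standard text-generation pipeline.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",
    device_map="auto",
)
messages = [{"role": "user", "content": "In one sentence, what is a Mixture-of-Experts model?"}]
out = generator(messages, max_new_tokens=128)
print(out[0]["generated_text"])
```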
Anti-hype take:
GPT-5 is here:
GPT-5 replaces all previous models with a single, unified system (as in: it literally killed all previous models, so if your pipeline was built around GPT-4o or somesuch? Good luck, you’re gonna need it). After a massive uproar, they restored 4o a day later (a minimal migration sketch follows the documentation link below).
It features an automatic "router" that decides whether to use a fast, general model for simple queries or a deeper, more powerful "Thinking" mode for complex reasoning, removing the need for users to choose a model manually. Because OpenAI knows you better than you know yourself.
Performance improvements are, to put it mildly, of the evolutionary kind: a slight edge over Gemini 2.5 Pro or Claude Opus in key benchmarks.
The model boasts a context window of up to 400K tokens, native multimodality (text, image, audio, video), and SOTA coding abilities.
OpenAI has made GPT-5 available to free-tier users (with usage caps).
Documentation: https://platform.openai.com/docs/models/gpt-5
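If your pipeline was hard-coded to a now-retired model, the minimal migration is just pointing the same call at the new id; a sketch using the official openai Python SDK (the "gpt-5" model id comes from the documentation above; check that page for the current routing / reasoning options, which are not shown here):

```python
# Point an existing Chat Completions call at GPT-5 and let the router do its thing.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-5",  # previously hard-coded to "gpt-4o" in many pipelines
    messages=[{"role": "user", "content": "In one sentence, what changed vs GPT-4o?"}],
)
print(resp.choices[0].message.content)
```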
Anti-hype take:
ElevenLabs has introduced Eleven Music:
The tool expands their audio expertise from voice synthesis to music generation, creating studio-quality tracks from simple text prompts.
The platform offers users precise editing capabilities, allowing them to craft songs section-by-section and control elements like mood, lyrics, duration, and structure.
Eleven Music is designed to produce tracks ready for immediate commercial use in media like film, TV, and games, aiming to solve licensing issues and reduce production overhead for creators.
Check it out: https://elevenlabs.io/music
I did. IT’S AWESOME: https://elevenlabs.io/music/songs/uddHZHW9R6HSgeVrcaSF
FRINGE
Anthropic CEO Dario Amodei is having a normal one - and by normal one, I mean he is off his rocker (again / still - ymmv). As is evident from this interview, he is clearly scared of open-weights models, has no idea what to do about them - and uses A LOT OF WORDS to say so.
Remember Google Glass? The gizmo abomination died, in part, because its users were - quite correctly - mocked into oblivion. Sadly, much like socialism and STDs, creepy ideas have a way of coming back. The most recent iteration is courtesy of Mark Zuckerberg, who dropped the 2.0 persona (the one where he looked / acted human), reverted to the creepy lizardman persona and made an announcement: anybody not wearing Meta glasses in the future will be at a cognitive disadvantage.
https://fortune.com/2025/07/31/mark-zuckerberg-meta-ray-ban-smart-glasses-ai/
RESEARCH
For many agentic AI applications, small language models (SLMs) are often a better choice than their larger counterparts. SLMs are fit for purpose: they offer sufficient capability for specialized tasks with greater efficiency and cost-effectiveness.
Paper: https://arxiv.org/abs/2506.02153
This survey synthesizes retrieval and reasoning approaches, addressing the trade-off where RAG provides facts but lacks complex inference, while reasoning-oriented models often hallucinate. It examines how these two methods can mutually enhance one another. Paper: https://arxiv.org/abs/2507.09477
This paper proposes a novel denoising process to address the poor spatio-temporal consistency in existing 4D diffusion models (video = 3D + time). The method alternately denoises a latent grid along spatial and temporal dimensions.
Paper: https://arxiv.org/abs/2507.13344
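To make "alternately denoises along spatial and temporal dimensions" concrete, a toy sketch (emphatically not the paper's code; both passes are stand-ins): a latent video grid of shape (frames, channels, height, width) is nudged by a per-frame spatial pass and a per-pixel temporal pass in alternation, where a real model would use learned denoisers for both.

```python
# Toy alternating denoising over a latent video grid (T, C, H, W).
import torch

def spatial_pass(latents):
    # Stand-in for a per-frame denoiser: shrink each frame toward its own mean.
    frame_mean = latents.mean(dim=(2, 3), keepdim=True)
    return latents + 0.1 * (frame_mean - latents)

def temporal_pass(latents):
    # Stand-in for a temporal denoiser: shrink each pixel toward its mean over time.
    time_mean = latents.mean(dim=0, keepdim=True)
    return latents + 0.1 * (time_mean - latents)

latents = torch.randn(16, 4, 32, 32)   # 16 noisy frames of 4-channel 32x32 latents
for _ in range(10):                     # alternate the two passes per step
    latents = spatial_pass(latents)
    latents = temporal_pass(latents)
```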
ByteDance's Seed-Prover solves math problems by generating formal, computer-verifiable proofs in the Lean 4 language, mimicking the approach of human mathematicians. It became the first model to formally solve five out of six problems from the IMO 2025 competition.
Paper: https://arxiv.org/abs/2507.23726
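For readers who have never seen one, this is the kind of machine-checkable artifact we are talking about - a toy Lean 4 theorem, nothing from the paper itself:

```lean
-- A trivial Lean 4 theorem: the kernel checks the proof term, so there is
-- nothing to "trust" beyond the checker (Nat.add_comm is from the core library).
theorem toy_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```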