This week in tech: 11.05.2026
Summary of AI developments - made for busy people
APPLICATIONS
Robots (1):
Japan Airlines will run humanoid robots at Tokyo's Haneda Airport. The program will start in May: machines will assist ground crews in moving baggage and cargo, because - you guessed it - labor shortages.
https://interestingengineering.com/ai-robotics/japan-humanoid-robots-haneda-airport
Robots (2):
Figure AI is becoming quite a robotics powerhouse. They have ramped up production from one humanoid robot per day to one per hour in under 120 days - the Figure 03 variant is manufactured across 150 workstations at their California factory. When the company first mentioned a goal of 50k robots annually, everybody laughed.
Far fewer do so now.
https://www.figure.ai/news/ramping-figure-03-production
Robots (3):
NVIDIA has introduced Cosmos Reason 2:
32B VLM tailored for physical AI and robotics - focus on spatial-temporal understanding and real-world reasoning
Post-trained to grasp physics and embodied decision-making - the idea is to enable robots and autonomous systems to interpret environments, plan actions, and operate in unfamiliar settings
Based on Qwen3-VL-32B-Instruct, supports up to 256K tokens, and produces detailed reasoning for visual inputs.
Released under the NVIDIA Open Model License, which permits commercial and derivative use.
HF model page: https://huggingface.co/nvidia/Cosmos-Reason2-32B
HF collection: https://huggingface.co/collections/nvidia/cosmos-reason2
Paper: https://arxiv.org/abs/2503.06800
Video generation typically uses separate models for tasks like matting, relighting, or intrinsic decomposition - while computationally efficient, this setup ignores the links between visual modalities. UniVidX unifies this with a single multimodal model that supports 30 video tasks with omni-directional conditioning (= any modality can act as input or output). The design combines stochastic condition masking, decoupled gated LoRA, and cross-modal self-attention - and to put a cherry atop the cake, it was trained on a mere 1k videos.
Paper: https://arxiv.org/abs/2605.00658
HF model page: https://huggingface.co/houyuanchen/UniVidX
Project page: https://houyuanchen111.github.io/UniVidX.github.io/
Repo: https://github.com/houyuanchen111/UniVidX
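For intuition on what "any modality can act as input or output" means at training time, here is a minimal sketch of stochastic condition masking. The modality names and the sampling scheme are my illustrative assumptions, not taken from the paper:

```python
import random

# Hypothetical modality set; UniVidX's actual modalities differ.
MODALITIES = ["rgb", "alpha_matte", "normals", "albedo", "lighting"]

def sample_condition_mask(modalities, p_cond=0.5, rng=random):
    """Stochastic condition masking: each modality is independently drawn
    as a condition (model input) or a target (model output). One of each
    is forced so every step has something to condition on and predict."""
    conds = [m for m in modalities if rng.random() < p_cond]
    targets = [m for m in modalities if m not in conds]
    if not conds:             # force at least one condition
        conds.append(targets.pop())
    if not targets:           # force at least one target
        targets.append(conds.pop())
    return conds, targets

random.seed(0)
conds, targets = sample_condition_mask(MODALITIES)
```

Over many training steps this exposes the model to every input/output split, which is what makes the conditioning "omni-directional" at inference.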
BUSINESS
Money talks, bs walks: ethical AI is great, but it does not pay the bills - whereas, as the Anthropic example has shown, assisting in the liberation of Iran very much does.* The Pentagon has “partnered” with eight major tech firms (Google, Nvidia, OpenAI, SpaceX, Microsoft, Amazon Web Services, Reflection AI, and Oracle) to deploy frontier AI systems for warfighting, intelligence, and enterprise use. The initiative reflects a broader push by the administration of the Ginger Caligula toward an “AI-first War Department”. For now, Anthropic is excluded amid a contract dispute (although Claude very much remains part of the solution provided by Palantir).
*For the cognitively diverse: this is sarcasm.
https://techobserver.in/news/egov/google-nvidia-openai-pentagon-ai-classified-networks-323892/
Robots (4):
First came the glorious success of the Metaverse initiative, followed by leadership in the GenAI space (replacing Yann LeCun with Alexandr Wang) - and now Meta wants to show it can into robotics. They have acquired Assured Robot Intelligence (ARI), a startup developing foundation models for humanoid robots that can perform real-world physical tasks.
Six years ago the increasingly-less-United Kingdom left the EU, but judging by its approach to technological sovereignty? It might as well have stayed. Over GBP 50M from the fund set up to boost Britain’s scientific leadership has gone to US tech firms and VCs - with some of the recipients establishing UK entities shortly before receiving funding. That will show those evil American tech companies.
Nick Clegg started out as a liberal politician, had a deputy PM episode, and ended up as a corporate suit at Meta - but despite an arc so predictable it is practically a meme, he does have moments of candor. His take on Britain’s importance in GenAI is refreshing in its honesty:
"No one in their right mind would ever train an LLM foundation model in the UK. In fact no one does".
No lies detected - although it has to be noted that Switzerland did precisely that. But maybe, just maybe, the Swiss feel about their country a bit differently than Mr Clegg does about the not-particularly-United Kingdom.
No matter how much Zuckerberg kowtows to the Chinese govt, he still can’t catch a break. The most recent episode of his humiliation journey involves Manus - remember, the startup behind the “world’s first AI engineer”? Beijing cancelled the USD 2B deal AFTER it had already been closed - shocker of shockers, the justification cited foreign ownership and national security concerns tied to advanced AI assets (which, to be fair, makes perfect sense).
The most ethical AI company in the galaxy has introduced mandatory identity verification - including photo IDs and selfies - for certain Claude users. This can be something of an issue for people in jurisdictions like China, where access to Anthropic has to happen through a VPN - technically illegal, but commonly tolerated.
Google’s new patent (US12536233B1) describes a method for AI to intercept clicks and serve its own version of brand webpages. This raises all sorts of risks for retailers: diluted brand identity, reduced visibility into customer behavior, weakened data ownership, AND UNCLEAR LIABILITY if AI-generated content is wrong. I make a living with predictions, so let me try a few here:
1. Unless the US introduces an AI Sherman Act, nothing will change, and Google and whatnot will continue making Uncle Ted look like a prophet.
2. The US is unlikely to weaken its behemoths in any way, because China will not.
3. Barring a change of leadership - even less likely than point 1 - Europe will pull a bmw (b**ch, moan, and whine) and then do what it's told.
https://meetrise.com/insights/new-google-patent-outlines-an-ai-generated-webpage-swap
I did not see that one coming: the European Commission is actually doing its job for once and is moving against Instagram and Facebook for failing to mitigate risks to children under 13.
FRINGE
There are people who inspire confidence, regular ones, strange ones, creepy ones - and then there is Sam Altman. Leaving aside his shenanigans in and around OpenAI, he is also known for investing in ventures like genome editing in embryos. His latest one: he’s working with Tinder to verify that its users are real people. Their method of choice is ofc par for the course: scanning users’ eyes with an orb-like device. The startup is called Tools for Humanity. I can’t even.
https://futurism.com/future-society/tinder-scanning-eyeballs
RESEARCH
I cannot for the life of me understand why people insist on using models trained on text for numerical (= not text) problems - but there is a slight possibility it’s a skill issue on my part. At any rate, a new paper introduces a framework for time-series forecasting that enables LLMs to perform numerical reasoning by updating beliefs through linguistic information. It utilizes a sequential Bayesian approach (obviously) to map qualitative insights into quantitative forecast adjustments - if it really bridges the chasm between domain expertise and statistical modeling, this alone will make it worth the cognitive effort.
Paper: https://arxiv.org/abs/2604.18576
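To make "mapping qualitative insights into quantitative forecast adjustments" concrete, here is a toy conjugate normal-normal update - my construction, not the paper's method. The signal-to-number mapping is hard-coded here, whereas the paper has the LLM produce it:

```python
# Prior belief about the next-period value from a statistical model.
prior_mean, prior_var = 100.0, 4.0

# Hypothetical mapping from a linguistic insight to a numeric
# pseudo-observation: (implied value, observation variance).
SIGNALS = {
    "strong demand expected": (106.0, 2.0),
    "mild headwinds":         (97.0, 9.0),
}

def bayes_update(mean, var, obs, obs_var):
    """Posterior of a Gaussian mean after one Gaussian observation."""
    k = var / (var + obs_var)            # Kalman-style gain
    return mean + k * (obs - mean), (1 - k) * var

obs, obs_var = SIGNALS["strong demand expected"]
post_mean, post_var = bayes_update(prior_mean, prior_var, obs, obs_var)
# post_mean = 100 + (4/6)*(106 - 100) = 104.0; post_var = (2/6)*4 ≈ 1.333
```

A confident qualitative signal (low observation variance) pulls the forecast further and shrinks uncertainty more; a vague one barely moves it. Applied sequentially, each posterior becomes the next prior.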
If you want to explain why your ML model does what it does, SHAP is an excellent choice - minimal setup and pretty graphs being among its advantages. A tiny issue is that it makes certain assumptions which are violated when dealing with time series - and that’s what this new paper addresses. The goal is to provide high-level, concept-based explanations for time series models: C-SHAP allows users to interpret model inference both in terms of global patterns and domain-specific temporal concepts.
Paper: https://arxiv.org/abs/2504.11159
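For intuition on what SHAP attributions actually are, here is a brute-force exact Shapley computation on a toy model - my illustration only: real SHAP libraries use efficient approximations, and C-SHAP attributes to temporal concepts rather than raw features:

```python
from itertools import combinations
from math import factorial

def model(x, baseline, present):
    """Toy model 2*v0 + v1 + 0.5*v0*v2, where features absent from
    `present` are replaced by their baseline values."""
    v = [x[i] if i in present else baseline[i] for i in range(len(x))]
    return 2.0 * v[0] + 1.0 * v[1] + 0.5 * v[0] * v[2]

def shapley(x, baseline):
    """Exact Shapley values: weighted average of each feature's marginal
    contribution over all subsets of the other features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for S in combinations(others, r):
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += w * (model(x, baseline, set(S) | {i})
                               - model(x, baseline, set(S)))
    return phi

x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley(x, baseline)
# Efficiency property: sum(phi) == f(x) - f(baseline) = 5.5 - 0
```

Note how the v0*v2 interaction (worth 1.5 here) gets split evenly between features 0 and 2 - exactly the kind of per-feature attribution that becomes misleading when time-series features are strongly dependent, which is the gap C-SHAP targets.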
Whatever the modeling question, for some people - post 2017 - the answer is “transformer”. This research investigates why Transformer-based models often yield trivial (“collapsed”) forecasts when applied to financial returns with weak conditional structure. The authors argue that in the presence of irreducible noise, increased model expressivity amplifies variance without reducing bias - the worst of both worlds.
Paper: https://arxiv.org/abs/2604.00064
I am intrigued and terrified at the same time: Meta has just proposed Neural Computers. The idea is that a model is the runtime - with memory, computation, and interfaces all embedded in one learned system. Early prototypes can simulate terminals and GUIs from prompts and achieve strong results, though reliability and control are, to put it diplomatically, an open challenge. The major shift is conceptual: moving beyond agents toward models as the computing substrate itself.
Paper: https://arxiv.org/abs/2604.06425
Vision Banana reframes all vision tasks as image generation within a single model - that includes segmentation, depth, and 3D understanding. The authors achieve this by lightly mixing task-specific data into generative training: the generated RGB outputs can be decoded into precise labels and geometry while preserving image-generation quality. The result is a unified interface where generative pretraining becomes the backbone for solving (nearly) every visual problem.
Paper: https://arxiv.org/abs/2604.20329

