Google Gemini 3.1 Pro Is Like Your Cat. It Knows How Awesome It Is, and It Doesn’t Care What You Think.

Google released a new AI model, which shouldn’t surprise anyone. All the major tech companies are fighting for supremacy the way Bryan Cranston fought for name recognition in the television series Breaking Bad. But Google went in a different direction.

What was surprising was that, according to most of the benchmarks, Gemini 3.1 Pro is likely the smartest model out there. The headlines were all the same flavor of caffeinated awe. Gemini 3.1 Pro scored 77.1% on ARC-AGI-2, a test that attempts to measure whether a model can solve problems it hasn’t just memorized.

I love the phrasing of ARC-AGI-2’s pitch: “Humans can solve every task.” It’s the kind of thing you read and think, Good for us, and then remember that humans can’t agree on whether pineapple belongs on pizza, or whether a zipper should go up or down, or why the sock you lost is always the left one.

But fine. Let’s accept the premise. Here’s the thing that made my stomach do that little elevator drop it usually reserves for airline turbulence and family group texts: Gemini 3.1 Pro didn’t just improve. It jumped, more than doubling its predecessor’s score on ARC-AGI-2 in about 90 days, which is either progress or a sign that we’ve unknowingly put our civilization on fast-forward.

And then comes the part that feels like the plot twist in a thriller where the villain turns out to be your dentist: Google priced it low. Almost insultingly low. They released a “smartest” model and then behaved like a person who brings an elaborate homemade pie to a party and says, “Oh, don’t eat it if you don’t want to. I made it mostly to prove that I could.”

This is the part that matters more than the score. Google doesn’t appear to need my usage the way other AI companies do. They’re not begging me to bring my messy human life, with all of my emails, my contracts, my panicked Friday spreadsheets, into their model’s warm embrace. They’re offering it the way a billionaire offers a handshake: it’s polite and it’s firm, but they’ll forget your name before your palm stops sweating.

Compared to the rest of the AI competitors, Google is playing a different game.

OpenAI, Anthropic, and the rest of the major players feel like they’re living inside the “product race” story. Market share. Daily active users. What features can be bundled, monetized, advertised, enterprise-ified. In that story, the model is the business. Google, meanwhile, has a business that throws off cash the way a lawn sprinkler throws off water. The model can be something else: a research vehicle, a proving ground, a stake in the dirt that says, “We’re building the thing underneath the thing.”

Demis Hassabis has been saying some version of “solve intelligence, then solve everything else” for years. In a recent appearance on 60 Minutes, he talked about AI and disease in a way that felt less like marketing and more like someone describing the weather that’s coming whether you believe in umbrellas or not. This is a man selling you the future, and the unsettling part is that he doesn’t need you to buy it.

Google can afford to have that posture, because they’ve built a vertical stack that looks less like a company and more like a fortress with a moat, a drawbridge, and an internal ecosystem of very serious people who use words like “inference” the way normal people use words like “lunch.” They design their own chips, TPUs like Ironwood, which scale into pods of 9,216 chips, the kind of number that makes you realize your own brain is basically two tablespoons of tapioca trying its best. And then, because the universe has a sense of humor, competitors sometimes train on Google’s hardware anyway, like paying rent to the person you’re competing with in a footrace.

So yes. Google can ship “the smartest model” and act indifferent about whether you use it, because Google’s business isn’t “winning your daily workflow.” Google’s business is being Google.

Most people are mistaken when evaluating the various models available to them. They look at “smart” as a single metric against which everyone is judged. But intelligence doesn’t work that way. I have friends who struggle to find the power button on a laptop and who believe John Steinbeck’s middle name should be an offensive gerund that starts with the letter F, but many of those folks can fix complex car engines with a hammer and a sweat rag, and others can paint the Mona Lisa better than Da Vinci himself. Which of us is smarter?

“Smart” isn’t just one thing, so evaluating whether a specific model is “smarter” than another is technically a nonsense question. A better question is to ask which model is smart in the way you need it to be. Gemini 3.1 Pro is framed as the strongest naked reasoner, which sounds like the title of a Gustave Courbet painting, but is really a way of saying it expends effort thinking deeply about novel problems.

But when you add tools like web search, code execution, reading files, and calling APIs, the “equipped reasoners” can pull ahead, because the bottleneck becomes less about how cleverly you think and more about whether you can act on that thinking over time without wandering off to sniff the digital equivalent of a squirrel. Anthropic’s Opus 4.6, for example, was showcased building a C compiler with agent teams: 16 parallel Claudes, like a committee that actually produces something other than resentment. In the parlance of artistry, Gemini refined the artistic vision, Anthropic organized the studio, and OpenAI perfected a masterful brushstroke.
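
What does “equipped” actually mean in practice? Less machinery than you’d think. Here is a minimal sketch of the loop in Python. Every name in it (ask_model, web_search, run_code) is hypothetical, invented for illustration; no vendor’s real API looks like this, and the “model” is a stub so the sketch can run end to end:

```python
# A minimal sketch of an "equipped reasoner." All names are hypothetical.

TOOLS = {
    "web_search": lambda query: f"(pretend search results for {query!r})",
    "run_code": lambda src: f"(pretend output of running {src!r})",
}

def ask_model(history):
    # Stand-in for a real LLM call. This stub "decides" to search once,
    # then answers, so the loop below actually runs end to end.
    if not any(m["role"] == "tool" for m in history):
        return {"tool": "web_search", "argument": history[0]["content"]}
    return {"tool": None, "content": "an answer, informed by the tool result"}

def equipped_reasoner(task, max_steps=10):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = ask_model(history)        # the model thinks...
        if reply["tool"]:                 # ...and may ask to act
            result = TOOLS[reply["tool"]](reply["argument"])
            history.append({"role": "tool", "content": result})
            continue                      # feed the result back, keep going
        return reply["content"]           # no tool call: final answer
    return "(wandered off after a digital squirrel)"

print(equipped_reasoner("How many moons does Jupiter have?"))
```

The point is the shape, not the plumbing: think, maybe act, feed the result back, repeat. Everything the vendors call “agentic” is a fancier, more persistent version of that loop.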

Similar to “smart,” “hard” isn’t a single metric either. There are reasoning problems: the multi-step, logic-heavy puzzles that make you feel like Sherlock Holmes, except you’re wearing sweatpants and your dog is licking the carpet. ARC-AGI-2 exists to measure that kind of novelty reasoning. Then there are effort problems: not intellectually hard, just enormous. Like reading process logs of customer interactions until your eyes begin to weep the way statues weep in Catholic churches. This is where agentic systems shine: the models that can keep going, hour after hour, without needing to “feel inspired.”

Next, there are coordination problems: getting multiple teams aligned on a single project, routing dependencies, managing information so nobody builds the wrong thing for a month because they missed a meeting that was moved because someone else missed a meeting because someone’s dog died. Coordination is the primary industry in corporate America, and our main export is calendar invites.

There are also emotional intelligence problems: giving feedback to a well-meaning colleague who’s falling behind, negotiating with someone who says they want to help but is really trying to get information out of you, reading a room where silence could mean “I hate this” or “I absolutely love this!” If AI ever solves this one, it will not be because it got better at benchmarks. It will be because it learned to notice the way a person says, “Sounds good,” when it does not, in fact, sound good.

There are judgment and willpower problems, which might manifest as killing a project, saying no to a client, or making the politically dangerous call because it’s right. AI can provide the answer. It cannot provide the nerve.

The most overlooked problems are domain expertise problems: the ones where a veteran recognizes the smell of a recurring incident from 2019, or a lawyer knows which clause gets litigated because they’ve watched it happen. This is not reasoning so much as scar tissue.

And, finally, there’s the one that makes everything else feel like a decoy. Ambiguity problems, which means figuring out what the question even is. The client says they want better reporting, but what they really want is their boss to stop interrogating them. The stakeholder says “efficiency,” but what they mean is “control.” The request says “simple,” but what it means is “politically survivable.”

This is where “smartest model” becomes a weird thing to brag about, because the real bottleneck in most work isn’t “I need to think harder.” It’s “I need to get through all of this,” or “I need to get everyone aligned,” or “I need to figure out what we’re doing here.”

So what do we do?

“Deep Think” recently collaborated with researchers to tackle professional research problems in math, physics, and computer science, and the examples are the kind of thing you read and feel proud of humanity, until you remember the “humanity” part may be mostly ceremonial going forward.

Isomorphic Labs, DeepMind’s drug discovery sibling, published work on a drug design engine (IsoDDE) that claims dramatic performance improvements over AlphaFold 3 on protein-ligand prediction and binding affinity, which is the part where you realize “solve intelligence” is not a slogan. It’s a pipeline.

And somewhere in there, alongside the Nobel Prize press release, and the TPU pods, and the benchmarks that sound like dystopian final exams, we are left with some awkward realizations.

The question isn’t “Which AI should I use?” It never has been. The real question, the one it should have been all along, is: what kind of problem am I solving right now?

Because if it’s a pure reasoning problem, Google is selling you the cheapest, strongest engine in town. If it’s an effort or coordination problem, you might want a model that’s built to keep working, to use tools, to persist. And if it’s an emotional intelligence problem, a judgment problem, or an ambiguity problem: congratulations. You are still employed by reality.

Google shipped the smartest model and doesn’t care if I use it, because Google is trying to win something bigger than my workflow. I am trying to win back my afternoon.

Lick the Porcelain Swan

My Mom used to keep a small porcelain dish shaped like a swan on the coffee table. It held pastel mints and, more importantly, judgment. If you said something like “that’s stupid,” she wouldn’t yell or lecture. She would simply look at you over the rim of her glasses as if you had just licked the swan.

I think about that swan often now that I am a grown man. I think of it most often when my language occasionally lapses into what, in the Aristotelian sense, might be referred to as “blue” or “off-color.”

It’s amazing how one little word can send people reaching for their emotional pearls. You would have thought I’d set fire to a puppy. I had not. I had merely suggested that a plan involving fourteen manual Excel exports, three cron jobs duct-taped together with hope, and a PowerPoint labeled “Final_v27” might not represent the pinnacle of human thought.

“That’s moronic,” I said, to the audible gasps of many.

In another instance, I mentioned that most people writing about AI on LinkedIn are idiots. This was apparently too much. The platform, I was gently reminded, is a professional space. A space for thought leadership. A space where men in vests explain, with serene confidence, that they have unlocked “10X value” by asking ChatGPT to summarize an article they didn’t read.

“Idiot” was considered harsh.

And then there was the time I said I was so excited about a project that I wanted to strip naked and dance down the street. I did not, to be clear, remove any clothing. I was speaking metaphorically. But metaphor, I have learned, is dangerous territory. Someone somewhere imagined me twirling past a Starbucks and felt unsafe.

The feedback came in waves. Some of it kind. Some of it less so. A few messages suggested I might benefit from “more professional tone alignment.” One recommended I “leverage emotionally neutral language constructs.”

Emotionally neutral language constructs. I picture them as beige cubes. You can stack them in any order and they will never offend anyone, never surprise anyone, and never make anyone feel the sudden electric jolt of recognition that says, Yes. That. Exactly that. This is where AI enters, smoothing everything like a hotel iron pressed against the wrinkled shirt of human expression.

We now have machines that can turn “This plan is a flaming pile of trash on a barge drifting toward the waterfall of budget overruns” into “This proposal may benefit from additional risk mitigation analysis.” Both sentences are technically correct.

I understand the desire for civility. I do. I am not advocating that we wander into meetings and start hurling gerunds like hand grenades. There is a difference between being vivid and being cruel. “Moronic” may not have been my finest hour. It landed harder than I intended. Words do that. They leave the mouth with a jaunty wave and arrive at the other end wearing steel-toed boots. But I worry that in our rush to optimize for safety, we have begun to optimize away humanity.

We are increasingly fluent in what I call Airport English. It is the language of delay announcements and corporate apologies. It is perfectly calibrated to offend no one and inspire even fewer. It contains no sweat, awkward laughter, or confession. It is the linguistic equivalent of a carpet patterned specifically to hide stains.

AI is spectacular at Airport English. It has digested the entire internet and learned that the safest sentence is the one least likely to provoke. It can write a LinkedIn post that sounds like a leadership retreat catered by hummus. It can gently reposition your rage into “constructive curiosity.” It can transform “this is idiotic” into “this approach may not align with strategic objectives.” What it cannot do, at least not without borrowing from us, is bleed.

When I say that some AI commentary feels idiotic, I’m not claiming intellectual superiority. I am reacting to something that feels hollow. There is a sameness to it. The phrasing is polished, the cadence agreeable. The thought is often a warmed-over cliché wearing a blazer and pressed khakis from the Amazon Basics collection. We are becoming curators of sanitized enthusiasm.

I’ve even caught myself doing it. I’ll write something sharp and funny and a little dangerous. Then, I’ll run it through an internal filter. Maybe even an external one. The edges soften. The verbs become responsible. The whole thing sits there like a well-behaved golden retriever. And yet, the moments I remember most in conversation aren’t the beige ones. They’re the moments when someone says, “That idea terrifies me,” instead of “I have concerns.” When someone says, “I am so excited I could scream,” instead of “I am cautiously optimistic.” When someone admits, “I was wrong. Spectacularly, embarrassingly wrong.”

“Stupidly, idiotically, moronically wrong.”

Human speech is messy because humans are messy. We are not probability distributions seeking maximum likability. We are a bundle of nerves and hopes and ridiculous metaphors about dancing naked in the street.

How do we bring back our humanity without simply becoming jerks?

First, we can learn the difference between heat and light. Heat is calling a person an idiot. Light is saying, “This argument collapses under its own weight.” One scorches; the other illuminates. Both are honest. Only one is gratuitous. Next, we need to own our exaggerations. If I say I want to dance naked in the street, perhaps I should scan the room to see if anyone’s fingers are searching for pearls, and seek to allay their fears.

We can resist outsourcing our emotions to machines. If you’re angry, figure out why before you ask an algorithm to launder the feeling. If you’re joyful, say so in your own crooked, unoptimized words.

Finally, we can extend a little grace in both directions. To the pearl-clutchers, who may simply prefer their coffee without a side of linguistic cayenne. And to the spice-throwers, who are often just trying to feel alive in a world that sounds like a Terms and Conditions agreement.

Our goal is never to become outrageous for sport. It is to remain unmistakably human, to risk saying something with color, to occasionally overshoot and apologize, and to laugh at ourselves along the way. Communication is more than just the transfer of information. It’s the transfer of feeling and perspective as well.

The porcelain swan is still there in my mind, watching. I suspect it prefers that I retire “moronic.” Fair enough. But I also suspect it would be bored to death in a world where every sentence is professionally moisturized and emotionally gluten-free.

Somewhere between the flaming trash barge and the risk mitigation analysis lies a voice that is honest, vivid, and kind. I am trying to find it. Fully clothed, of course.

Most of the time.