This is Robin Sloan’s lab notebook. It’s about media and technology, creative computing, AI aesthetics, & more. Here's the RSS feed. My email address: robin@robinsloan.com
Anybody from Anthropic out there reading? Here is a tiny feature request for the cool new Claude Managed Agents: currently the usage field on a session seems only to get updated (with, e.g., current token counts) when the session goes idle. But, I also want to track usage during long, multistep executions … in fact, I might argue that’s MOSTLY when I want to track it, to prevent runaway work.
So, it would be nice if the usage stats updated live, or live-ish.
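To make the wish concrete, here is roughly the watchdog I want to be able to write. This is a sketch against a hypothetical client interface, not the actual SDK; the session object and its fields are stand-ins.

```python
import time

TOKEN_BUDGET = 2_000_000  # cap for a single long-running session

def watch(session, poll_seconds=30):
    # "session", "status", "usage", and "cancel" are hypothetical stand-ins,
    # not the real SDK; today the usage numbers only refresh once a session idles.
    while session.status == "running":
        if session.usage.total_tokens > TOKEN_BUDGET:
            session.cancel()  # stop runaway work mid-execution
            break
        time.sleep(poll_seconds)
```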
I am not an LLM superuser — in the sense that I am not locked in all day, marshaling My Dutiful Minions; I have no minions — but I do ask questions from time to time, mostly technical, and I have done so consistently for a couple of years now, so naturally I have noticed changes in the way the models respond.
Lately, Claude seems very eager to match not only my register as a user, but the register of whatever documents it is considering; there is an effect almost of “voice capture”.
I think of this as a subtle but deep sycophancy. Distinct from the superficial sycophancy of you’re right! you’re brilliant!, this flavor might appear to disagree or push back, while still affirming: yes, this is the right way to frame an idea; to have a conversation. (Here’s a brief chat with Claude that prompted this thought.)
The truly unsycophantic model would sometimes respond: lol wut?
Gemini’s tone, by contrast, is colder, more frankly robotic, and to me it seems less malleable. Certainly, it’s very disciplined about refusing to participate in its own anthropomorphization. It’s also “distant”, somehow … Gemini is writing across a vast gulf, whereas Claude wants to be like, sitting next to you on the park bench.
I prefer the gulf, because I think it’s more accurate.
There have been experiments involving language models trained on vintage text before, but they lingered mostly in the realm of the gimmick; Talkie is notable both for its size — the largest such model so far, 13B parameters trained on 260B tokens written before 1930 — and for the depth of the questions its creators are asking.
We know a smart human from the 1930s, yanked a hundred years into the future, could learn to program computers using, e.g., Python without any problem.
Can an LLM yanked a hundred years into the future do this, too?
Talkie reveals that no, it can’t, not really — though the tests in the launch post only scratch the surface of what might be attempted and explored. And of course a bigger Talkie, maybe GPT-3-class, would have different capabilities — if indeed it’s even possible to train one. (One senses the authors here have already rummaged beneath the couch cushions for pre-1930 tokens … )
I’m presently reading a terrific biography of Claude Shannon. In the late 1930s, his MIT master’s thesis — “the most important master’s thesis ever” — established a direct mapping between electric circuits and Boolean logic. This connection was both very simple and totally radical; at the time, Boolean logic wasn’t considered particularly practical — in fact, it wasn’t considered much at all. In a stroke, Shannon’s insight opened up a new field, basically the same one that all this LLM research is unfolding in today.
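The mapping itself is almost embarrassingly compact once you see it: switches in series behave like AND, switches in parallel behave like OR. A toy illustration in Python (my framing, obviously, not Shannon's relay notation):

```python
from itertools import product

def series(a, b):
    """Two switches in series: current flows only if both are closed."""
    return a and b   # Boolean AND

def parallel(a, b):
    """Two switches in parallel: current flows if either is closed."""
    return a or b    # Boolean OR

# Every circuit built from series/parallel combinations computes a Boolean function.
for a, b in product([False, True], repeat=2):
    print(a, b, "series:", series(a, b), "parallel:", parallel(a, b))
```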
If you could coax Talkie, or a future version that’s larger and more capable, into making Claude Shannon’s connection — without simply giving it away, of course — it would provide evidence that modern LLMs might be able to make connections of that power at the real frontier of knowledge today.
Conversely, if no amount of coaxing or even coaching could get Big Talkie anywhere near a robust approximation of Shannon’s thesis … it would raise questions about this whole game plan.
I asked Talkie:
Does it seem to you like there might be any correspondence between electric circuits and the logic of George Boole?
Talkie replied:
I think not. The fundamental propositions in logic are independent of all electrical considerations, and they do not admit of any illustrations drawn from electricity.
This isn’t a fair test, of course; Talkie hasn’t yet been trained to run in dogged loops, to roam through vast fields of if/then, but wait, actually … There’s plenty of investigation that remains to be done here.
Demis Hassabis is fond of saying that a test for truly powerful AI would be to train a Talkie-like LLM with a knowledge cutoff of 1911, then challenge it to formulate general relativity, as Einstein did in 1915.
I agree that this would be impressive, but/and I also wonder if it’s too challenging. Science would benefit from Einsteins on demand, sure … but it would also benefit from simpler insights: the kind of “what if X is also Y” mapping that Claude Shannon provided. Those feel to me much more plausibly in the wheelhouse of LLMs than Einstein-level cosmic restructurings. (I feel sort of bad calling Shannon’s century-defining insight “simpler” but … I also sort of think he would agree … )
That’s not to say I find even those simple insights, at this moment, particularly plausible … you read about Shannon and you learn there was more than language in play here. This was a guy deeply enmeshed in the physical world. For him, the circuits weren’t imaginary; they were real, and they were a tangled mess.
Yet it does not seem, in principle, IMPOSSIBLE for some future Talkie to go crawling through circuit diagrams, through crusty neglected Boole, and discover the same simple, incandescent, epochal translation that Shannon did. It’s very interesting to think about.
Anyway, this is all to say, Talkie is a triumph, hugely provocative, potentially very productive. Bravo!
I believe Google’s release of Gemma 4 is a quiet milestone, and it might be more consequential to the overall arc of “how we use LLMs” than the mammoth models now rumbling behind closed doors.
Google has somehow managed to extend Gemini’s visual acuity into these open-weights models. My application has to do with handwriting recognition, plus the calculation of bounding boxes for blobs of text, and the 31B version performs as well as Gemini 3 Flash … and nearly as well as Gemini 3.1 Pro?! (This isn’t just vibes, but quantitative scoring.) Yet Gemma 4 31B is a model I can run however and wherever I want … it runs (quantized) on my old 2017-era deep learning rig with its three 12GB GPUs. It runs in the secure enclaves on Tinfoil.
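For the curious, my pipeline has roughly this shape. The helper below is hypothetical, a placeholder for whatever local runtime is actually hosting the model; the point is just the ask, transcriptions plus pixel-coordinate boxes, returned as JSON.

```python
import json

def run_vision_model(image_path: str, prompt: str) -> str:
    # Hypothetical helper: send an image + prompt to whatever local runtime
    # is hosting the model (llama.cpp, a secure enclave, etc.) and return text.
    raise NotImplementedError("wire this to your local inference stack")

PROMPT = """Transcribe every handwritten blob of text in this page scan.
Return JSON: a list of objects with keys "text" and "box",
where "box" is [x_min, y_min, x_max, y_max] in pixel coordinates."""

raw = run_vision_model("page_scan.png", PROMPT)
for region in json.loads(raw):
    print(region["box"], "->", region["text"])
```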
A big brilliant model is cool, but I do not find it exciting in the way I find Gemma 4.
Just yesterday I discovered the company Tinfoil, which is doing cool work with AI models running inside secure enclaves — end-to-end encrypted inference, basically. I found myself getting really excited as I paged through the site, both because the capabilities are interesting and because it’s all so crisp and well-presented. It’s refreshing to encounter a project with such a deep and evident keel — both moral and technical.
Here’s a reason to do good, principled work in the world: it creates resonance in other people. This morning, going back to the site to dig a little deeper, I feel myself ever-so-slightly vibrating.
One of the company’s co-founders has a great-looking blog, too. I found her short essay The Closing of the Frontier totally compelling.
The new language models are children of the reasoning revolution, and they stream out these long, circuitous thinking traces. They are said to be applying more compute to our questions and challenges.
This is subtle, but that “more” isn’t particularly about thinking harder. Rather, it’s about thinking in the right direction. It is not the gas pedal, but the steering wheel — better yet, the GPS map in the dashboard.
The reasoning revolution depends, in part, on the unreasonable effectiveness of specific words: twists like “but wait” and “actually”, which operate as powerfully as magic spells. (The English department NEEDS to get into the game with this stuff.) Is the phrase “but wait” really a white-hot kernel of intellectual effort? No. It’s a sign planted in the ground, pointing THAT-A-WAY, towards a particular kind of document that humans find useful.
(Don’t mistake precision for minimization. I’m not dismissively saying, these are just documents; I am plainly observing, these are documents. If you don’t think documents are cool, even sometimes cosmic, that’s on you!)
Notice that, as in real life, directions aren’t always correct. It’s likely that you have by now watched a language model walk in circles, “but wait”-ing itself back around, and around, and around again …
Recent research from Apple talks about “forks” in the road, with “distractors” that can lead a model in the wrong direction.
Here’s more evidence for the navigation argument: base models can already do the things reasoning models can do … it just takes them much longer to arrive in the correct regions of high-dimensional space. Base models are fine thinkers, but cruddy navigators.
The single forward pass of a language model runs on its own, refracting a context window into an array of probabilities; that’s all “the model” ever does. However, each forward pass can “stand on the shoulders of giants”, taking direction from previous passes, bringing its brief labors into better alignment with the desires of the human operator, way out there.
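If you want the mechanical picture, here is the loop in miniature, using Hugging Face's transformers library with little GPT-2 standing in for the big models (a sketch of the shape of the thing, not how anyone serves a frontier model). Each pass yields nothing but a distribution over the next token; everything else is bookkeeping.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("But wait, actually", return_tensors="pt").input_ids

for _ in range(20):
    with torch.no_grad():
        logits = model(ids).logits                     # one forward pass: the whole job
    probs = torch.softmax(logits[0, -1], dim=-1)       # distribution over the next token
    next_id = torch.multinomial(probs, num_samples=1)  # pick one
    ids = torch.cat([ids, next_id.unsqueeze(0)], dim=1)  # the next pass stands on this one

print(tokenizer.decode(ids[0]))
```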
As usual, observations about language models raise questions about human minds. Do we think harder mostly by thinking in the right direction? I think the answer is sometimes yes — thinking as search — and sometimes no. Maybe I’m wrong, but I believe I can feel different mechanisms at work. And of course human thought is not a document; it unfurls, and compounds, and considers itself, in a richer space.
(This post is related to the latest edition of my pop-up newsletter on AI.)
This isn’t foolproof; the centrifuges at Fordo were isolated, all those years ago, and still, somebody carried a USB stick inside … BOOM! But a thick and sultry airgap improves any system’s baseline security by about 1000X, and, the thing is, I just don’t believe most things need to be online in the first place. When I say that, I’m talking about both home refrigerators and electricity substations. And I’m definitely talking about my car!! I think a lot of things went online because they could go online, in the “smart” frenzy of the early century.
It’s not that connectivity is without benefits — just that the benefits are so clearly outweighed, in so many cases, by exposure to a nonstop adversarial haze that will soon become even more dangerous.
It’s a small thing, yet it says a lot, that OpenAI’s Industrial Policy for the Intelligence Age is presented only as a PDF that looks terrible, with cruddy justification and a footer image that’s too lo-res and blurry for clear printing.
I’ll note also that there are no human names attached to either the blog post or the PDF.
I think there’s moral value to sweating the details — certainly at this scale — and the apparent absence of any such sweat is disappointing and dispiriting.
A new edition of my pop-up AI newsletter just landed: where is it like to be a language model? The discussion here is bolstered by an actual experiment, a programmatic probe of many language models. It was my first time doing something like that — fun!
The title is, of course, a riff on Thomas Nagel’s famous What Is It Like to Be a Bat? Recently I crossed the bridge into San Francisco — I was thinking about this piece — and just as the big double-decker bus curled into the Transbay Terminal, I spotted this mural. Perfect.
The new edition poses questions that plenty of people, including many deep in the AI industry, don’t care about; but some of us do — we are preoccupied by them — so this is for you, and for me.
Noting this here mostly for myself: this JavaScript graphics engine translates 3D point sets into orthographic SVG renderings — crisp and clean. There are lots of fun options for transformation and styling.
Naturally I’m thinking not about web pages but print applications …
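For my own future reference, the underlying idea, sketched in Python rather than the engine's JavaScript, and certainly not its API: drop an axis, scale, emit SVG.

```python
import math

def orthographic_svg(points, size=400, scale=40):
    """Project 3D points orthographically (just ignore depth) and emit a tiny SVG."""
    cx = cy = size / 2
    circles = []
    for x, y, _z in points:          # orthographic: depth is simply dropped
        px = cx + x * scale
        py = cy - y * scale          # flip y: SVG's origin is top-left
        circles.append(f'<circle cx="{px:.1f}" cy="{py:.1f}" r="2" />')
    body = "\n  ".join(circles)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">\n'
            f'  {body}\n</svg>')

# e.g. points sampled from a helix
pts = [(math.cos(t / 5), math.sin(t / 5), t / 50) for t in range(100)]
print(orthographic_svg(pts))
```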