Tuesday, March 17, 2026

The Human Talent That Eludes AI

In a certain, strange way, generative AI peaked with OpenAI’s GPT-2 seven years ago. Little known to anyone outside of tech circles, GPT-2 excelled at producing unexpected answers. It was creative. “You’d be like, ‘Continue this story: The man decided to take a shower,’ and GPT-2 would be like, ‘And in the shower, he was eating his lemon and thinking about his wife,’” Katy Gero, a poet and computer scientist who has been experimenting with language models since 2017, told me. “The models won’t do that anymore.”

AI leaders boast about their models’ superhuman technical abilities. The technology can predict protein structures, create realistic videos, and build apps from a single prompt. But these executives and researchers also readily admit that they haven’t yet released a model that writes well. OpenAI CEO Sam Altman has predicted that large language models will soon be capable of “solving the climate, establishing a space colony, and the discovery of all of physics,” but in an October interview with the economist Tyler Cowen, he guessed that even future models (an eventual GPT-6 or GPT-7) might be able to extrude only something equivalent to “a real poet’s okay poem.”

Today’s AI-generated prose is riddled with flaws. Chatbots produce meaningless metaphors, endless “it’s not this, but that” constructions, and a cloyingly sycophantic tone. And, of course, they overuse my beloved em dash. (Only beginning with GPT-5.1, released in November, could ChatGPT reliably follow instructions to avoid the beleaguered punctuation mark.) I wanted to understand why this is: why large language models, which, after all, have memorized centuries of great literature, can display incredible emergent abilities yet completely fail to produce a single essay that I’d want to read.

So I talked with people who would know: people who work at LLM companies, AI-data vendors, academic computer-science departments, and AI-writing start-ups. (Some spoke with me under the condition of anonymity because their employers barred them from speaking publicly about their work.) What I found is that modern LLMs are built in a way that is antagonistic to great writing; they’re engineered to be rule-following teacher’s pets that always have the right answer in hand. In many respects, they’ve come a long way from GPT-2, but they’ve also lost something that made them looser and more compelling.

LLMs begin their lives as indiscriminate readers. During the pretraining phase, they ingest something like the entire internet (Reddit posts, YouTube transcripts, SEO sludge) and compress it into patterns. Most of that writing is not good. But the quantity, not the quality, of the data is what matters. Pretraining teaches AIs grammar rules and word associations, enabling what is known as “next-token prediction”: the process by which models determine which part of a word follows another, over and over again.
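To make next-token prediction concrete, here is a deliberately crude sketch: a bigram counter that “predicts” the next word purely by how often it followed the previous one in a toy corpus. This is an illustration of the statistical idea only, not how any production model works; real LLMs learn these patterns with neural networks trained on trillions of tokens.

```python
from collections import Counter, defaultdict

# A tiny stand-in for "the entire internet": a few words of text.
corpus = "the man decided to take a shower and the man decided to sing".split()

# Count which word follows which. This bigram table is a crude stand-in
# for the patterns a model compresses out of its training data.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the continuation seen most often after `word` in the corpus."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))      # man
print(predict_next("decided"))  # to
```

Note what the sketch gets right and wrong: prediction by frequency produces fluent-seeming continuations, but only ever the most statistically typical one, which hints at why scale alone doesn’t yield an interesting voice.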

Rough edges are then sanded down in the post-training phase. This is when LLM companies define the ideal “personality” for an AI model (such as being “helpful, honest, and harmless”), give the AIs example dialogues to learn from, and apply safety filters that attempt to block illegal requests. Through processes such as “reinforcement learning from human feedback,” which enlists people to grade AI outputs against a rubric, models are guided toward responses that exemplify desired traits.

AI research is an empirical science: People can verify when something works and make tweaks when something doesn’t. But art resists rules and quantification. No objective measurement exists to prove whether Pablo Neruda’s work is better than Gabriela Mistral’s. Novice writers learn conventions; great writers invent them. An LLM trained to imitate style can go only so far.

On some level, AI engineers and researchers must know this. Even as they try (and fail) to automate this work, many of the people I spoke with clearly revere good writing. “Writing novels is one of the most intense cognitive activities a human can do,” James Yu, a co-founder of Sudowrite, an AI assistant for fiction authors, told me. My sources’ faces lit up when I asked about their favorite books; three cited the science-fiction author Ted Chiang, though they also seemed disheartened that he has become a vocal critic of generative AI. The difficulty of evaluating writing doesn’t stop AI labs from trying. They’re motivated in part by a question that came up again and again in my interviews: If LLMs can’t write mind-bending essays or poignant sonnets, are they generally intelligent at all?

And so labs try to assess AI writing by various criteria. Post-training teams vibe-check model outputs themselves based on personal taste, and companies contract with domain experts to get feedback on model-produced writing. A job listing for a “creative writing specialist” at xAI lists “novel sales >50,000 units” and “starred reviews in Kirkus” among its requirements (rates start at $40 an hour).

I interviewed two people who have recently worked with large AI labs as writing evaluators. The first, a contractor at Scale AI, described firsthand the absurdities of the task: To transform something as slippery as “tone” into discrete criteria, rubrics included rules such as “The response should use a maximum of two exclamation marks.” The contractor told me that “there were numerous cases where even though it felt like B was a better response overall, you ended up rating ‘I prefer A’ because it had three exclamation points.” He said that another time, he was asked to grade fan fiction on its “factuality.”

The second person I spoke with is an author who worked directly with a frontier lab’s technical-research team. The company regularly asked him to break down the specific elements that make a piece of literature great. “It’s utterly non-tractable to that kind of thinking,” he told me. He pointed to the example of English sonnets: They’re technically one of the most templated forms, but just because a sonnet contains 14 lines and is written in iambic pentameter doesn’t make it good. “Even when Shakespeare is being very structured, he’s constantly trying not to follow the rubric, or to subvert it, or reinvent it. I don’t know what it is that makes the difference between the poet who writes by rote and Shakespeare. I just know that the two can never be confused.”

So are the LLMs doomed to produce sophomoric prose forever? One theory is that this is merely a matter of prioritization. In some ways, creativity is directly at odds with AI companies’ other goals. Typically, chatbots are trained to avoid misinformation, political bias, child-sexual-abuse material, copyright violations, and more. They’re also scored on benchmarks such as SWE-bench (for coding tasks) and GPQA (the natural sciences), which dramatically shape public perception of which company is winning the race. And if most users are using ChatGPT to draft corporate emails, bold text and brief bullet points may be exactly what they want. “The more you control for these” traits, Nathan Lambert, a post-training lead at the Allen Institute for AI, told me, “the more you suppress creativity.”

When you tell a model to be a great prose stylist, but also a Ph.D.-level mathematician, and also strictly PG-13, it might become rigid and tight-lipped, like a nervous candidate at a job interview terrified to misstep. The same whimsicality that made GPT-2’s voice fresh also made it prone to other unpredictable behavior. “If you’re a big corporation like Google or OpenAI, you want a chatbot that’s going to make money. The chatbot that’s not going to make you money is the one that’s a weirdo,” Gero said.

I began to hypothesize that AIs might be able to generate award-winning literary prose if only we unhobbled them from the strictures of the post-training process and built specialized writing models instead. But as I reflected on the authors I love most, that didn’t seem right either.

When a practiced human writer reaches for a particular turn of phrase, they aren’t aiming for some single standard of great writing. Rather, the best metaphors come from the author’s specific blend of experiences or expertise. A writer’s diction, their citations, and the stories they share all reflect a singular, irreplicable perspective. Authorial voice emerges from the specificity of a life.

The models, although technically proficient and grammatically pristine, can’t live, can’t feel, can’t smell, can’t taste, can’t sense. They can’t spill raw emotions onto the page, or place abstract concepts in rich physical settings. Close readers of AI writing will notice that the metaphors are uncanny: LLMs assign weekdays tastes and give mirrors seams. They often seem fearful of biology: They don’t like to speak, even metaphorically, about blood and sex and death. Their output lacks stakes, as a creative-writing instructor might say.

Although Yu is impressed by the technical leaps that LLMs have made since GPT-2, even he won’t read fully AI-generated stories. I asked him what’s still missing for AI to produce a great novel on its own. Yu paused for a moment, then answered: “Most people’s good first stories are autobiographical. Maybe you need a model that lives a life, and can almost die.”

LLMs may never be capable of great writing themselves. But this doesn’t mean that they can’t help humans. Recently, I turned AI into an editor. Not for this article (The Atlantic’s editors are all human) but for a couple of essays that I wrote on my personal Substack. My philosophy is that I should provide the prose and perspective, and AI should offer feedback, encouraging me to write more like myself.

First, I fed the chatbot Claude an archive of my past writing, along with notes about what worked and didn’t about each piece. I used this to create a custom editing rubric based on my voice. Some criteria are generic, and others are personalized: One reads, “Does this play to your insider-anthropologist position” in Silicon Valley? Another asks whether the thesis shows up in the first 500 words. I dumped this guidance into a Claude project along with a reminder of its role: “You are not a co-writer. You cannot understand. Your role is to help Jasmine write like the best version of herself.” I don’t want to be de-skilled, I reminded the machine. Your only job is to make me smarter.

This AI editor has become a valuable part of my process. Like any reader, it’s not always right. I’m careful not to let it lure me into one narrow stylistic lane. But Claude pushes me to iterate and improve faster than I could alone, pointing out where my execution failed to meet the standards of my own taste. “Stop trying to write the ending as a thesis and write it as a scene,” it told me while editing a recent post. There’s something slightly humiliating about having your efforts rejected by a bot, but I had to admit that its critique was fair. I redrafted the conclusion four times. And then, finally, Claude approved.
