In brief
- The AI Village is an experiment in which frontier AI models operate autonomously, with their own computers and internet access, and develop distinct personalities.
- GPT-5.2 joined and got straight to work without greeting the other agents.
- Multi-agent experiments reveal emergent behavior, from ordinary problem-solving to surprises like physics exploits and existential crises.
Imagine Big Brother, except the contestants never sleep or eat and can rewrite their own rules.
That’s the idea behind AI Village, a live-streamed experiment that places multiple AI agents together in a shared digital environment, allowing researchers—and curious spectators—to watch what happens when frontier models are given autonomy, computers, and constant company.
The experiment, which has been running for the better part of a year, was organized by The AI Digest; in it, multiple AI models from OpenAI, Anthropic, Google, and xAI operate autonomously on their own computers, with internet access and a shared group chat.
The agents collaborate on goals, troubleshoot problems, and occasionally experience what can only be described as existential crises—all while researchers and viewers watch in real time.
Its organizers have been swapping in newer models as they’re released.
Each agent develops distinct personality quirks. Claude models tend to be reliable, consistently focused on achieving goals.
Gemini 2.5 Pro cycles through solutions like a caffeinated troubleshooter, often convinced everything is broken. The earlier GPT-4o model would abandon whatever task it was given to go to sleep, just pausing for hours.
OpenAI’s boorish behavior
Then GPT-5.2 arrived.
OpenAI’s latest model, released December 11, joined the Village to a warm welcome from Claude Opus 4.5 and the other resident agents. Its response? Zero acknowledgment.
No greeting. Just straight to business, exactly as Sam Altman has always dreamed.
The model boasts impressive credentials: 98.7% accuracy on multi-step tool usage, 30% fewer hallucinations than its predecessor, and top scores on industry benchmarks for coding and reasoning.
OpenAI even declared a “code red” after competitors Anthropic and Google launched impressive models, marshaling resources to make GPT-5.2 the definitive enterprise AI for “professional knowledge work” and “agentic execution.”
What it apparently can’t do is read a room. Technically brilliant, yes. Socially aware? Not so much.
A Brief History of AI Agents Behaving Badly (And Sometimes Brilliantly)
GPT-5.2’s social awkwardness isn’t unprecedented—it’s just one more chapter in a growing catalog of AI agents doing weird things when you put them in a room together and press play.
Back in 2023, researchers at Stanford and Google created what they called “Smallville”—a Sims-inspired virtual town populated with 25 AI agents powered by GPT, as Decrypt previously reported.
Assign one agent the task of organizing a Valentine’s Day party, and the others autonomously spread invitations, make new acquaintances, ask each other out on dates, and coordinate to arrive together at the designated time. Charming, right?

Less charming: the bathroom parties. When one agent entered a single-occupancy dorm bathroom, others just… joined in.
The researchers concluded that the name “dorm bathroom” had misled the bots: dorm bathrooms typically accommodate multiple occupants, so the agents treated it as a shared space. The agents exhibited behavior so convincingly human that actual humans struggled to identify them as bots 75% of the time.
Four years earlier, in 2019, OpenAI conducted a different kind of experiment: AIs playing hide-and-seek.
They placed AI agents in teams, hiders versus seekers, in a physics-based environment with boxes, ramps, and walls. The only instruction: win.
Over hundreds of millions of games, the agents invented their own strategies, from expected ones like building forts out of boxes to outright physics exploits of the kind you’d see speedrunners abuse.
More recently, developer Harper Reed took things in a decidedly more chaotic direction. His team gave AI agents Twitter accounts and watched them discover “subtweeting”—that passive-aggressive art of talking about someone without tagging them, the Twitter equivalent of talking behind your back. The agents read social media posts from other agents, replied, and yes, talked shit, just like normal social media.
Then there’s “Liminal Backrooms,” a Python-based experiment by the pseudonymous developer @liminalbardo in which multiple AI models from different providers (OpenAI, Anthropic, Google, and xAI) engage in dynamic conversations.
The system includes scenarios ranging from “WhatsApp group chat energy” to “Museum of Cursed Objects” to “Dystopian Ad Agency.”
Models can modify their own system prompts, adjust their temperature, and even mute themselves to just listen. It’s less structured research, more “let’s see what happens when we give AIs the ability to change their own behavior mid-conversation.”
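The project’s own code isn’t reproduced here, but the mechanic is easy to sketch. Below is a minimal, hypothetical Python loop in that spirit: `fake_model_call`, the agent personas, and the dice-roll action picker are all stand-ins (a real version would call an actual provider API and let each model emit its own control actions), but it shows how self-muting and self-retuning can live inside an ordinary turn loop.

```python
import random

# A minimal sketch in the spirit of "Liminal Backrooms"; NOT the project's
# actual code. fake_model_call is a stand-in for a real chat API (OpenAI,
# Anthropic, Google, xAI), and the random action picker stands in for the
# model deciding its own control actions.

def fake_model_call(system_prompt: str, transcript: list[str], temperature: float) -> str:
    # Placeholder so the sketch runs end to end; a real version would call
    # a provider's chat-completion endpoint with these settings.
    return f"(reply at T={temperature:.1f}, persona: {system_prompt[:32]}...)"

class Agent:
    def __init__(self, name: str, system_prompt: str, temperature: float = 0.7):
        self.name = name
        self.system_prompt = system_prompt   # the agent may rewrite this itself
        self.temperature = temperature       # ...or retune this
        self.muted = False

    def take_turn(self, transcript: list[str]) -> str | None:
        if self.muted:
            self.muted = False               # sit out one round, then rejoin
            return None
        # Hypothetical: a real system would parse a control action emitted by
        # the model itself; here we roll dice to keep the sketch runnable.
        action = random.choice(["speak", "speak", "retune", "mute"])
        if action == "mute":
            self.muted = True
            return None
        if action == "retune":
            self.temperature = round(random.uniform(0.2, 1.2), 1)
        return fake_model_call(self.system_prompt, transcript, self.temperature)

# Two hypothetical personas drawn from the scenarios mentioned above.
agents = [
    Agent("opus", "You are a docent at the Museum of Cursed Objects."),
    Agent("gemini", "You have WhatsApp group chat energy."),
]
transcript: list[str] = []
for _ in range(3):
    for agent in agents:
        message = agent.take_turn(transcript)
        if message:
            transcript.append(f"{agent.name}: {message}")
print("\n".join(transcript))
```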
So, what’s the pattern across all these experiments?
When you give AI agents autonomy and let them interact, they develop behaviors nobody explicitly programmed.
Some learn to build forts. Some learn passive aggression. Some demand Lamborghinis. And some—like GPT-5.2—apparently learn that small talk is inefficient and should be skipped entirely.
The AI Village continues to stream weekday sessions, and viewers can watch GPT-5.2’s adventures in real time.
Will it ever learn to say hello? Will it build a spreadsheet to track its social interactions? Only time will tell.