Think of AI as a Moving Target

Loosely speaking, you can treat the AI as a capable technical designer or client-side programmer who happens to have no instinct for game aesthetics or player experience. On concrete, well-defined programming tasks and on math problems, today’s frontier models already outperform most programmers. But as language models, they have no real grasp of game feel, and they can’t naturally tell which UI layout is better or which interaction feedback feels right to a player.

For short-term projects, you can prototype and iterate fast without fully understanding the implementation (what people now call Vibe Coding). For serious long-term projects, developers should review the AI’s technical decisions and understand how things were built. Otherwise, when something breaks, you’ll be reduced to guess-and-check, and once the AI gets stuck, the project’s maintainability and continuity take a direct hit.

For larger projects, we recommend that programmers drive framework-level work with the Agent, while non-technical designers ship gameplay features through Skills (or other guardrails) the programmers prepare upfront. Designers should avoid making framework-level engineering requests directly. A request like “I want the player to leave footprints in the snow” sounds modest, but it can send the Agent off to build something like a virtual texturing system that touches the underlying architecture.

The base models themselves are also moving fast. The release cadence for frontier models has gone from once a year, to every six months, to roughly every two. The right way to use Agents will keep shifting too. A year ago, limited context length and tool support made today’s typical Agent tasks essentially impossible. The advice on this page reflects where we are right now, and may need updating again within a few months.

At this stage, it’s normal for the AI to make mistakes during development. Test in Unity, feed the result back, and many issues will sort themselves out after a few rounds. If a single problem stays stuck for a long time, starting a fresh conversation and letting a clean Agent diagnose the existing implementation tends to work better.

We’ll publish two technical posts later that lay this out more systematically:
  • Harness Engineering for Game Projects
  • Building Dev Agent for Game Engines

Specific and Correct > Vague > Specific but Wrong

Give the AI as much accurate context as you can, so it understands what you’re actually after. For example:
  • Implementing a feature: explain what it’s for and the design details you have in mind.
  • Implementing a gameplay mechanic: describe the experience you want, similar games to reference, and the key design details.
  • Modifying an existing feature: explain its current state, why it needs to change, and where you’re thinking of taking it.
Designers, artists, and anyone unsure about the technical implementation should avoid prescribing an overly specific technical route. When the technical premise is wrong, an explicit implementation path will often steer the AI in the wrong direction and keep it digging deeper.
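
As an illustration of the "modifying an existing feature" case, here is a minimal sketch of a request brief that states the current state, the reason for the change, and the intended direction without prescribing a technical route. The `ChangeBrief` class, its field names, and the dash example are all hypothetical and made up for this page; nothing here is a Locus API.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeBrief:
    """Hypothetical container for the context behind a 'modify a feature' request."""
    feature: str
    current_state: str
    reason: str
    direction: str
    references: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the brief as plain text to paste into the chat."""
        lines = [
            f"Feature: {self.feature}",
            f"Current state: {self.current_state}",
            f"Why it needs to change: {self.reason}",
            f"Direction I'm considering: {self.direction}",
        ]
        if self.references:
            lines.append("Reference games: " + ", ".join(self.references))
        return "\n".join(lines)


# Made-up example content: it describes the experience, not the implementation.
brief = ChangeBrief(
    feature="Dash",
    current_state="Fixed-distance dash on a 1 s cooldown",
    reason="Playtesters say it feels sluggish in combat",
    direction="Shorter cooldown, but the dash no longer passes through enemies",
)
print(brief.render())
```

The brief deliberately says nothing about state machines, physics layers, or components, so the AI stays free to choose the implementation.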

For Complex Tasks, Plan and Review Before Execution

A lot of game mechanics that look simple on the surface are actually tricky to implement. Designers and gameplay programmers know this all too well. Take “jump” in a side-scrolling platformer. Behind it sit decisions like: do we model character actions with a state machine or a state tree? Do we include coyote time? How do we tune takeoff and landing curves to shape the speed profile? Traditionally, gameplay programmers spent a lot of time talking with designers, surfacing the unwritten requirements and turning them into code.

By default, an AI is more inclined to satisfy the surface request as quickly as possible. If you describe a feature in one short sentence, the AI can’t see the project’s design goals or its long-term direction, and the result is likely to drift. For complex requirements, ask the AI not to touch any files yet and to produce an implementation plan first. Talk through requirements, edge cases, and key design decisions, and only then move into the actual coding phase.

The AI doesn’t truly understand game feel, but it knows a lot of the typical patterns from existing games. For things with mature recipes, you can also ask it directly, “How can I improve the feel?” A developer who hasn’t shipped a platformer might never think of coyote time on their own; the AI usually will.

The flip side: in the past, designers had to sell features hard to programmers because development was expensive. Implementation and iteration are dramatically cheaper now, so designers can validate ideas quickly at low cost. They no longer need to polish a complete pitch and convince programmers to spend resources before trying something out. Prototyping and gameplay iteration can start much sooner. The Locus feature that summarizes design docs from chat was built around this idea.
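
For readers who haven’t met coyote time, mentioned above as one of the hidden decisions behind “jump”: it is a short grace window after the character walks off a ledge during which a jump press still counts. Below is a minimal, engine-agnostic sketch in Python rather than Unity C#. The names (`Player`, `step`) and the tuning values are assumptions chosen for illustration, and a real project would wire this into its own ground check and physics.

```python
from dataclasses import dataclass

COYOTE_TIME = 0.1   # assumed grace window in seconds after leaving the ground
JUMP_SPEED = 12.0   # assumed initial upward velocity applied on jump


@dataclass
class Player:
    vy: float = 0.0                   # vertical velocity
    time_since_grounded: float = 0.0  # seconds since the last grounded frame
    jumped: bool = False              # True once a jump was used since last grounded


def step(p: Player, dt: float, grounded: bool, jump_pressed: bool) -> bool:
    """Advance the jump logic by one frame; return True if a jump started."""
    if grounded:
        # Standing on ground resets both the timer and the jump flag.
        p.time_since_grounded = 0.0
        p.jumped = False
    else:
        p.time_since_grounded += dt

    # The jump is allowed on the ground or shortly after leaving it,
    # but only once per airborne stretch.
    within_window = p.time_since_grounded <= COYOTE_TIME
    if jump_pressed and within_window and not p.jumped:
        p.vy = JUMP_SPEED
        p.jumped = True
        return True
    return False


# Walk off a ledge, then press jump a few frames later.
player = Player()
dt = 1 / 60
step(player, dt, grounded=True, jump_pressed=False)
for _ in range(3):
    step(player, dt, grounded=False, jump_pressed=False)
print(step(player, dt, grounded=False, jump_pressed=True))
```

The `jumped` flag is the kind of edge case worth raising in the planning discussion: without it, a player could jump again in mid-air while still inside the grace window.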

Maintain the Knowledge Base Regularly

We gave Locus the ability to maintain its knowledge base automatically because we want it to keep building a deeper understanding of your project as it talks with you and works on the code. Within that, the Design section deserves special attention. It’s the AI’s summary of your requirements, and the AI will treat that content as ground truth in later tasks. Keeping it accurate is on you. Memory holds experiential notes the AI built from the project and its surrounding context. Review it periodically, and edit or delete entries when they’re off. We’ll ship a dedicated Skill later that helps the AI maintain the knowledge base more systematically.

Run Independent Tasks in Independent Contexts (New Sessions)

The progress bar in the bottom-right corner shows how much of the current session’s context has been consumed. Today’s language models can handle long text, but their effective context is still bounded. Once you cross a certain threshold, model quality drops noticeably. Claude 1M and GPT-5.4 both degrade more visibly past roughly 400k tokens. On top of that, longer contexts mean higher cache-read overhead and a higher per-tool-call cost. For better quality and lower cost, split tasks where you can and run them in separate contexts.
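
The rule of thumb above can be written down as a small check. This is only a sketch: the 400k soft limit comes from the rough figure quoted in this section, while the 75% “wrap up” warning level and the `advise` helper are assumptions for illustration. Locus does not expose this function; in practice the progress bar is what you read.

```python
SOFT_LIMIT_TOKENS = 400_000  # rough degradation point quoted in the text above


def advise(used_tokens: int, soft_limit: int = SOFT_LIMIT_TOKENS) -> str:
    """Suggest what to do with the current session given its context usage."""
    if used_tokens >= soft_limit:
        # Past the soft limit: quality drops and each tool call costs more.
        return "new_session"
    if used_tokens * 4 >= soft_limit * 3:
        # Assumed warning level at 75% of the soft limit.
        return "wrap_up"
    return "continue"


for used in (100_000, 300_000, 450_000):
    print(used, advise(used))
```

When the answer is `wrap_up`, a reasonable habit is to finish the current task and open a new session for the next independent one.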