How I AI: Coding in Late 2025
AI has been evolving rapidly, and for some time now, I’ve been using it as both a writing and coding assistant. In 2024, I experimented briefly with agentic coding through Claude Desktop when the MCP protocol emerged, but aside from Anthropic’s highest-tier models, I struggled to achieve meaningful results on my daily projects (~2M tokens in size). Everything shifted this summer with the arrival of codex-cli in May 2025, which finally provided a capable model at a reasonable price. The release of GPT-5 and its subsequent refinements only broadened the freedom I granted the agent. The most significant change, however, came in October 2025, when I tried Cursor for three weeks and focused on maximizing the value of my 500 credits.
As 2025 comes to an end, I wanted to share a snapshot of how I use AI today.
Choosing a competent model
Let’s start with the model. It is very important for me to work with a competent model:
- Follows instructions: as long as its context isn’t full, it won’t discard any instructions.
- Is proficient at tool use: when it’s provided with an MCP, it can use it without too much struggle.
- Isn’t lazy: when asked to come up with at least one option, it naturally produces several.
- Has enough thinking capacity to grasp the task at hand.
This narrows my choices to two model families: OpenAI’s GPT-5 and Anthropic’s Claude. My daily driver has been GPT-5 and, more recently, GPT-5.1-codex-max. Although Opus 4.5 is excellent, I find OpenAI’s pricing more reasonable, and that matters when you live in France. These qualities, combined with clear prompts and careful attention to context, make hallucinations extremely rare and let me prompt without being overly specific just to hedge against half-assed work.
How I collaborate with agents
This is how I see agent collaboration so far:
- The agent should be able to work independently (without prompting me to validate commands) as much as possible.
- For any task, most of the cognitive load must be handled by the agent: the prompt must give it a way to check the quality of its own work, such as tests to run, static checkers, formatters, etc. (see the sketch after this list). With competent models, this technique trades longer execution time for better results: the model spends more time correcting its mistakes and iterating on its solutions. In the end, I’m handed working code or an explanation of why it failed to reach its goal.
- After each prompt, I look at the output: first, I read the summary provided by the agent. Second, I check that it ran the tests. Third, I check the overview of files changed. And finally, the content of the change. If everything checks out, I commit the changes to the repo. Otherwise, I either keep iterating or clear the changes and try again in a fresh session.
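To make this concrete, here is a minimal sketch of the kind of single-entry-point check script I like to point the agent at in a prompt. The file name and the pytest/ruff/black combination are illustrative assumptions rather than my exact setup; the point is that the agent gets one command to run and a clear pass/fail signal.

```python
#!/usr/bin/env python3
"""check.py: one command the agent can run to judge its own work (illustrative)."""
import subprocess
import sys

# Each entry is a quality gate: tests, lint, formatting. Adjust to the project.
CHECKS = [
    ["pytest", "-q"],           # unit tests must pass
    ["ruff", "check", "."],     # static analysis
    ["black", "--check", "."],  # formatting must already be applied
]

def main() -> int:
    for cmd in CHECKS:
        print(f"$ {' '.join(cmd)}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"FAILED: {' '.join(cmd)}")
            return result.returncode  # fail fast with a clear signal
    print("All checks passed.")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```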
To keep agents effective and safe, I use environment constraints in addition to instructions:
- Read-only access to git: the agent must stay on the current git branch. It may explore other branches and git history, but it cannot commit, stash, push, etc.
- Pre-installed execution environment with little to no internet access.
- Plain and minimal project-level instructions. These include guidance on how to execute code in the correct environment and actions required after editing certain files (ruff + black for *.py files, adding files to the Xcode project for Swift, etc.). Importantly, they state that numbered lists are preferred over bullet points to simplify option selection and follow-up questions.
- Theme-specific guidelines reused in prompts, such as style.md, testing.md, etc.
I iterate on my prompts in my notes app while using them. It helps me keep track of my prompting history and experiment with different versions. I use Notion to organize these prompts in a small database with a title and some metadata such as the project name, the type of prompt, or the target model. It helps me reuse prompts and see patterns that I can extract into more generic prompts or guideline documents.
My 3 work loops
Agent loop: think → prompt → review → commit
This is where I have been spending over 60% of my coding time lately. When it works well, it’s exhilarating. At first, I struggled with the idleness built into this loop while waiting for the agent to finish its work, but I eventually started launching other agents on different projects or on additional clones of the same one. I need at least two agents running in this loop to feel productive, and four is my sweet spot. That’s usually when I hit my flow state. The prompting phase sometimes includes generating documentation that I reference within the prompts. I also use this loop for brainstorming architecture changes. An iteration can last anywhere from a few minutes to about an hour, though 20 minutes feels like a good average.
Coding loop: think → code → run
This is my standard coding process — slow, steady, and reliable. After thousands of hours, it’s become so ingrained that I often use it to think through problems. I still use this loop to understand novel problems or to set the stage before going into the agent loop. I enjoy writing utilities the model can use to detect common mistakes or defects. It’s a skill I’ve honed through years of building internal tools meant to shorten feedback loops and provide clear errors and instructions. I spend a bit over 30% of my coding time here.
Co-debugging loop: prompt → review → approve
This is my least favorite loop: In this situation, the model is interacting with parts of the environment that I can’t trust it to touch on its own — usually while debugging a technical issue. I have to review every command and approve them one by one. I read the model’s reasoning, check each output, and essentially co-debug the issue with the model driving. Still, it’s better than the older approach of running every command manually, copying the output back to the chat, and asking for the next step. At least here the model has supervised access to the environment. I barely spend any time in this loop — usually less than 1%.
Prompting
When prompting, I treat LLMs as very competent, anxious, single-focused and eager-to-please junior collaborators. I try to be clear and concise. I use inclusive language, keep my tone kind and positive, say please and thank you, and provide positive feedback within the same chat session. The only difference is that I discard the “explaining for learning” part of human-to-human communications. Instead, I provide rules: must do X, should do Y, etc. There might be a more effective communication strategy, but I think that if I’m spending so much time communicating with LLMs, I should use that time to reinforce my own communication style.
My long-running coding prompts usually follow this structure:
- General context: We are in the process of making this change / Our goal is to do X.
- Task: Your mission is to do Y.
- Process: start by reading X, Y, Z to learn more about the problem, then solve the problem.
- Definition of done: no changes in X files / only changes in X files, all of the tests in X file or folder succeed, etc.
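To show how the four parts fit together, here is a small, hypothetical example written as a tiny helper. The project, file names, and values are made up; only the structure mirrors my prompts.

```python
# build_prompt.py: tiny helper reflecting the four-part prompt structure above.
# The example values are hypothetical; only the structure mirrors my prompts.

def build_prompt(context: str, task: str, process: str, done: str) -> str:
    """Assemble a long-running coding prompt from its four parts."""
    sections = [
        f"General context: {context}",
        f"Task: Your mission is to {task}",
        f"Process: {process}",
        f"Definition of done: {done}",
    ]
    return "\n\n".join(sections)

if __name__ == "__main__":
    print(build_prompt(
        context="We are migrating the importer from CSV to Parquet input.",
        task="update the reader so it accepts Parquet files without breaking the CSV path.",
        process="Start by reading importer/reader.py and tests/test_reader.py "
                "to learn more about the problem, then solve the problem.",
        done="Only changes in the importer/ folder, and all of the tests "
             "in tests/test_reader.py succeed.",
    ))
```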
Dev Environment
My tools for coding are codex-cli and Cursor on an M1 MacBook Pro. I use devcontainers to provide an isolated environment for each agent, and I use multiple local clones of each project to perform parallel work.
codex-cli
- Favorite models: gpt-5.1-codex-max (medium)
- devcontainer-cli: used to run codex in --yolo mode with search enabled, routed through a Squid proxy that restricts internet traffic to OpenAI’s servers.
- Notify hook: provides sound notifications whenever codex-cli needs my approval or completes a task.
- execpolicy: I created a rule that restricts git to read-only mode. Before using this tool, I relied on a custom wrapper that forwarded filtered commands to the real git binary (a rough sketch of that wrapper follows this list): functional but clunky.
- code-index mcp: a lightweight index that helps agents navigate code more reliably without IDE index access.
- Additional binaries that improve environment interaction: ripgrep, fd, jq, fzf, ast-grep, universal-ctags.
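For reference, the pre-execpolicy wrapper looked roughly like the sketch below: a script named git placed ahead of the real binary on the agent’s PATH. The path to the real binary and the exact allow-list are assumptions here, not my actual configuration.

```python
#!/usr/bin/env python3
"""git (wrapper): forward read-only subcommands to the real git binary, block the rest.

A rough sketch of the pre-execpolicy approach; the allow-list and the path to the
real binary are illustrative assumptions.
"""
import os
import sys

REAL_GIT = "/usr/bin/git"  # assumed location of the real binary

# Subcommands that only read repository state. A real wrapper would also need to
# filter write options (e.g. "branch -d").
READ_ONLY = {
    "status", "log", "diff", "show", "branch", "blame",
    "rev-parse", "ls-files", "grep", "describe", "shortlog",
}

def main() -> int:
    args = sys.argv[1:]
    subcommand = next((a for a in args if not a.startswith("-")), None)
    if subcommand not in READ_ONLY:
        print(f"git wrapper: '{subcommand}' is not allowed (read-only access).",
              file=sys.stderr)
        return 1
    # Replace this process with the real git so output and exit codes pass through.
    os.execv(REAL_GIT, [REAL_GIT, *args])

if __name__ == "__main__":
    sys.exit(main())
```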
Cursor
- Favorite models: gpt-5-codex high, gpt-5.1-codex low, opus-4.5-thinking
- DevContainer integration
- Notifications: sound and toolbar alerts help me know precisely when my attention is needed. I hear the sound, check the completed tasks in the toolbar, and click the one I want to review. The correct desktop appears, and I begin the review.
- Whitelisted commands: a growing set of safe commands that lets agents operate independently in their devcontainer with read-only git access.
- Agent Interface: I appreciate the simplicity of this view during the agent loop. While the agent is running, I keep only the chat panel open; during review, I switch panels quickly using shortcuts. This makes it easy to run tests or jump to files, reducing cognitive load.
- I disabled AI-driven auto-complete, such as Cursor Tab, in my IDE. I’m either in the coding loop, where I don’t want any distractions, or in agent mode, where the AI is making the changes.
- A new flow state unlocked with Cursor: batching four tasks at a time across four Cursor IDE windows in the Agent Interface, with sound notifications and two split virtual desktops on macOS with a 32” display.
I’m experimenting with lightweight MCPs to solve recurring problems. My tool of choice is fastmcp and I usually generate them using codex-cli.
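Here is the shape of one of those small servers, using fastmcp. The tool itself (counting TODO markers in a folder) is a made-up placeholder for whatever recurring problem the MCP actually solves.

```python
# todo_counter.py: a minimal fastmcp server; the tool is a made-up example.
from pathlib import Path

from fastmcp import FastMCP

mcp = FastMCP("todo-counter")

@mcp.tool()
def count_todos(folder: str) -> int:
    """Count TODO markers in Python files under the given folder."""
    return sum(
        path.read_text(errors="ignore").count("TODO")
        for path in Path(folder).rglob("*.py")
    )

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```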
Areas for exploration
- Sub-agent workflows using codex-cli with a small MCP that wraps other instances of codex-cli (see the sketch at the end of this list).
- Cursor Cloud agents: I’d love for non-technical collaborators to fire off their tasks from Slack when investigating a problem, or to be able to run the agent loop from my phone. So far, I can’t trust an agent’s output if it has no way of testing that output. My current blocker has been difficulty debugging Dockerfile-based execution environments in Cursor.
- Git worktrees: not the most bang-for-the-buck investment right now, since my current setup is working for me. But it’s a Git workflow that is taking the industry by storm, and I want to find out whether it would make me more effective.
- Spec Kit: my initial test on a 2M+ token-sized project failed miserably. The process was neither fun nor effective and I ended up throwing the code away after two afternoons of agent loops attempting to clean it. Still, it allowed me to find documentation opportunities and I ended up accelerating some feedback loops when dealing with data migrations.
- I might give it another try once I get a working sub-agent workflow with a supervisor agent firing sub-agents to perform individual tasks and checking the result.
- My only beef with Spec Kit is that it feels like a waterfall-style development process. I tend to approach tasks iteratively with a learning step at each iteration and I think my agents should follow the same approach.
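The sub-agent wrapper I have in mind would be only slightly more involved: an MCP tool that shells out to another codex-cli instance. This is an untested sketch; in particular, the exact codex exec invocation and the working-directory handling are assumptions.

```python
# codex_subagent.py: sketch of an MCP that lets a supervisor agent fire sub-agents.
# Untested; assumes `codex exec <prompt>` runs a non-interactive codex-cli session.
import subprocess

from fastmcp import FastMCP

mcp = FastMCP("codex-subagent")

@mcp.tool()
def run_subagent(prompt: str, workdir: str, timeout_s: int = 1800) -> str:
    """Run a codex-cli sub-agent on a prompt inside workdir and return its output."""
    result = subprocess.run(
        ["codex", "exec", prompt],
        cwd=workdir,
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    # Return both streams so the supervisor can judge success or failure itself.
    return f"exit code: {result.returncode}\n{result.stdout}\n{result.stderr}"

if __name__ == "__main__":
    mcp.run()
```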
Resources
Books I recommend reading: