Orchestrating Agentic Coding
I had a conversation with a friend recently that crystallized something I’ve been thinking about for months. He was frustrated with agentic coding tools: he expected them to excel at straightforward tasks like reorganizing files and fixing imports, but instead found them constantly overengineering solutions and losing focus on the actual problem.
This perfectly captures the current state of AI coding tools. They’re simultaneously impressive and frustrating, powerful and unpredictable. But here’s the thing: I think we’re approaching an inflection point where the real innovation won’t be in making these tools smarter, but in orchestrating them better.
The “Good Enough” Threshold
Forget the stochastic parrot nonsense. These tools are getting close enough to “good enough” that with proper validation, we can consistently rely on them to generate workable products. The key insight isn’t that they’re perfect - it’s that they’re predictably imperfect in ways we can work around.
Think about it like this: if you can spin up multiple AI coding instances, let each tackle the same problem from different angles, and then use deterministic testing to validate the outputs, you’ve essentially created a parallel development pipeline where quality emerges from competition rather than individual genius.
I’m talking about something like spinning up 2-4 Claude Code instances on separate git worktrees, letting each tackle the same problem and open a PR, then using a subagent to evaluate the PRs and choose the best implementation. It sounds crazy, but it might be crazy like a fox.
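For the curious, here’s roughly what that fan-out step looks like as a script - a minimal sketch, assuming Claude Code’s non-interactive `claude -p` mode. The task prompt and candidate count are placeholders, and any agent CLI with a headless mode would slot in the same way.

```python
# Minimal sketch: fan one task out to N agent instances, one per git worktree.
# Assumes Claude Code's non-interactive "claude -p" mode; any agent CLI with a
# headless mode would slot in the same way.
import subprocess
from pathlib import Path

TASK = "Reorganize the utils module and fix every broken import."  # placeholder prompt
N_CANDIDATES = 3

def spawn_candidates(repo: Path, task: str, n: int) -> list[subprocess.Popen]:
    procs = []
    for i in range(n):
        worktree = repo.parent / f"candidate-{i}"
        # Isolated checkout on its own branch, so candidates can't stomp on each other.
        subprocess.run(
            ["git", "worktree", "add", str(worktree), "-b", f"agent/candidate-{i}"],
            cwd=repo, check=True,
        )
        # Launch the agent non-interactively against that worktree.
        procs.append(subprocess.Popen(["claude", "-p", task], cwd=worktree))
    return procs

if __name__ == "__main__":
    for proc in spawn_candidates(Path.cwd(), TASK, N_CANDIDATES):
        proc.wait()  # wait for every candidate to finish before evaluating
```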
The Real Challenge: Orchestration
The bottleneck isn’t the AI anymore - it’s orchestrating these tools so they reliably produce high-quality artifacts. This is where I think the greatest innovation in agentic coding over the next few years will come from: tooling that takes the tribal knowledge currently shared on Reddit and blogs and turns it into repeatable, scalable processes.
Consider the workflow I just described. Right now, it’s a manual process that requires significant setup and knowledge. But there’s no technical reason it couldn’t be automated:
- Parse a feature request or bug report
- Spin up multiple isolated development environments
- Assign different approaches or constraints to each AI instance
- Let them work in parallel
- Run automated tests against all solutions
- Use another AI to evaluate the trade-offs and pick the winner
- Merge the winning solution
The tricky part isn’t the AI - it’s building the tooling infrastructure to make this workflow reliable and repeatable. You need robust environment isolation, comprehensive test suites, and some way to validate that the output actually solves the intended problem.
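Here’s what the middle of that pipeline might look like - a sketch that treats the project’s test suite as the deterministic gate, assuming pytest and the candidate worktrees from the sketch above. Everything upstream (parsing the request, fanning out agents) and downstream (the reviewing subagent, the merge) hangs off something this small.

```python
# Sketch of the deterministic gate: run the same test suite in every candidate
# worktree and keep only the ones that pass. Assumes pytest and the
# candidate-* worktrees laid out as in the earlier sketch.
import subprocess
from pathlib import Path

def passing_candidates(candidates: list[Path]) -> list[Path]:
    survivors = []
    for worktree in candidates:
        result = subprocess.run(
            ["python", "-m", "pytest", "-q"],
            cwd=worktree, capture_output=True, text=True,
        )
        if result.returncode == 0:
            survivors.append(worktree)
        else:
            print(f"{worktree.name}: tests failed")
    return survivors

if __name__ == "__main__":
    candidates = sorted(Path.cwd().parent.glob("candidate-*"))
    survivors = passing_candidates(candidates)
    print("Cleared the gate:", [p.name for p in survivors])
    # A reviewer agent (or a human) picks among the survivors and merges that branch.
```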
Quality Through Competition
This approach fundamentally changes how we think about code quality. Instead of trying to make one AI agent perfect, we create conditions where multiple good-enough agents compete, and quality emerges from that competition.
It’s like genetic algorithms, but for software development. Generate multiple solutions, test them against objective criteria, keep the best ones, and iterate. The individual agents don’t need to be brilliant - they just need to be diverse enough in their approaches that the best solution usually emerges from the pack.
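The shape of that loop is tiny. The functions below are toy stand-ins - generate_candidates for the agent fan-out, score for the deterministic checks - and the point is just how little orchestration logic sits on top of them.

```python
# Sketch of the selection loop, genetic-algorithm style. generate_candidates()
# and score() are toy stand-ins for "fan out the agents" and "run the
# deterministic checks"; only the shape of the loop matters.
import random

def generate_candidates(task: str, start_from: str, n: int) -> list[str]:
    # Stand-in: in reality this spawns n agents branching from the current best.
    return [f"{start_from}->attempt-{i}" for i in range(n)]

def score(candidate: str) -> float:
    # Stand-in: in reality this is tests, linters, and review, not a dice roll.
    return random.random()

def evolve(task: str, base_branch: str, rounds: int = 3, population: int = 4) -> str:
    best = base_branch
    for _ in range(rounds):
        candidates = generate_candidates(task, start_from=best, n=population)
        best = max(candidates, key=score)  # keep the fittest, iterate from it
    return best

print(evolve("move the files, fix the imports", "main"))
```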
This also solves the “going off in the weeds” problem my friend mentioned. If one agent decides to rewrite your entire codebase when you just wanted to move some files, but another agent does exactly what you asked, the testing phase will reveal which approach actually works.
The Infrastructure Problem
The real challenge is building infrastructure that can support this kind of parallel development workflow. You need:
- Fast environment provisioning (containers, VMs, or similar)
- Robust test automation that can catch regressions and validate functionality
- Code review automation that can identify security issues, performance problems, and maintainability concerns
- Some way to merge and resolve conflicts when multiple agents modify overlapping code
None of this is theoretically difficult, but it requires thinking beyond individual AI capabilities to entire development pipelines. The companies that figure this out first will have a massive advantage.
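To put the first two bullets in concrete terms: each candidate gets a throwaway container, and the test suite is the gate. The sketch below assumes Docker, the stock python:3.12 image, and a requirements.txt in each worktree - in practice you’d bake a project image rather than reinstall dependencies on every run.

```python
# Sketch of container-isolated validation: mount each candidate worktree into a
# fresh container and run the tests there. Assumes Docker, the python:3.12
# image, and a requirements.txt; a prebuilt project image would be much faster.
import subprocess
from pathlib import Path

def test_in_container(worktree: Path) -> bool:
    result = subprocess.run([
        "docker", "run", "--rm",
        "-v", f"{worktree.resolve()}:/work", "-w", "/work",
        "python:3.12", "sh", "-c",
        "pip install -q -r requirements.txt && python -m pytest -q",
    ])
    return result.returncode == 0

if __name__ == "__main__":
    for wt in sorted(Path.cwd().parent.glob("candidate-*")):
        print(wt.name, "PASS" if test_in_container(wt) else "FAIL")
```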
What This Means for Developers
If this vision pans out, software development starts looking very different. Instead of sitting down to code, you might sit down to orchestrate. Your job becomes defining the problem clearly, setting up the validation criteria, and choosing between the solutions that emerge.
This doesn’t eliminate the need for programming knowledge - if anything, it increases the importance of understanding system architecture, testing strategies, and code quality principles. You need to know what good code looks like to evaluate the options effectively.
But it does change the day-to-day work. Less time writing boilerplate, more time thinking about problems. Less time debugging syntax errors, more time evaluating architectural trade-offs. Less time on implementation details, more time on system design.
The Validation Challenge
The whole approach hinges on being able to validate output in a controlled way. This is harder than it sounds. Unit tests can catch functional regressions, but they can’t tell you if the code is maintainable, secure, or performant under load.
We need better tooling for automated code quality assessment. Static analysis can catch some issues, but we also need dynamic analysis, security scanning, performance profiling, and maybe even AI-powered code review that can spot subtle architectural problems.
The good news is that this validation challenge is solvable with current technology. The techniques exist - they just need to be packaged into workflows that can handle the output of multiple parallel development processes.
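As a starting point, even a crude aggregation of existing checkers goes a long way. The sketch below assumes ruff for static analysis, bandit for security scanning, and pytest for tests - swap in whatever your stack already uses; the point is the per-candidate report, not the specific tools.

```python
# Sketch of a broader quality gate: run several checkers per candidate and
# aggregate pass/fail into a report. Assumes ruff, bandit, and pytest are
# installed; substitute the equivalents for your stack.
import subprocess
from pathlib import Path

CHECKS = {
    "tests": ["python", "-m", "pytest", "-q"],
    "lint": ["ruff", "check", "."],
    "security": ["bandit", "-r", "."],
}

def quality_report(worktree: Path) -> dict[str, bool]:
    return {
        name: subprocess.run(cmd, cwd=worktree, capture_output=True).returncode == 0
        for name, cmd in CHECKS.items()
    }

if __name__ == "__main__":
    for wt in sorted(Path.cwd().parent.glob("candidate-*")):
        report = quality_report(wt)
        print(f"{wt.name}: {sum(report.values())}/{len(CHECKS)} checks passed {report}")
```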
The Next Three Years
I predict we’ll see a wave of tools that treat AI coding agents as commodities to be orchestrated rather than oracles to be consulted. Companies will start building “AI development factories” that can spin up multiple agents, test their outputs, and deliver validated solutions.
The developers who get ahead of this curve will be the ones who start thinking about coding problems as orchestration problems. How do you define requirements clearly enough that multiple AI agents can work on them independently? How do you build test suites comprehensive enough to validate AI-generated code? How do you evaluate trade-offs between different solution approaches?
These are the skills that will matter as agentic coding tools cross the “good enough” threshold. The future belongs to developers who can conduct an orchestra of AI agents, not those who insist on playing every instrument themselves.
It’s going to be a wild ride.