A few days ago, a post on r/ClaudeAI captured something I’ve been thinking about for weeks. A developer wrote that Opus 4.5 was “the first model that makes me actually fear for my job.” Nearly a thousand upvotes. Hundreds of comments from engineers asking the same question.

I work on GitHub Copilot. I watch millions of developers interact with AI coding tools every day. And I can tell you: the fear is pointed in the wrong direction.

The developers struggling aren’t the ones who write less code. They’re the ones who can’t tell good code from bad code when they see it. The bottleneck was never typing speed. It was always judgement.

For years, engineering meant spending most of your time producing syntax. Now that LLMs handle that part faster than I ever could, I have more time for what engineering was always supposed to be: designing systems, making tradeoffs, catching subtle bugs before they hit production. The AI amplifies my experience rather than replacing it. In this post, I’ll share what I actually do to get that amplification: the patterns I look for, the tools I wire up, and the mental shifts that make the difference between using AI as a toy and using it as a genuine force multiplier.


The Mental Model Shift

When I started my career, I measured productivity in lines written. Then in features shipped. Now I measure it in decisions validated per hour.

Here’s the business reality: a task that once took a day now takes an hour, but only if I can verify the output quickly. The bottleneck has moved from implementation to validation. Engineers who validate fast ship more. Shipping more means more value per sprint, per quarter, per year. That’s the ROI equation now.

Let me show you what I mean with a real example. Last month I needed a rate limiter for a service handling webhook deliveries. Old me would have written it from scratch: maybe 200 lines of Go, a few hours of work. Instead, I described what I needed:

“Token bucket rate limiter in Go. Per-tenant limits stored in Redis. Graceful degradation if Redis is down. Return headers showing remaining quota.”

Copilot generated 180 lines in seconds. And here’s where the actual engineering happened, the part that took me 10 minutes but saved me from a production incident:

func (r *RateLimiter) Allow(ctx context.Context, tenantID string) (bool, error) {
    key := fmt.Sprintf("ratelimit:%s", tenantID)
    
    tokens, err := r.redis.DecrBy(ctx, key, 1).Result()
    if err != nil {
        // AI wrote: return false, fmt.Errorf("redis error: %w", err)
        // 
        // I changed it to this. Why? Webhook delivery is already async with 
        // downstream retries. Blocking paying customers because Redis hiccuped 
        // is worse than occasionally over-delivering. Log it, alert on it, 
        // but don't fail the request.
        r.metrics.RedisErrorCount.Inc()
        r.logger.Warn("rate limiter redis unavailable, failing open",
            zap.String("tenant_id", tenantID),
            zap.Error(err),
        )
        return true, nil
    }
    
    if tokens < 0 {
        return false, nil
    }
    return true, nil
}

The AI defaulted to fail-closed with a wrapped error. That’s the “safe” choice in a vacuum. But I know our system. That judgement came from years of operating it, context no model has.

This is the shift: not that we stop writing code, but that we spend less time on the first draft and more time on the critical decisions that make it production-ready.

Whether you work on backend services, frontend applications, or infrastructure, the pattern is the same. Your years of experience, your ability to reason about edge cases, your intuition for what will break at scale: none of that gets replaced. It gets freed up. The AI handles the syntax you’d have typed anyway. You bring the thinking that makes it actually work.


What I Actually Look For

After reviewing thousands of AI-generated diffs, I’ve developed a mental checklist. These are the patterns I’ve trained myself to spot instantly:

1. Error Handling That Looks Right But Isn’t

AI loves to generate this:

result, err := doSomething()
if err != nil {
    return fmt.Errorf("failed to do something: %w", err)
}

Looks correct. Compiles. But I immediately ask:

  • Is this error actionable by the caller? If not, should we log and continue?
  • Are we leaking sensitive info in the error message? (tenant IDs, internal paths)
  • Should this be a different error type? (retriable vs permanent)

The AI doesn’t know that in our codebase, we have custom error types that trigger different retry behaviour. It doesn’t know that ErrTemporary gets exponential backoff while ErrPermanent fails fast. (These are hypothetical types, but you get the idea.)
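
To make that concrete, here’s a minimal sketch of what such marker types could look like. The names mirror the hypothetical ones above; your codebase will have its own shape for this.

package apperrors

import (
    "errors"
    "fmt"
)

// Sentinel markers. Callers decide retry behaviour by checking for them.
var (
    ErrTemporary = errors.New("temporary")
    ErrPermanent = errors.New("permanent")
)

// Retriable wraps an error so retry loops know to back off and try again.
// (Two %w verbs in one format string require Go 1.20 or later.)
func Retriable(err error) error {
    return fmt.Errorf("%w: %w", ErrTemporary, err)
}

// IsRetriable reports whether the error carries the temporary marker.
func IsRetriable(err error) bool {
    return errors.Is(err, ErrTemporary)
}

The point isn’t this particular shape. The point is that the distinction exists at all, and the model won’t respect it unless you tell it to.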

This is where CLAUDE.md/AGENTS.md files become essential. These are markdown files in your repo root that teach AI assistants your codebase conventions. Here’s the kind of thing that belongs in one:

## Error Handling Conventions
- Wrap retriable errors (network, rate limits) with a marker type
- Wrap permanent errors (validation, auth) differently so callers can distinguish
- Never include sensitive data in error messages (tokens, internal IDs)
- Add structured logging context at the point of error creation

## Existing Utilities — Use These Instead of Writing New Ones
- Mock generation: we use `go generate` with mockgen, don't hand-write mocks
- HTTP clients: use the shared client in `pkg/http` with built-in tracing
- Database access: generated code from sqlc, don't write raw queries

The specific contents will vary by codebase, but the pattern is the same: document your conventions explicitly so the AI follows them.

Without this, I watch AI generate beautiful hand-written mocks when we have mockgen, or create new HTTP utilities when we have a perfectly good instrumented client. Claude Opus 4.5 is not lazy. It will happily write 500 lines of code that duplicates existing functionality. Teaching it your codebase’s conventions saves hours of “thanks but we already have this” reviews.

2. The Abstraction Question

There’s a fascinating debate happening right now about whether AI changes the calculus on abstractions. Adam Wathan recently asked: “Did we only ever create abstractions to avoid duplication, and now is maintaining duplication actually not a problem?”

The context: Lee Robinson at Cursor migrated cursor.com from a CMS to raw code and Markdown in three days with $260 in tokens. Hundreds of AI agents doing bulk edits across duplicated code.

I’ll be direct: I think this is dangerous thinking for production systems.

Yes, AI can find and update 50 copies of similar code. But you’re not always going to have an AI agent running bulk edits. You’ll make a quick fix at 2 AM during an incident. A new team member will change one copy and not realise there are 49 others. The abstraction wasn’t just about avoiding tedious edits. It was about making the codebase correct by construction.

Best practices exist because they encode decades of hard-won lessons about what goes wrong in long-lived systems. DRY, separation of concerns, encapsulation: these aren’t arbitrary preferences. They’re load-bearing walls.

Where I do think AI changes things: the threshold for when to abstract can shift slightly. If you’re prototyping, if the code is genuinely throwaway, if you’re building something where velocity matters more than maintainability, fine, let duplication ride. But for production systems that will be maintained for years? The fundamentals haven’t changed. Abstract your business logic. Encapsulate your security boundaries. Don’t scatter the same validation rules across 50 files because “the AI can update them all.”

The AI is a tool for working with your codebase, not a substitute for good architecture.

3. Concurrency Patterns That Pass Code Review But Fail at 3 AM

This is where programming experience really matters. Let’s see an example in Go. The AI generates:

var results []Result
var mu sync.Mutex
var wg sync.WaitGroup

for _, item := range items {
    wg.Add(1)
    go func(item Item) {
        defer wg.Done()
        result := process(item)
        mu.Lock()
        results = append(results, result)
        mu.Unlock()
    }(item)
}
wg.Wait()

Fine for a coding exercise. But I see:

  • Unbounded goroutines. 10,000 items = 10,000 goroutines. Use a worker pool.
  • No context cancellation. If the parent request times out, these keep running.
  • No error propagation. If process() fails, we silently drop it.

Here’s what I actually want:

// errgroup comes from golang.org/x/sync/errgroup
func processAll(ctx context.Context, items []Item) ([]Result, error) {
    g, ctx := errgroup.WithContext(ctx)
    g.SetLimit(10) // bounded concurrency
    
    results := make([]Result, len(items))
    for i, item := range items {
        i, item := i, item // shadow the loop variables (redundant since Go 1.22, required before)
        g.Go(func() error {
            result, err := process(ctx, item)
            if err != nil {
                return fmt.Errorf("item %d: %w", i, err)
            }
            results[i] = result // no mutex needed - distinct indices
            return nil
        })
    }
    
    if err := g.Wait(); err != nil {
        return nil, err
    }
    return results, nil
}

The AI can generate this if you prompt it correctly. But knowing to ask for errgroup, bounded concurrency, and context propagation comes from having debugged goroutine leaks at 3 AM.

4. The Wrong Package Problem

AI models sometimes pick the wrong libraries. Not hallucinated ones, but real packages that aren’t the right choice.

Last week I was building a quick Stripe integration for a side project. The AI pulled in github.com/stripe/stripe-go/v72, which exists, compiles, and technically works. But the current version is v84. Twelve major versions behind. The API surface has changed significantly, and I’d be building on deprecated patterns.

Similarly, for an OpenAI integration, the AI confidently used github.com/sashabaranov/go-openai, a popular community package, instead of github.com/openai/openai-go, the official SDK released later. The community package is fine, but if you’re starting fresh, you probably want the official one that’ll stay in sync with API changes.

The lesson: AI knows what exists, but not always what you should use. It’s trained on historical code, which means it learned from projects using whatever was popular at the time. You need to know:

  • Which packages are actively maintained vs abandoned
  • When official SDKs exist vs community alternatives
  • What version is current, not just what version appears most in training data

I keep a mental (and sometimes literal) list of “blessed” dependencies for common tasks. When the AI suggests something else, I pause and ask why.
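
When the list is literal, it can live in the same AGENTS.md file as the conventions above. Something like this, with the entries as examples rather than endorsements:

## Blessed Dependencies (Prefer These for Common Tasks)
- Stripe: github.com/stripe/stripe-go (check the current major version before adding it)
- OpenAI: github.com/openai/openai-go (the official SDK, not the community package)
- Anything not on this list: call it out in the PR description and explain why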

5. The “Works on My Machine” Configuration

AI-generated code often assumes a happy path that doesn’t exist in production environments.

I’ve seen AI generate database connection code that works perfectly in development but falls over in production because it doesn’t handle connection pooling properly, or assumes localhost latency, or doesn’t account for the connection limits on a managed database. It’ll write code that reads from environment variables without considering what happens when they’re missing. It’ll set up logging that writes to stdout, which is fine locally but needs structured JSON for your log aggregator in production.
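
Here’s a sketch of the kind of production-aware details I usually end up adding by hand. The driver, env var name, and limits are illustrative, not prescriptive; the point is that none of this shows up unless you ask for it.

package storage

import (
    "context"
    "database/sql"
    "errors"
    "fmt"
    "os"
    "time"
    // blank-import your Postgres driver here, e.g. pgx's stdlib adapter
)

// Open validates configuration up front and applies pool limits that respect
// a managed database's connection cap.
func Open(ctx context.Context) (*sql.DB, error) {
    dsn := os.Getenv("DATABASE_URL")
    if dsn == "" {
        // Fail loudly at startup instead of limping along with a zero value.
        return nil, errors.New("DATABASE_URL is not set")
    }

    db, err := sql.Open("pgx", dsn)
    if err != nil {
        return nil, fmt.Errorf("open database: %w", err)
    }

    // Managed databases cap connections; stay well under that cap and recycle
    // connections so a failover doesn't strand us on dead ones.
    db.SetMaxOpenConns(20)
    db.SetMaxIdleConns(10)
    db.SetConnMaxLifetime(5 * time.Minute)

    // Verify connectivity with a bounded wait instead of assuming localhost latency.
    pingCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
    defer cancel()
    if err := db.PingContext(pingCtx); err != nil {
        return nil, fmt.Errorf("ping database: %w", err)
    }
    return db, nil
}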

The pattern: AI knows how to write code that runs. It doesn’t know the difference between your local Docker Compose setup and your production Kubernetes cluster with its service mesh, secrets management, and network policies. That operational context, the “where and how this actually runs” knowledge, is still yours to bring.


MCP: The Real Unlock

Here’s something that’s become clear from watching how developers use Copilot: code generation is table stakes. The real productivity unlock is connecting the model to your actual context.

Right now, most developers use AI like a very smart autocomplete that’s read the internet. Useful, but limited. The AI doesn’t know:

  • Your database schema
  • Your internal API contracts
  • Your deployment configuration
  • Your team’s architectural decisions
  • The contents of that Notion doc explaining why you made that weird tradeoff

Model Context Protocol (MCP) changes this. It’s a standard way to give AI models access to external data sources and tools. And it’s going to be the difference between “AI that helps” and “AI that actually knows your system.”

Example: Architecture Decisions via GitHub MCP

Here’s something I’ve been experimenting with using the GitHub MCP server.

Imagine your team keeps architectural decisions in GitHub Discussions or in ADR files in the repo, context like “why we chose Postgres over CockroachDB” or “the tradeoffs we made on the event schema.” Normally, this context is invisible to the AI. You’d have to copy-paste relevant discussions into every prompt.

With MCP, you can prompt something like:

“Search through the discussions and ADRs in this repo. I need to add a new event type for subscription changes. Based on our existing patterns and any relevant decisions, draft the event schema and the handler.”

The AI can now pull the actual discussions, see that the team standardised on CloudEvents format, notice there’s an ADR about avoiding nested payloads, and generate something that fits. Instead of you re-explaining context every time, the model reads the institutional knowledge directly.

The same pattern works with other MCP servers:

  • Postgres MCP: Query your actual schema instead of describing it in prose
  • Sentry MCP: “What errors has this endpoint thrown in the last week?”
  • Notion/Confluence MCP: “What does our onboarding doc say about the auth flow?”

The engineers who wire up these context sources will dramatically out-leverage those who keep copy-pasting documentation into prompts.

Beyond Context: AI-Runnable Environments

Here’s where I think this is heading next. Giving AI read access to your codebase is step one. But the real unlock is giving it the ability to run things.

Think about how you debug a problem. You don’t just read the code. You run it. You check the logs. You reproduce the error. You add some print statements and try again. AI can do the same, if you let it.

This means building what I’d call an “AI dev environment”: a sandboxed setup where an AI agent can execute tests, query log output, reproduce failures, inspect UI, and iterate on fixes. Just like you’d set up a dev environment for a new team member, you set one up for the AI.

What this looks like in practice:

  • Runnable test suites. The AI can run go test ./... and see what fails, not just guess from reading the code.
  • Accessible logs. Point the AI at your structured logs or tracing system so it can see what actually happened, not what the code implies should happen.
  • Reproducible environments. Containerised setups, seed data, the ability to spin up dependencies. If a human can reproduce a bug locally, the AI should be able to as well.
  • Feedback loops. The AI makes a change, runs the tests, sees the result, adjusts. This is how humans work. It’s how AI should work too.
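
Much of this can be written down for the agent in the same AGENTS.md file, so it knows what it’s allowed to run and where to look. A hypothetical example:

## Agent Environment
- Tests: `go test ./...`; integration tests need `docker compose up -d` first
- Local runs write structured JSON logs to `./tmp/logs`
- `make seed` resets the local database to a known state
- Only touch the local compose environment, never shared infrastructure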

The gap today is that most AI coding happens in a “write-only” mode: generate code, hand it to a human, hope it works. The teams getting serious leverage are closing that loop. They’re letting the AI verify its own output, see its own mistakes, and iterate without human babysitting on every cycle.

If you’re setting up infrastructure for AI-assisted development, think beyond “what can the AI read?” and start asking “what can the AI run?”


The Evaluation Gap

Here’s something I see constantly: teams unable to agree on whether AI-generated code is “good enough.” Arguments go in circles because there’s no shared criteria for quality.

You need to measure. This is the most underrated skill in AI-augmented development.

Writing good evals is hard, harder than most people expect. A few things I’ve learned:

Start with your code review checklist. Whatever you already check manually is your first eval. “Does it handle context cancellation?” becomes a grep for ctx.Done() or ctx.Err(). “Does it use our error types?” becomes a check for apperrors imports.
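
Here’s a minimal sketch of that kind of check, written as a test helper. The patterns are placeholders for whatever your own checklist actually demands.

package evals

import (
    "strings"
    "testing"
)

// checklist maps a review question to a marker we expect to see in generated
// code. Substring checks are crude, but they're the cheapest way to start
// before graduating to go/ast or your linters.
var checklist = map[string]string{
    "handles context cancellation": "ctx.Err()",
    "uses our error types":         "apperrors.",
}

// CheckGenerated fails the test for each checklist item the generated source misses.
func CheckGenerated(t *testing.T, source string) {
    t.Helper()
    for question, marker := range checklist {
        if !strings.Contains(source, marker) {
            t.Errorf("generated code fails %q: expected to see %q", question, marker)
        }
    }
}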

Test behaviour, not just presence. It’s easy to write evals that check “does the code contain X.” It’s harder but more valuable to check “does the code correctly use X.” Run the generated code through your existing test suite. Use go vet, staticcheck, your linters.

Capture regressions explicitly. When you find a prompt that produces bad output, add it to your eval suite. Over time you build a corpus of “things that went wrong” that you can test against automatically.

Measure what matters to the business. The ultimate eval is: did this generated code work in production? Track which AI-generated PRs get reverted, cause incidents, or require significant rework. That’s your ground truth.

The team that can measure quality can improve it. Everyone else is debating opinions.


What NOT to Learn in 2026

Here’s where I’ll be contrarian. Some things that feel productive are actually time sinks:

1. Memorising API Surfaces

Stop grinding LeetCode-style memorisation of standard library functions. The AI knows every function signature in every language. Your job is knowing when to use which pattern, not recalling the exact argument order.

Instead: Understand the concepts deeply. Know what a context is for, not just how to pass one.

2. Framework-Specific Syntax

The half-life of framework knowledge is getting shorter. That fancy new React hook or Kubernetes annotation you learned? It’ll be deprecated or replaced soon enough, and the AI will know the new syntax better than you.

Instead: Understand the underlying problems frameworks solve. HTTP semantics outlast Express.js.

3. “Prompt Engineering” Tricks

The “add ’think step by step’” crowd is chasing diminishing returns. Yes, prompts matter. No, spending hours optimising prompt wording is not high leverage.

Instead: Focus on giving the model better context, not cleverer instructions. Show it your actual code, your actual constraints, your actual error messages.

4. Chasing Every New Model

A new model drops every week. Each one is “revolutionary.” If you spend your time A/B testing Claude vs GPT vs Gemini for your todo app, you’re optimising the wrong thing.

Instead: Pick a good-enough model and invest in evaluation harnesses. When a genuinely better model drops, your evals will tell you.

5. Treating AI Output as a Starting Point to “Clean Up”

I see this pattern constantly: developers prompt the AI, get a big blob of code, then spend an hour manually refactoring it into something acceptable. This is backwards.

Instead: Iterate with the AI. If the first output isn’t right, refine the prompt. Add constraints. Show it an example of what you want. The goal is output you can accept with minimal changes, not output you treat as a rough draft to rewrite.

6. Building Custom AI Tooling Before You Need It

It’s tempting to build elaborate wrapper scripts, custom prompt libraries, or internal “AI platforms” before you’ve validated the basic workflow. I’ve seen teams spend months building infrastructure for AI-assisted development that nobody uses because the vanilla tools would have been fine.

Instead: Start with off-the-shelf tools. Use Claude Code, Cursor, Copilot as they come. Only build custom tooling when you hit a real wall, not an imagined one. The landscape is moving too fast to invest heavily in infrastructure that might be obsolete in six months.

7. Neglecting Your Core Engineering Skills

This one is counterintuitive: some engineers are spending so much time learning “AI skills” that they’re letting their core engineering knowledge atrophy. But the whole point of AI leverage is that it amplifies existing expertise. If you don’t understand distributed systems, the AI can’t make you good at designing them. If you don’t understand security, the AI will generate insecure code and you won’t notice.

Instead: Keep investing in fundamentals. Read the papers. Debug the hard problems. Understand why things work, not just how to prompt for them. The engineers who will thrive are the ones with deep expertise that the AI can amplify, not shallow knowledge that the AI will replace.


The Mindset I’m Adopting

Here’s how I think about my work now, coming from both using these tools daily and building them:

I am a software engineer, and my job extends beyond writing syntax. I define what “correct” means for our system, validate that we’re achieving it, and catch the cases where we’re not.

I am a context curator. The model is only as good as the context I give it. My knowledge of our systems, our constraints, our history: that’s my leverage. Wiring up MCP servers, writing CLAUDE.md files, keeping architecture docs current. This is engineering work now.

I am an architecture owner. The AI can fill in implementations, but the structure, the boundaries, the tradeoffs: those are mine to define and defend.

I am a taste-maker. The AI generates options. I choose which options are good. That judgement, knowing what production-ready looks like, is the whole game.

The developers who fear replacement are often the ones who’ve defined their value as “the person who writes the code.” But that was never really the value. The value was always “the person who knows what code to write, and whether the code that exists is correct.”

That job isn’t going away. It’s getting amplified.


Practical Next Steps

If you’re an engineer looking to position for 2026:

This month:

  1. Create a CLAUDE.md or AGENTS.md for your main repo. Document your conventions, your existing utilities, your error handling patterns. See how it changes the AI’s output.
  2. Set up one MCP server (GitHub, your database, whatever gives the most context) and try prompting with it.

This quarter:

  1. Document your mental checklist for code review. What do you look for that others miss? Make it explicit, then see what you can turn into automated checks.
  2. Build a simple eval for one workflow you do repeatedly. Start measuring quality instead of guessing.

This year:

  1. Invest in understanding distributed systems deeply. Consensus, failure modes, observability: the stuff that doesn’t fit in a prompt and requires real expertise.
  2. Get good at one domain that requires judgement and context: security, performance, reliability. These compound with AI rather than competing with it.

The engineers who will thrive aren’t the ones who type fastest or prompt cleverest. They’re the ones who’ve systematised their judgement, who can validate AI output as fast as it’s generated, who can connect AI to the context that makes it actually useful, and who can measure quality instead of arguing about opinions.

That’s the game now. Time to get good at it.