Troubleshooting As A TPM In An AI Future
2025.17 - 6th Post - To prepare for a future with fewer TPMs, we need better ways to measure how close that future is, so we can figure out what is left for us to do and hone those skills.
This is the 6th post in a series where I’m pondering out loud in public about how AI might reshape, or in my opinion outright replace, the Technical Program Manager role.
So far, we’ve looked at how TPMs process messy inputs, synthesize signal from noise, and distribute that signal to the right people in the right way. Last week, we dove into establishing accountability across programs. With each function, AI is starting to take on more of the heavy lifting.
But this week, I want to talk about something less structured, more instinctive, and much harder to teach or automate: troubleshooting.
Let’s dig in.
What Does Troubleshooting Actually Mean?
Troubleshooting is the TPM mode we enter when nothing is working as expected and no one can quite say why.
It can take many shapes and forms, but the most common manifestations look like:
Why is this Jira ticket untouched for two weeks despite being a blocker?
Why are we missing milestone targets even though every team says they’re on track?
Why are designs still unclear even though mocks were distributed and discussed?
Why does this system keep failing in staging but pass in local builds?
The source of the issue might be technical, interpersonal, process-related, or political. Often, it’s a mix of all of the above.
Troubleshooting is part detective, part therapist, part systems engineer. And it doesn’t just apply to projects; it applies across everything else we’ve talked about:
Troubleshooting process when the inputs stop making sense
Troubleshooting synthesis when the signals don’t align
Troubleshooting distribution when your carefully crafted summary creates more confusion than clarity
Troubleshooting accountability when you can’t tell who actually owns what
It’s the invisible connective tissue of the TPM craft. And it’s where our instincts and our “human skills” tend to show up the most. This is why organizations bring in TPMs; this function, along with many others, is the value the role provides. BUT it is also the hardest thing to teach, because it requires understanding when you need to switch from navigation to diagnostic mode.
The Two Modes of TPM Work: Navigation vs. Diagnosis
Think about what you do on a day-to-day basis as a TPM. In most of our work, we’re in navigation mode: moving toward a known destination, managing trade-offs, tracking timelines, and nudging execution along a planned path.
Troubleshooting, on the other hand, puts us in diagnostic mode. There’s no map. You’re not following a path; you’re feeling for one. You’re pulling on threads, running thought experiments, asking questions, and trying to find the rough edges no one else sees or is considering. It’s not always a rational process. Sometimes your gut tells you something’s off before the data does. And on rare occasions, the data won’t show you the problem until it’s too late.
And over time, that “gut” becomes pattern recognition. You’ve seen this dance before. You remember the sound of that silence in a Slack channel. You recognize when an engineer’s week-over-week “still investigating” really means, “I’m stuck but hesitant to ask for help.”
🤔 AI might be able to track metrics. But can it sense hesitation? Can it recognize emotional friction behind a stuck status update?
What AI Might Get Right
To be clear, I think AI agents will become useful here. In fact, some already are. Here’s where I think they can help:
Observability: Surfacing anomalies in task velocity, silent channels, or regressions.
Pattern matching: “This delay looks like sprint 12 last quarter.”
Root cause hypotheses: “Three systems share this dependency and have recent changes.”
Workflow nudges: “This ticket is overdue and hasn’t been commented on in five days.” (I’ve sketched what this might look like below.)
These tools can give you more visibility, faster and more cleanly. They might even prevent problems from growing large enough to need full-scale troubleshooting. That’s a win. But visibility isn’t the same as understanding. And surfacing an anomaly isn’t the same as knowing what to do next.
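To make that “workflow nudges” bullet concrete, here’s a minimal sketch of the kind of check an agent (or a TPM with ten spare minutes) might run. Everything in it is an illustrative assumption: the ticket fields, the five-day threshold, and the hard-coded export stand in for a real tracker integration.

```python
from datetime import datetime, timedelta

# Hypothetical ticket export -- stand-in data, not a real Jira API call.
tickets = [
    {"key": "PROJ-101", "is_blocker": True,  "last_comment": "2025-04-01"},
    {"key": "PROJ-108", "is_blocker": False, "last_comment": "2025-04-14"},
    {"key": "PROJ-112", "is_blocker": True,  "last_comment": "2025-04-13"},
]

STALE_AFTER = timedelta(days=5)   # the "five days" from the bullet above
today = datetime(2025, 4, 15)     # pinned so the example is reproducible

def stale_blockers(tickets):
    """Yield (key, days_silent) for blockers quiet longer than STALE_AFTER."""
    for t in tickets:
        silence = today - datetime.strptime(t["last_comment"], "%Y-%m-%d")
        if t["is_blocker"] and silence > STALE_AFTER:
            yield t["key"], silence.days

for key, days in stale_blockers(tickets):
    print(f"Nudge: {key} is a blocker with no comments in {days} days.")
```

Running this flags PROJ-101 at 14 days of silence. The check itself is trivial; the judgment about which nudges actually matter is not, which brings me to the next point.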
Sometimes a signal is a ghost in the machine: a red herring that will burn mental energy chasing an issue that isn’t actually a concern or a risk. It’s noise.
🤔 Does an AI agent know the value of a signal? Does it know a signal is false without a history of what true and false signals look like?
What AI Will Struggle With
Troubleshooting is often deeply human because what breaks isn’t always systems; it’s relationships, expectations, or trust. The human breaks.
Here’s where I still see AI struggling:
Passive resistance: A team technically moving, but emotionally disengaged.
Misaligned goals: A PM who agrees to the plan but is secretly shifting priorities.
Leadership pressure: A director pushing one outcome while teams quietly drift another way.
Overconfidence in automation: Everything looks on track because the agent says it is.
AI doesn’t know how to hold discomfort. It doesn’t feel tension. It can flag that something is off, but it can’t always interpret why. It looks for statistical signals.
As TPMs, we often troubleshoot not by running reports, but by poking around, asking questions, drawing things out, and sensing the emotions on the project; we are listening for noises in the machine that are foreign, or “wrong.” Sometimes we write things down just to understand them better ourselves. Sometimes we don’t wait for data. We follow doubt.
The Risk of “Over-Automating” Our Way Past Trouble
Here’s where I worry a bit.
AI agents will enable more work with fewer people. On paper, that is a win for many organizations worried about P&L. It’s all sunshine and rainbows until something breaks. If fewer people understand the full system, who’s left to troubleshoot?
Imagine a future where you escalate not by asking for more headcount, but for more agents. “Can we spin up two more AI contributors to fix this faster?” Sounds efficient until those agents confidently do the wrong thing at scale.
In that world, who’s accountable?
And more importantly: Who notices the problem before it spirals?
Troubleshooting will become more important as AI accelerates work. Because when things go sideways at speed, we’ll need faster sensemakers, not just faster doers.
A Habit That’s Helped Me: Write While You Debug
One underrated trick that’s helped me is writing while I troubleshoot. Don’t wait until after. We often save the writing for postmortems. But the act of capturing questions, tracking hypotheses, and drawing decision trees mid-chaos sharpens clarity. It’s like taking notes during an experiment.
Part of the power is in the writing itself. The very act of articulating your questions often brings new clarity.
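If you want to lower the friction on this habit, even a tiny helper can do it. This is just a sketch under my own assumptions: the file name, the entry types, and the log format are personal conventions, not a prescribed method.

```python
from datetime import datetime

LOG_FILE = "troubleshooting-log.md"  # illustrative file name

def note(kind: str, text: str) -> None:
    """Append a timestamped entry (question, hypothesis, finding) to the log."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    with open(LOG_FILE, "a", encoding="utf-8") as f:
        f.write(f"- {stamp} [{kind}] {text}\n")

note("question", "Why has PROJ-101 sat untouched for two weeks despite being a blocker?")
note("hypothesis", "The owner is blocked on another team but hasn't escalated.")
note("finding", "Confirmed in a 1:1 -- waiting on infra, never raised in standup.")
```

The tooling matters far less than the timestamped trail of what you believed and when. That trail is what turns one messy incident into reusable pattern recognition.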
🤔 Could AI help with that? Maybe. Maybe it becomes the assistant to the assistant, capturing the notes as they come. Or maybe this writing becomes the material that trains your own AI agent.
So, What Does Troubleshooting Look Like For TPMs In An AI Future?
If AI agents can see signals faster, maybe the TPM of the future won’t be the first to know when something is off. But maybe we’ll still be the only ones who know what it means. The only ones who can connect the strange Slack silence, the vague update, the pattern of drift.
The gut feeling that something is off isn’t in the data, isn’t on paper or in written form; it’s in the ether, in the environment, in the moment, in a Zoom call full of smart people saying “all on track.”
This kind of sensing is still a deeply human skill. And maybe one of the last to go.
Final Word
Troubleshooting comes with experience. Experience comes with doing more complex things over long periods of time. And doing complex things requires that you first embark on small adventures: a tiny feature here, an architectural upgrade there, something limited in scope.
If these opportunities to learn are going away, replaced by AI agents, then how do TPMs hone their ability to troubleshoot? We already see growing concern that people are farming out critical thinking to tools like ChatGPT.
Maybe, the first thing we lose is our ability to troubleshoot.
Or maybe that’s too much doom and gloom for one essay =).
Until next time.
-Aadil
Reader Self-Audit: How Do You Troubleshoot?
Before we wrap, I have some questions for you to reflect on:
Do you find yourself in navigation or diagnostic mode more often these days?
What signals do you rely on to sense something’s off?
When was the last time your gut told you there was a problem? And was it right?
Do you document your thinking while you troubleshoot, or only after?
Could you imagine an AI agent troubleshooting your project? Would you trust its conclusions?
I’d love to hear your thoughts. Let’s keep thinking together.