What can a Product Manager do when the Product goes down?

Help from the trenches!

On the day Cloudflare went down for a few hours, I could not help but ask: Do you also feel useless as a Product Manager when the product goes down? Is there something you can do to help?

Every PM knows this feeling. Something breaks, alarms go off, Slack turns into a war zone, and suddenly the whole company looks at engineering and infrastructure to save the day. And you, as the PM, are stuck in this strange limbo. You are responsible for the product, but you cannot fix the product. You can’t SSH into servers, you can’t roll back the deployment, and you can’t magically rewrite the broken code in thirty seconds.

It feels like you are forced to stand on the sidelines and simply observe, hoping not to make anything worse. But that sense of helplessness rests on one of the biggest misconceptions about the PM role during an outage. Even if you touch zero lines of code, you can have a massive influence on how the crisis unfolds, how the team feels, and how fast the organisation recovers.

A PM can’t fix the root cause, but a PM can absolutely fix the chaos. And chaos is often more dangerous than the bug itself.

So let’s break down what your real responsibilities are during a product outage and how you can turn a helpless moment into an example of strong, calm leadership.

1. Protect the team

In a crisis, engineers need a distraction-free environment. They need to think, test, debug, and focus deeply. Every Slack ping steals minutes. Every meeting request burns cognitive energy. Every stakeholder demanding a “quick update” derails progress.

Your job is to become the PR shield.
Block interruptions. Take over external communication. Tell stakeholders that updates will come from you, not from the team. Make sure your developers have a quiet room, figuratively or literally.

The difference between a chaotic outage and a managed one is often measured in how well the PM protects their team’s focus.

2. Establish an update schedule

If you do not define a communication rhythm, the organisation will create one for you, and you won’t like it. You will get random questions every two minutes. Panic will spread. Decisions will be made based on assumptions and fears rather than facts.

Set a simple rule.
Updates every 30 minutes. Or every hour. Whatever is appropriate for the severity. Send them on time. Even if the update is “No change since the last message.”

You are not just sharing information. You are managing expectations. And expectations are the biggest stress factor in a crisis.

3. Estimate the downtime impact

Someone from leadership will inevitably ask:
“How bad is it?”
“How much money are we losing?”
“Who is impacted?”
“What is the estimated damage?”

If you prepare early, you will avoid panic later.

Gather quick back-of-the-envelope numbers.
How many users are in the affected flow?
Which markets?
Which segments?
What is the typical revenue per hour for this area?

Even rough estimates beat silence. And they help leadership stay grounded in numbers, not fear. When senior leaders panic, that panic cascades down onto the team. Your early analysis prevents that.
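If it helps to make this concrete, here is a minimal back-of-the-envelope sketch in Python. Every number in it is a hypothetical placeholder, not real data; the point is only to show how a few rough inputs, such as affected users, conversion rate, order value, and outage duration, combine into an estimate you can hand to leadership within minutes.

```python
# Back-of-the-envelope downtime impact estimate.
# All figures are hypothetical placeholders; swap in your own product's numbers.

affected_users_per_hour = 12_000   # users who normally hit the broken flow each hour
conversion_rate = 0.03             # share of those users who usually convert
avg_order_value = 45.0             # average revenue per converted user
outage_hours = 2.5                 # current estimate of the downtime duration

# Rough arithmetic: lost conversions scale with traffic and duration.
lost_orders = affected_users_per_hour * conversion_rate * outage_hours
estimated_revenue_loss = lost_orders * avg_order_value

print(f"~{lost_orders:,.0f} lost orders, "
      f"~{estimated_revenue_loss:,.0f} in estimated revenue impact")
```

A spreadsheet works just as well. What matters is that the structure of the estimate is visible, so leadership can see your assumptions and correct them, rather than reacting to a vacuum.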

4. Protect the team’s morale

Crises can break people. Engineers feel guilt. QA feels responsible. Leadership feels pressure. Teams work at speed, and stress builds.

Your job is to be the steady voice in the room. Remind the team that incidents happen even in the best companies. Show trust in the people working on the fix. Avoid passive-aggressive comments and avoid showing frustration.

If you project confidence, calmness, and clarity, your team will follow your lead. And calm engineers debug faster than stressed engineers.

5. Start crafting comms to clients

Engineers fight fires. PMs prepare the narrative.

While the team works, you prepare what will be said once the dust settles. You draft:
What happened?
How long did it last?
How were users impacted?
What steps were taken?
How will you prevent this in the future?

The key is transparency without self-destruction. You want honesty and clarity, not blame and panic.

Write the draft early so you are not scrambling at the last minute. The moment the system recovers, people expect communication to be fast. Better to have it ready.

6. Organize a post-mortem to prevent repetition

An outage that is fixed but not understood is simply waiting to happen again.

You need to schedule a retro as soon as possible. The purpose of the meeting is not to find who messed up. It is to understand what broke, why it broke, how it could have been detected earlier, and what processes or tooling need improvement so the incident never repeats.

Remind everyone that the focus is on learning, not punishment. The only unacceptable post-mortem outcome is doing nothing and hoping for the best.

7. Track the narrative afterwards

An outage does not end when the system goes green again. It ends when the organisation’s confidence recovers.

You need to monitor how sales talks about it, how support frames it, what users say online or in tickets, and whether leadership is telling the story accurately.

If a wrong narrative spreads, even unintentionally, it can damage trust, morale, and customer perception.

Correct misunderstandings early. Shape the narrative intentionally. You are not just managing a crisis. You are managing the memory of a crisis.

Closing thoughts

Crises reveal what kind of Product Manager you are. You can be the anxious observer who waits for someone else to fix everything. Or you can be the calm operator who keeps the room steady, turns chaos into structure, and helps the team navigate uncertainty with confidence.

You cannot fix the servers. But you can fix everything around them. And that is often the difference between a bad incident and a controlled one.

How do you handle product outages when they happen?