
2 AI+Backlog Pitfalls to Watch Out For

The other day, I was talking with a technology leader who’s responsible for a portfolio of internal business applications. He and his teams deal with a large group of stakeholders, and those stakeholders want a lot of different things.

Historically, stakeholder requests have been, to put it kindly, half-baked. Sometimes good ideas but not really thought through. Other times, lots of energy around ideas that should never see the light of day.

This leader saw an opportunity. He spent 6 months on a side project to create and tune a custom GPT to help stakeholders think through their ideas. It asks them questions to help them get their own thoughts clear before they bring requests to IT.
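As a rough illustration (the article doesn't share the actual GPT's instructions), a request-clarifying assistant like this usually comes down to a carefully written system prompt plus the standard chat-message format. Everything below is hypothetical wording, not the leader's actual configuration:

```python
# Hypothetical sketch of the kind of system prompt behind a
# "request-clarifying" custom GPT. The wording is illustrative only.

COACH_SYSTEM_PROMPT = """\
You are a coach helping a business stakeholder clarify a request
before they bring it to the IT team. Do not write the request for
them. Ask one question at a time, covering:
- What problem are you trying to solve, and for whom?
- How do you handle this today, and what does that cost you?
- How would you know the solution is working?
- What is the smallest version that would still be useful?
Summarize their answers back to them before they finish."""

def build_messages(stakeholder_input: str) -> list[dict]:
    """Package the coaching prompt and the stakeholder's opening
    message in the chat format most LLM APIs accept."""
    return [
        {"role": "system", "content": COACH_SYSTEM_PROMPT},
        {"role": "user", "content": stakeholder_input},
    ]
```

The key design choice is that the assistant asks questions rather than drafting the request itself, so the clearer thinking ends up in the stakeholder's head, not just in a generated document.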

Everybody’s loving the result. Stakeholders get to show up with clearer requests that are more likely to be implemented, and technology people get to focus more on work that really matters, without wasted back-and-forth building the wrong thing and rebuilding it as stakeholders discover what they actually want.

Is this removing the complexity? Is it eliminating the need for experimentation and iteration? Not completely. But it’s bringing the focus to complexity that’s inherent to the problems they’re solving, rather than complexity created from unclear thinking.

In my opinion, this is still a place where a skilled PM can be even more effective than the AI. But if you don’t have that skilled PM, this looks to me like a nice application of AI.

On the other hand, there are two other ways I’m seeing companies run into problems with AI in their backlog creation and refinement.

AI Backlog Failure Mode #1: LLM as Intermediary

The first issue is structural.

We’ve written and podcasted before about the challenges that come from splitting the end-to-end Product Owner role between two or more people. Most often, there’s someone with a Product Manager title talking to customers and thinking about product strategy and someone with a Product Owner title working on the tactical part of the backlog and collaborating with teams to implement backlog items.

Without going into all the details again, this tends to lead to greater distance between customers and developers, to most of the information being lost at the handoffs between roles, and to longer feedback loops between an idea and data about it.

Now, companies are starting to notice that the extra layer isn’t particularly helpful. And the tactical part of the role, in particular, struggles to add unique value. But instead of making the PO or PM an end-to-end role again, collaborating with both customers and developers, the shiny, new option is: What if we take the tactical part of the role and just replace it with an LLM?

So, the PM has an idea, maybe some customer data, and they feed it to a tool like ChatGPT to generate a document. Whether you call it an epic or a PRD, it’s a lot of detail extrapolated from the PM’s idea. They hand that over to developers, who then ask another LLM to break down the work into user stories or tasks. (BTW, the same thing can happen with a vibe-coded prototype as an intermediary, though it doesn’t have to—more on that later.)

It’s the same telephone game as before, but now we’re playing telephone with somebody who doesn’t have context-specific knowledge, domain experience, or critical thinking.

(Product management isn’t the only place where AI telephone is happening, by the way. We see marketers using LLMs to generate newsletter content and their readers using LLMs to summarize the email. But at least that expression of it doesn’t lead to millions of dollars of development effort based on the output.)

AI Backlog Failure Mode #2: Mistaking Detail for Truth

The second issue is psychological. (You might run into either or both of these failure modes.)

Here’s the problem: our brains have cognitive biases, shortcuts that help us operate in the world and conserve energy but that sometimes mislead us.

One of these biases is that our brains use detail or precision as a subconscious signal of accuracy or importance. People drive slower in a parking lot with a 14 mph speed limit sign than in one with a 10 mph sign; the oddly specific 14 signals importance. Drivers don’t think about it consciously—they just drive slower.

We run into this cognitive bias with AI and product management because it’s really easy to get an LLM to produce a lot of text and format it nicely.

In product we’re often starting from a hypothesis, a guess about what a customer needs and how they’re going to behave. We don’t have a whole lot of information. We may not have the highest confidence about our hypothesis, but it seems like something worth exploring.

Feed that to ChatGPT or Claude and ask for a document to share with your team, and you’ll get back something polished, well-formatted, and persuasive, with a lot of words in it. You hand that over to a developer, and their brain tells them: “This isn’t uncertain. This isn’t a hypothesis. Don’t ask any questions here; this is well thought out and complete. Proceed accordingly.”

And now we’re back to some of the problems we had with big design up front in the old days: the illusion of knowledge preventing collaboration and experimentation.

What Are These Tools Best At?

At the moment, the strongest applications of LLMs for product management that we’re seeing are in three areas:

  1. Helping people get their thoughts clear, as in the example I shared at the beginning of this article. Tools like ChatGPT can be trained to be reasonably good product coaches. And this can work at any level of detail.
  2. Answering variations on “What’d I miss?” Share a description of a feature and how you’ve broken it down and ask the tool to tell you what other variations you might consider. Or ask the tool what untested assumptions you might be making. The scale of the language models makes them good at this kind of analysis.
  3. Vibe-coding prototypes to get fast feedback on how users interact with a solution. Of course, we still have to be careful about issues like the detail bias described above and the tendency for prototypes to become the product or lead to over-optimistic estimating. But there’s potential now to run higher-fidelity customer tests that are as fast as paper prototypes.
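To make item 2 concrete, the “What’d I miss?” pattern is really just a matter of how you frame the prompt: you give the model your existing breakdown and ask it to critique rather than generate. A minimal sketch, with entirely hypothetical wording and function names:

```python
# Hypothetical sketch of the "What'd I miss?" pattern from item 2.
# The prompt wording and function name are illustrative, not taken
# from the article or any specific tool.

def build_whatd_i_miss_prompt(feature: str, breakdown: list[str]) -> str:
    """Frame a feature description and its current story breakdown as
    a question asking the model for overlooked variations and
    untested assumptions."""
    stories = "\n".join(f"- {s}" for s in breakdown)
    return (
        f"Here is a feature we're planning:\n{feature}\n\n"
        f"Here is how we've broken it down so far:\n{stories}\n\n"
        "What variations have we not considered? "
        "What untested assumptions might we be making?"
    )

prompt = build_whatd_i_miss_prompt(
    "Let users export their reports as PDF.",
    ["Export a single report", "Email the exported file"],
)
# Send `prompt` to whichever LLM you use. The point is that the model
# reviews a breakdown you already own, instead of inventing one whose
# polish can masquerade as certainty.
```

Framing it this way keeps the human’s thinking as the primary artifact and uses the model as a reviewer, which sidesteps the detail-as-truth trap described above.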

I’m curious… What positive applications are you seeing in PM? How have you seen these and other pitfalls get in the way? Contact us or comment on social media and let us know.
