While AI coding assistants have become mainstream, applying generative AI to infrastructure and operations is still emerging. The promise is huge—but so are the challenges, from hallucinated insights to navigating noisy telemetry.

In this episode of Semaphore Uncut, Gou Rao—AI infrastructure veteran and co-founder of Neubird—joins Darko to explore how LLMs can support DevOps teams not just with suggestions, but with reasoning, context gathering, and real-time incident diagnosis. He shares the vision behind Hawkeye, Neubird’s “digital engineer,” and what it means to build agentic systems that think and operate like humans.

A Career Rooted in Data and AI

Gou’s background in AI and enterprise systems goes back to his graduate work at the University of Pennsylvania. Over the years, he’s applied data science to solve tough infrastructure problems, most recently as part of the founding team at Portworx, a cloud-native storage platform.

At Neubird, Gou brings together his two longtime passions: data structures and AI. His team is now focused on building tools that use LLMs not just to generate content, but to extract operational insights from sprawling, high-volume telemetry systems.

From Stack Traces to Reasoning Engines

Most engineers today interact with LLMs in a straightforward way: paste in a stack trace, ask for help, and get a response. But Neubird’s thesis is different.

“The hard part isn’t solving the stack trace,” Gou says. “It’s finding it.”

Neubird’s product, Hawkeye, acts as a reasoning engine across the entire telemetry stack—logs, metrics, traces, configuration, alerts. Rather than using retrieval-augmented generation, it reverses the pattern: LLMs decide what needs to be found, and the system fetches it through native queries (like PromQL or Splunk searches). That separation improves both accuracy and auditability.
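The reversed pattern Gou describes can be sketched roughly as follows. This is a hedged, minimal illustration under stated assumptions, not Neubird's implementation: `plan_queries` stands in for an LLM call that emits structured query requests, and the toy backends stand in for real PromQL and Splunk clients. All function names here are hypothetical.

```python
# Illustrative sketch of the "LLM decides what to fetch, system fetches it
# natively" pattern. plan_queries() is a stand-in for a real model call;
# the backends are toy lambdas standing in for Prometheus/Splunk clients.

def plan_queries(symptom: str) -> list[dict]:
    """Stand-in for an LLM call that returns structured query requests."""
    # A real system would prompt a model with the symptom; we return a
    # canned plan to keep the sketch self-contained.
    return [
        {"backend": "prometheus",
         "query": 'rate(http_requests_total{status="500"}[5m])'},
        {"backend": "splunk",
         "query": 'search index=app error earliest=-15m'},
    ]

def execute(request: dict, backends: dict) -> dict:
    """Run the query through the native backend client, never the model."""
    runner = backends[request["backend"]]
    # Keeping the original query text alongside the raw result is what
    # makes the trail auditable.
    return {"request": request, "result": runner(request["query"])}

def diagnose(symptom: str, backends: dict) -> list[dict]:
    return [execute(req, backends) for req in plan_queries(symptom)]

# Toy backends returning fixed data; real ones would call the Prometheus
# HTTP API or the Splunk search API.
backends = {
    "prometheus": lambda q: {"value": 0.42},
    "splunk": lambda q: {"events": 3},
}
evidence = diagnose("elevated 500s on checkout", backends)
```

The key design point is the separation: the model only plans; every fact in `evidence` comes from a native query whose text and result are retained for audit.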

Building a Digital Engineer

The team designed Hawkeye to mirror how a human SRE would think: identify symptoms, investigate telemetry sources, reason about possible causes, and suggest remediations. Behind the scenes, different models (Claude, Mistral, Llama 3) are orchestrated depending on task complexity.

Each inference takes a few minutes—not seconds—but that’s intentional. Speed isn’t the goal. Correctness is.

To build trust, Hawkeye shows its chain of thought as it works. For example, it might check a Prometheus metric, then search for recent config changes, then correlate that with a specific GitHub commit. While it doesn’t take action on its own yet, it can generate pull requests, annotate incident tickets, or suggest Terraform changes for review.
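The visible chain of thought might look something like the sketch below: each investigation step records what was checked, why, and what was found, so an engineer can audit the trail afterward. This is an assumption-laden toy, not Neubird's API; the class names, step contents, and the commit value are all illustrative.

```python
from dataclasses import dataclass, field

# Toy model of an auditable investigation trail: action, rationale, and
# finding are recorded per step. All names and values are illustrative.

@dataclass
class Step:
    action: str      # e.g. "query Prometheus"
    rationale: str   # why this step was taken
    finding: str     # what the evidence showed

@dataclass
class Investigation:
    symptom: str
    steps: list = field(default_factory=list)

    def record(self, action: str, rationale: str, finding: str) -> None:
        self.steps.append(Step(action, rationale, finding))

    def report(self) -> str:
        lines = [f"Symptom: {self.symptom}"]
        for i, s in enumerate(self.steps, 1):
            lines.append(f"{i}. {s.action} -- {s.rationale} -> {s.finding}")
        return "\n".join(lines)

inv = Investigation("p99 latency spike on api-gateway")
inv.record("query Prometheus CPU metric", "rule out CPU saturation", "CPU normal")
inv.record("diff recent config changes", "spike began after a deploy", "timeout lowered")
inv.record("correlate with GitHub commits", "identify the change", "commit abc123 (toy value) touched timeouts")
print(inv.report())
```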

Telemetry-First, Not Model-First

A major design principle behind Hawkeye is accuracy. Gou explains that they never rely solely on model output. If a model says, “Check if CPU usage is high,” the system runs an actual Prometheus query to verify that.
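That verification step can be sketched as below, assuming a threshold-based check against a live query. The function names, the PromQL expression, and the 0.8 threshold are illustrative assumptions, not Neubird's actual logic; the query client is injected so the sketch stays self-contained.

```python
# Hedged sketch of verifying a model-generated claim ("CPU usage is high")
# against real telemetry rather than trusting the model's assertion.
# check_claim and the threshold are illustrative.

def query_cpu_usage(prom_query_fn) -> float:
    """Run an actual PromQL query (client injected for testability)."""
    return prom_query_fn('avg(rate(node_cpu_seconds_total{mode!="idle"}[5m]))')

def check_claim(claim: str, prom_query_fn, threshold: float = 0.8) -> dict:
    """Confirm or refute a model hypothesis with observed data."""
    usage = query_cpu_usage(prom_query_fn)
    return {"claim": claim, "observed": usage, "confirmed": usage >= threshold}

# Toy client returning a fixed value; a real one would hit Prometheus'
# HTTP query API.
result = check_claim("CPU usage is high", lambda q: 0.93)
```

Either way the outcome is grounded: the hypothesis is attached to an observed value, not to the model's say-so.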

That tight integration with telemetry also means Hawkeye adapts to your environment. During onboarding, teams provide read-only access to their observability tools and infrastructure. From there, Hawkeye builds context, including preferences, frequently failing clusters, and past incidents stored in systems like ServiceNow.

While global model training happens based on aggregated feedback, environment-specific memory—like your preferred verbosity or known flaky services—is stored separately in a vector database to shape interactions without retraining the model.
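The environment-specific memory described above can be sketched as a small retrieval store consulted per interaction, so no retraining is needed. To keep this self-contained, the sketch uses token-overlap (Jaccard) similarity as a toy stand-in for real embedding vectors; the class and note contents are hypothetical.

```python
# Minimal sketch of per-environment memory kept outside the model: notes
# (preferences, known-flaky services, past incidents) are stored and
# retrieved by similarity at interaction time. Token overlap stands in
# for a real vector database's embedding similarity.

def tokens(text: str) -> set[str]:
    return set(text.lower().replace(",", "").replace(":", "").split())

def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard overlap: toy stand-in for cosine similarity of embeddings."""
    return len(a & b) / len(a | b) if a | b else 0.0

class EnvMemory:
    def __init__(self):
        self.items = []  # (token set, note text)

    def add(self, note: str) -> None:
        self.items.append((tokens(note), note))

    def recall(self, query: str, k: int = 2) -> list[str]:
        q = tokens(query)
        ranked = sorted(self.items, key=lambda it: similarity(q, it[0]),
                        reverse=True)
        return [note for _, note in ranked[:k]]

memory = EnvMemory()
memory.add("cluster payments-eu is known flaky, deprioritize its alerts")
memory.add("user prefers terse summaries")
memory.add("past incident: checkout latency rose after a config push")
context = memory.recall("alert storm from payments-eu cluster")
```

Because this memory lives beside the model rather than inside it, per-customer context shapes responses immediately, while model training only ever happens on aggregated feedback.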

Scaling from Startups to Enterprises

Hawkeye is already proving useful across a wide customer base. One small company with a single SRE uses it to cut through noisy alerts and stay focused on critical issues. At the other end of the spectrum, large banks are integrating Hawkeye into high-stakes payment systems to augment their SRE teams and reduce stress and churn.

No matter the size, the goal is the same: reduce time spent sifting through logs and alerts, and make engineers more effective.

What About Autonomous Actions?

Right now, Hawkeye only recommends actions—it doesn’t apply changes directly. That might change, but only with time and trust.

“There’s an unpredictability in generative AI that makes people cautious,” Gou says. “We’re not ready to go full self-driving in production. But maybe we’ll start with safe environments—nightly builds, staging systems, and so on.”

Drawing an analogy to Tesla’s Autopilot, Gou believes the future will include more automation, but only after the systems prove themselves repeatedly in controlled scenarios.

Trying Out Hawkeye

Interested teams can try Hawkeye through Neubird’s website or directly via AWS and Azure marketplaces. A playground environment helps teams test key use cases before connecting real production data.

Once deployed, Hawkeye integrates seamlessly with observability platforms like Prometheus, Azure Monitor, or Splunk. Whether used in real-time triage or as an asynchronous assistant to incident response tools like PagerDuty or Jira, Hawkeye aims to be a thinking partner—not just another bot.

Final Thoughts: A Teammate, Not a Tool

Gou is clear-eyed about the limitations: Hawkeye isn’t magic. It doesn’t solve every incident or replace your ops team. But it augments engineers by handling the tedious groundwork—surfacing relevant signals, correlating issues, and providing well-reasoned suggestions.

“It’s there to take the grunt work off your plate,” Gou says. “So you can focus on the interesting stuff.”

And in a world where SREs are under constant pressure, that’s more than welcome.

Follow Gou Rao and Neubird

🌐 Website: https://neubird.ai
💼 LinkedIn: Gou Rao | Hawkeye

The post Gou Rao on Agentic Systems in DevOps appeared first on Semaphore.

Darko Fabijan