Observability Part 2: AI assistance
Builds on Part 1 by creating an AI agent that automatically reviews workflow logs and summarizes failures. Defines tools for querying Loki and retrieving workflow metadata, then feeds their results into an agent that follows clear troubleshooting instructions. Shows example runs in which the agent highlights root causes and suggests next steps after a Nextflow job fails. Reflects on how automation can accelerate incident response and where the approach could evolve next.
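For a rough idea of what a Loki query tool for such an agent might look like, here is a minimal Python sketch using only the standard library. It calls Loki's HTTP endpoint `/loki/api/v1/query_range` and flattens the returned log streams into plain text the agent can read. The Loki address, the label names `job` and `run_name`, and the function names are all hypothetical assumptions, not the setup from Part 1.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical Loki address; adjust to wherever Part 1's Loki instance runs.
LOKI_URL = "http://localhost:3100"


def build_query_range_url(base_url, logql, start_ns, end_ns, limit=200):
    """Build a Loki query_range URL; start/end are Unix epoch nanoseconds."""
    params = {
        "query": logql,
        "start": str(start_ns),
        "end": str(end_ns),
        "limit": str(limit),
        "direction": "backward",  # newest entries first, so limit keeps the latest
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


def extract_lines(response):
    """Flatten a Loki 'streams' response into (timestamp_ns, line) tuples, oldest first."""
    entries = []
    for stream in response.get("data", {}).get("result", []):
        for ts, line in stream.get("values", []):
            entries.append((int(ts), line))
    entries.sort()
    return entries


def query_workflow_errors(run_name, minutes=60, base_url=LOKI_URL):
    """Hypothetical agent tool: return recent ERROR lines for one workflow run."""
    end_ns = time.time_ns()
    start_ns = end_ns - minutes * 60 * 1_000_000_000
    # Assumed labels: 'job' and 'run_name' depend on how your log shipper tags Nextflow logs.
    logql = f'{{job="nextflow", run_name="{run_name}"}} |= "ERROR"'
    url = build_query_range_url(base_url, logql, start_ns, end_ns)
    with urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    return "\n".join(line for _, line in extract_lines(payload))
```

An agent framework would typically register `query_workflow_errors` as a callable tool and pass its text output into the model's context alongside the troubleshooting instructions. Returning plain, time-ordered lines keeps the tool output compact and easy for the model to cite when it points at a root cause.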