Analyst memo

Research | 1 source | Developing

ITBench-AA Results: AI Models Underperform

The ITBench-AA benchmark shows that leading AI models score below 50% on Site Reliability Engineering (SRE) tasks, underscoring the challenges facing enterprise IT automation.

Published May 28, 2026, 4:16 AM
Updated May 28, 2026, 4:16 AM

What happened

Artificial Analysis and IBM released the ITBench-AA benchmark assessing AI models on SRE tasks, with leading models scoring below 50%.

Why it matters

The benchmark highlights the limits of current AI models on complex IT tasks, which has implications for enterprise automation strategies.

Who is affected

Enterprises relying on AI for IT management may need to recalibrate expectations, given the current model performance limitations.

Risks / uncertainty

Model performance variability and cost considerations pose risks to widespread adoption of AI for IT tasks.