ITBench-AA Results: AI Models Underperform

The ITBench-AA benchmark reveals that leading AI models score below 50% on Site Reliability Engineering tasks, underscoring challenges for enterprise IT automation.

Published May 28, 2026, 4:16 AM
Updated May 28, 2026, 4:16 AM

What happened

Artificial Analysis and IBM released ITBench-AA, a benchmark that assesses AI models on Site Reliability Engineering (SRE) tasks. Leading models scored below 50%.

[1]

Why it matters

The benchmark highlights the limits of current AI models on complex IT tasks, which affects how enterprises plan their automation strategies.

[1]

Who is affected

Enterprises that rely on AI for IT management may need to recalibrate their expectations, given the limits of current model performance.

[1]

Risks / uncertainty

Model performance variability and cost considerations pose risks to widespread adoption of AI for IT tasks.

[1]