DevOps Python - News Search

23 hours ago

Quesma Releases OTelBench: Independent Benchmark Reveals Frontier LLMs Struggle with Real ...

Quesma, Inc. announced the release of OTelBench, the first comprehensive benchmark for evaluating LLMs on OpenTelemetry ...

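This study, a joint effort by Fudan University, Shanghai Qiji Zhifeng Technology Co., Ltd., and the Shanghai Innovation Institute, was published in January 2026 as arXiv:2601.11077v1. The research team developed ABC-Bench, a new evaluation benchmark that tests how well AI coding agents handle the full range of tasks in real-world backend development scenarios.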
