Scale AI
Corporate • 🇺🇸 United States
Big Tech
Papers (last 12 months): 9
Researchers: 15
Papers with code: 2
Papers with dataset: 2
Recent Papers
Agentic Rubrics as Contextual Verifiers for SWE Agents
Code • Machine Learning (CS)
Audio MultiChallenge: A Multi-Turn Evaluation of Spoken Dialogue Systems on Natural Human Interaction
Code • Sound
PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach
Code • Computers and Society
PRBench: Large-Scale Expert Rubrics for Evaluating High-Stakes Professional Reasoning
Code • Data • Computation and Language
Online Rubrics Elicitation from Pairwise Comparisons
Computation and Language