Scale AI
Corporate • 🇺🇸 United States
Big Tech
Papers (L12M)
12
Researchers
14
Papers w/ Code
2
Papers w/ Dataset
2
Topic Overview
Bubble chart placeholder
Recent Papers
PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach
Code
Computers and Society
PRBench: Large-Scale Expert Rubrics for Evaluating High-Stakes Professional Reasoning
Code
Data
Computation and Language
Online Rubrics Elicitation from Pairwise Comparisons
Computation and Language
TutorBench: A Benchmark To Assess Tutoring Capabilities Of Large Language Models
Machine Learning (CS)
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
Code
Data
Software Engineering