SafeBench-Seq: A Homology-Clustered, CPU-Only Baseline for Protein Hazard Screening with Physicochemical/Composition Features and Cluster-Aware Confidence Intervals

Published: December 19, 2025 | arXiv ID: 2512.17527v1

By: Muhammad Haris Khan

Potential Business Impact:

Tests if new proteins are safe to make.

Business Areas:

Bioinformatics, Biotechnology, Data and Analytics, Science and Engineering

Foundation models for protein design raise concrete biosecurity risks, yet the community lacks a simple, reproducible baseline for sequence-level hazard screening that is explicitly evaluated under homology control and runs on commodity CPUs. We introduce SafeBench-Seq, a metadata-only, reproducible benchmark and baseline classifier built entirely from public data (SafeProtein hazards and UniProt benigns) and interpretable features (global physicochemical descriptors and amino-acid composition). To approximate "never-before-seen" threats, we homology-cluster the combined dataset at <=40% identity and perform cluster-level holdouts (no cluster overlap between train/test). We report discrimination (AUROC/AUPRC) and screening-operating points (TPR@1% FPR; FPR@95% TPR) with 95% bootstrap confidence intervals (n=200), and we provide calibrated probabilities via CalibratedClassifierCV (isotonic for Logistic Regression / Random Forest; Platt sigmoid for Linear SVM). We quantify probability quality using Brier score, Expected Calibration Error (ECE; 15 bins), and reliability diagrams. Shortcut susceptibility is probed via composition-preserving residue shuffles and length-/composition-only ablations. Empirically, random splits substantially overestimate robustness relative to homology-clustered evaluation; calibrated linear models exhibit comparatively good calibration, while tree ensembles retain slightly higher Brier/ECE. SafeBench-Seq is CPU-only, reproducible, and releases metadata only (accessions, cluster IDs, split labels), enabling rigorous evaluation without distributing hazardous sequences.
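As a rough illustration of three ideas in the abstract (cluster-level holdouts, composition-preserving residue shuffles, and ECE with 15 bins), here is a minimal standard-library sketch. It is not the authors' code: every function name is hypothetical, the ECE uses equal-width bins (the abstract states only the bin count), and the example sequence and cluster IDs are made up for demonstration.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(seq):
    """Fraction of each standard amino acid in seq (a composition feature vector)."""
    counts = Counter(seq)
    n = max(len(seq), 1)  # avoid division by zero on an empty sequence
    return [counts.get(a, 0) / n for a in AMINO_ACIDS]


def composition_preserving_shuffle(seq, seed=0):
    """Permute residues: length and composition are unchanged, order is destroyed."""
    residues = list(seq)
    random.Random(seed).shuffle(residues)
    return "".join(residues)


def cluster_holdout_split(cluster_ids, test_fraction=0.2, seed=0):
    """Assign whole clusters to train or test so that no cluster spans both."""
    clusters = sorted(set(cluster_ids))
    random.Random(seed).shuffle(clusters)
    n_test = max(1, round(test_fraction * len(clusters)))
    test_clusters = set(clusters[:n_test])
    train_idx = [i for i, c in enumerate(cluster_ids) if c not in test_clusters]
    test_idx = [i for i, c in enumerate(cluster_ids) if c in test_clusters]
    return train_idx, test_idx


def expected_calibration_error(probs, labels, n_bins=15):
    """Equal-width-bin ECE: weighted mean of |observed positive rate - mean probability|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[b].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(frac_pos - mean_conf)
    return ece


if __name__ == "__main__":
    # Hypothetical inputs, for demonstration only.
    seq = "MKTAYIAKQRQISFVKSHFSRQ"
    shuffled = composition_preserving_shuffle(seq, seed=1)
    print(aa_composition(seq) == aa_composition(shuffled))  # composition is preserved

    cluster_ids = ["c1", "c1", "c2", "c3", "c3", "c4", "c5"]
    train_idx, test_idx = cluster_holdout_split(cluster_ids, test_fraction=0.4)
    print(train_idx, test_idx)
```

If a classifier's score barely changes when a sequence is replaced by its shuffle, it is likely relying on composition or length alone, which is the shortcut the shuffle probe is meant to expose. Splitting by cluster ID rather than by individual sequence is what keeps close homologs of a test protein out of the training set.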


Country of Origin

🇩🇰 Denmark

Repos / Data Links

github.com

Page Count

8 pages
