SALMAN: Stability Analysis of Language Models Through the Maps Between Graph-based Manifolds
By: Wuxinlin Cheng, Yupeng Cao, Jinwen Wu, and more
Potential Business Impact:
Makes AI language models more trustworthy.
Recent strides in pretrained transformer-based language models have propelled state-of-the-art performance in numerous NLP tasks. Yet, as these models grow in size and deployment, their robustness under input perturbations becomes an increasingly urgent question. Existing robustness methods often diverge between small-parameter models and large language models (LLMs), and they typically rely on labor-intensive, sample-specific adversarial designs. In this paper, we propose a unified, local (sample-level) robustness framework (SALMAN) that evaluates model stability without modifying internal parameters or resorting to complex perturbation heuristics. Central to our approach is a novel Distance Mapping Distortion (DMD) measure, which ranks each sample's susceptibility by comparing input-to-output distance mappings with near-linear complexity. By demonstrating significant gains in attack efficiency and robust training, we position our framework as a practical, model-agnostic tool for advancing the reliability of transformer-based NLP systems.
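The abstract only sketches DMD at a high level, but the core idea of comparing input-to-output distance mappings can be illustrated. Below is a minimal Python sketch of that idea, assuming DMD is approximated as the distortion between pairwise distances in an input embedding space and the corresponding distances in a model's output representation space, restricted to each sample's nearest neighbors. The function name `dmd_scores`, the k-NN restriction, and the max-over-neighbors aggregation are illustrative choices, not the paper's construction; the actual method operates on graph-based manifolds with near-linear algorithms, which this toy version does not reproduce.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def dmd_scores(X_in, X_out, k=10, eps=1e-12):
    """Toy Distance Mapping Distortion (DMD) ranking (illustrative only).

    X_in:  (n, d_in)  input-side embeddings (e.g., sentence embeddings).
    X_out: (n, d_out) output-side representations from the model under test.
    Returns one score per sample; a higher score means the model distorts
    distances more around that sample, i.e., it is potentially less stable.
    """
    # Nearest neighbors in the input space (k+1 because self is included).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_in)
    dist_in, idx = nn.kneighbors(X_in)
    dist_in, idx = dist_in[:, 1:], idx[:, 1:]  # drop the self-neighbor

    # Distances in the output space to the same input-space neighbors.
    diffs = X_out[:, None, :] - X_out[idx]     # shape (n, k, d_out)
    dist_out = np.linalg.norm(diffs, axis=-1)  # shape (n, k)

    # Bidirectional distortion: penalize both expansion and contraction
    # of local distances under the input-to-output mapping.
    ratio = (dist_out + eps) / (dist_in + eps)
    distortion = np.maximum(ratio, 1.0 / ratio)
    return distortion.max(axis=1)
```

Under this reading, samples with the largest scores would be prioritized when choosing attack targets or weighting examples during robust training, consistent with the abstract's claim that DMD ranks each sample's susceptibility without touching the model's internal parameters.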
Similar Papers
Statistical Hypothesis Testing for Auditing Robustness in Language Models
Computation and Language
Checks if AI answers change when you change its input.
Towards Reliable and Practical LLM Security Evaluations via Bayesian Modelling
Cryptography and Security
Finds hidden tricks that can fool AI.
Evaluating and Improving Robustness in Large Language Models: A Survey and Future Directions
Computation and Language
Makes AI smarter and more reliable.