How Small Transformations Expose the Weakness of Semantic Similarity Measures
By: Serge Lionel Nikiema, Albérick Euraste Djire, Abdoul Aziz Bonkoungou, and more
Potential Business Impact:
Finds computer code that means the same thing.
This research examines how well different methods measure semantic similarity, a capability central to software engineering applications such as code search, API recommendation, automated code review, and refactoring tools. While large language models are increasingly used for these similarity assessments, it remains unclear whether they truly understand semantic relationships or merely recognize surface patterns.

The study tested 18 similarity measurement approaches, including word-based methods, embedding techniques, LLM-based systems, and structure-aware algorithms. The researchers built a systematic testing framework that applies small, controlled transformations to text and code, then checks whether each method responds appropriately to the resulting semantic relationships.

The results revealed significant flaws in commonly used metrics. Some embedding-based methods identified semantic opposites as similar up to 99.9 percent of the time, and certain transformer-based approaches occasionally rated opposite meanings as more similar than synonymous ones. Much of the embedding methods' poor performance stemmed from how they compute distances: switching from Euclidean distance to cosine similarity improved results by 24 to 66 percent. LLM-based approaches distinguished semantic differences more reliably, assigning low similarity scores (0.00 to 0.29) to genuinely different meanings, whereas embedding methods incorrectly assigned high scores (0.82 to 0.99) to dissimilar content.
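The Euclidean-versus-cosine finding is easy to see in miniature. The sketch below is not from the paper: it uses hypothetical 2-D vectors in place of real embeddings to show that cosine similarity compares only direction, while a distance-derived score also reacts to magnitude, so the two metrics can rank the same candidate pairs in opposite order.

import numpy as np

def cosine_similarity(a, b):
    # Direction-only comparison: vector magnitude cancels out.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_score(a, b):
    # One common way to squash a distance into a (0, 1] similarity score.
    return float(1.0 / (1.0 + np.linalg.norm(a - b)))

# Hypothetical 2-D stand-ins for embeddings (real models use hundreds of
# dimensions); the values are chosen only to make the two metrics disagree.
anchor   = np.array([1.0, 0.0])
same_dir = np.array([5.0, 0.0])   # same direction as anchor, larger norm
off_dir  = np.array([0.5, 0.9])   # different direction, but spatially near

for name, v in [("same direction ", same_dir), ("other direction", off_dir)]:
    print(f"{name}: cosine={cosine_similarity(anchor, v):.2f}  "
          f"euclidean={euclidean_score(anchor, v):.2f}")
# cosine ranks same_dir first (1.00 vs ~0.49); the Euclidean-based score
# ranks off_dir first (~0.49 vs 0.20), flipping the verdict.

Which ranking is "right" depends on the embedding model's geometry, but the flip illustrates why the paper's reported 24 to 66 percent improvement from a simple metric swap is plausible.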
Similar Papers
Semantics at an Angle: When Cosine Similarity Works Until It Doesn't
Machine Learning (CS)
Helps machine learning models understand words better.
Beyond Surface Similarity: Evaluating LLM-Based Test Refactorings with Structural and Semantic Awareness
Software Engineering
Measures how well AI improves computer code.
An Exploratory Analysis on the Explanatory Potential of Embedding-Based Measures of Semantic Transparency for Malay Word Recognition
Computation and Language
Helps computers understand word meanings better.