Using Small Language Models to Reverse-Engineer Machine Learning Pipelines Structures

Published: January 7, 2026 | arXiv ID: 2601.03988v1

By: Nicolas Lacroix, Mireille Blay-Fornarino, Sébastien Mosser, and more

Potential Business Impact:

Helps automatically identify the stages that structure machine learning pipelines in source code, deepening our understanding of how data science software is built.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Background: Extracting the stages that structure Machine Learning (ML) pipelines from source code is key to gaining a deeper understanding of data science practices. However, the diversity caused by the constant evolution of the ML ecosystem (e.g., algorithms, libraries, datasets) makes this task challenging. Existing approaches depend either on non-scalable manual labeling or on ML classifiers that do not properly support the diversity of the domain. These limitations highlight the need for more flexible and reliable solutions.

Objective: We evaluate whether Small Language Models (SLMs) can leverage their code understanding and classification abilities to address these limitations, and subsequently how they can advance our understanding of data science practices.

Method: We conduct a confirmatory study based on two reference works selected for their relevance to the limitations of the current state of the art. First, we compare several SLMs using Cochran's Q test. The best-performing model is then evaluated against the reference studies using two distinct McNemar's tests. We further analyze how variations in taxonomy definitions affect performance through an additional Cochran's Q test. Finally, a goodness-of-fit analysis is conducted using Pearson's chi-squared tests to compare our insights on data science practices with those from prior studies.
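The statistical protocol described above can be sketched in plain Python. This is an illustrative reconstruction, not the authors' code: the model names, correctness vectors, and stage distributions below are invented placeholders. Cochran's Q compares k paired binary outcomes (here, whether each SLM classified an item correctly), McNemar's test compares two paired classifiers via their discordant pairs, and Pearson's chi-squared checks observed stage frequencies against expected proportions from a prior study.

```python
def cochrans_q(outcomes):
    """Cochran's Q statistic for n items (rows) x k classifiers (columns)
    of binary correctness outcomes. Compare against a chi-squared critical
    value with df = k - 1."""
    k = len(outcomes[0])
    col_totals = [sum(row[j] for row in outcomes) for j in range(k)]
    row_totals = [sum(row) for row in outcomes]
    grand = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - grand * grand)
    denominator = k * grand - sum(r * r for r in row_totals)
    return numerator / denominator

def mcnemar_chi2(a, b):
    """Continuity-corrected McNemar chi-squared statistic (df = 1) for two
    paired binary correctness vectors a and b."""
    disc_a = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    disc_b = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    if disc_a + disc_b == 0:
        return 0.0
    return (abs(disc_a - disc_b) - 1) ** 2 / (disc_a + disc_b)

def chi2_goodness_of_fit(observed, expected_props):
    """Pearson chi-squared statistic comparing observed counts to expected
    proportions (df = number of categories - 1)."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p)
               for o, p in zip(observed, expected_props))

# Hypothetical data: 8 pipeline snippets, correctness of 3 SLMs on each.
items = [(1, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 0),
         (1, 1, 1), (1, 0, 0), (0, 0, 0), (1, 1, 0)]
model_a = [row[0] for row in items]
model_b = [row[1] for row in items]

q_stat = cochrans_q(items)                 # df = 2, critical value ~5.991 at alpha = 0.05
m_stat = mcnemar_chi2(model_a, model_b)    # df = 1, critical value ~3.841
# Hypothetical observed stage counts vs. proportions reported by a prior study.
gof_stat = chi2_goodness_of_fit([30, 50, 20], [0.30, 0.45, 0.25])
```

The significance decision then reduces to comparing each statistic to the chi-squared critical value at the chosen alpha (or computing a p-value with `scipy.stats.chi2.sf`); in practice, `statsmodels.stats.contingency_tables` provides `cochrans_q` and `mcnemar` directly.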

Country of Origin
🇨🇦 Canada

Page Count
7 pages

Category
Computer Science:
Software Engineering