The Semantic Architect: How FEAML Bridges Structured Data and LLMs for Multi-Label Tasks
By: Wanfu Gao, Zebin He, Jun Gao
Existing feature engineering methods based on large language models (LLMs) have not yet been applied to multi-label learning tasks: they lack the ability to model complex label dependencies and are not adapted to the characteristics of multi-label tasks. To address these issues, we propose Feature Engineering Automation for Multi-Label Learning (FEAML), an automated feature engineering method for multi-label classification that leverages the code generation capabilities of LLMs. FEAML uses dataset metadata and label co-occurrence matrices to guide the LLM toward the relationships between data features and task objectives, and generates high-quality features on that basis. Newly generated features are evaluated for effectiveness in terms of model accuracy, while Pearson correlation coefficients are used to detect redundancy. FEAML further feeds the evaluation results back to the LLM, driving it to refine its code generation in subsequent iterations. By integrating LLMs with this feedback mechanism, FEAML realizes an efficient, interpretable, and self-improving feature engineering paradigm. Empirical results on a range of multi-label datasets demonstrate that FEAML outperforms other feature engineering methods.
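To make the loop concrete, here is a minimal sketch of the evaluation side of such a pipeline, assuming typical Python tooling (the abstract does not specify an implementation). The function names `label_cooccurrence`, `is_redundant`, and `subset_accuracy`, the 0.95 correlation threshold, the baseline classifier, and the synthetic data are all illustrative assumptions, not details from the paper.

```python
# Sketch of an FEAML-style evaluation loop (assumed details, not the paper's code):
# build a label co-occurrence matrix for the prompt, score a candidate feature by
# the change in multi-label accuracy, and reject it if it is Pearson-redundant.
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


def label_cooccurrence(Y):
    """C[i, j] = number of samples that carry both label i and label j."""
    Y = np.asarray(Y)
    return Y.T @ Y


def is_redundant(new_feature, X, threshold=0.95):
    """Flag the candidate if it is highly Pearson-correlated with an existing feature."""
    for j in range(X.shape[1]):
        r = np.corrcoef(new_feature, X[:, j])[0, 1]
        if abs(r) >= threshold:
            return True
    return False


def subset_accuracy(X_tr, Y_tr, X_te, Y_te):
    """Exact-match accuracy of a simple multi-label baseline classifier."""
    clf = MultiOutputClassifier(LogisticRegression(max_iter=1000))
    clf.fit(X_tr, Y_tr)
    return accuracy_score(Y_te, clf.predict(X_te))


# Synthetic multi-label data standing in for a real dataset.
X, Y = make_multilabel_classification(n_samples=400, n_features=10,
                                      n_classes=5, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=0)

C = label_cooccurrence(Y_tr)            # would be serialized into the LLM prompt
baseline = subset_accuracy(X_tr, Y_tr, X_te, Y_te)

# Hypothetical candidate feature; in FEAML this code would be generated by the LLM.
cand_tr = X_tr[:, 0] * X_tr[:, 1]
cand_te = X_te[:, 0] * X_te[:, 1]

if is_redundant(cand_tr, X_tr):
    feedback = "candidate rejected: redundant with an existing feature"
else:
    new_acc = subset_accuracy(np.column_stack([X_tr, cand_tr]), Y_tr,
                              np.column_stack([X_te, cand_te]), Y_te)
    feedback = f"accuracy {baseline:.3f} -> {new_acc:.3f}"

print(feedback)  # this kind of summary would drive the next code-generation round
```

In the method described by the abstract, the candidate-feature code comes from the LLM and the accuracy delta plus the redundancy verdict are folded back into the next prompt; the snippet only illustrates how a co-occurrence matrix, a Pearson filter, and accuracy-based feedback could fit together.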
Similar Papers
LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers
Machine Learning (CS)
Finds better data patterns for smarter computer predictions.
LEAML: Label-Efficient Adaptation to Out-of-Distribution Visual Tasks for Multimodal Large Language Models
CV and Pattern Recognition
Teaches AI to understand medical pictures better.
Knowledge-Informed Automatic Feature Extraction via Collaborative Large Language Model Agents
Artificial Intelligence
Finds hidden patterns in data for discoveries.