Multidimensional Music Aesthetic Evaluation via Semantically Consistent C-Mixup Augmentation
By: Shuyang Liu, Yuan Jin, Rui Lin, and others
Potential Business Impact:
Scores how good computer-made songs sound, so systems can pick and learn what people like.
Evaluating the aesthetic quality of generated songs is challenging due to the multi-dimensional nature of musical perception. We propose a robust music aesthetic evaluation framework that combines (1) multi-source multi-scale feature extraction to obtain complementary segment- and track-level representations, (2) a hierarchical audio augmentation strategy to enrich training data, and (3) a hybrid training objective that integrates regression and ranking losses for accurate scoring and reliable top-song identification. Experiments on the ICASSP 2026 SongEval benchmark demonstrate that our approach consistently outperforms baseline methods across correlation and top-tier metrics.
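The hybrid objective described above pairs a regression loss (for accurate absolute scores) with a ranking loss (for reliably ordering top songs). A minimal sketch of such a combination, assuming a mean-squared-error regression term and a pairwise hinge ranking term with weight `alpha` and margin `margin` (these exact forms and names are illustrative assumptions, not the authors' published code):

```python
import numpy as np

def hybrid_loss(pred, target, alpha=0.5, margin=0.1):
    """Blend a regression (MSE) term with a pairwise ranking (hinge) term.

    Illustrative sketch only: the paper combines regression and ranking
    losses, but the specific loss forms, `alpha`, and `margin` here are
    assumptions for demonstration.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)

    # Regression term: mean squared error between predicted and
    # ground-truth aesthetic scores.
    mse = np.mean((pred - target) ** 2)

    # Ranking term: for every pair where target[i] > target[j], penalize
    # predictions that fail to keep pred[i] above pred[j] by `margin`.
    hinge_sum, n_pairs = 0.0, 0
    for i in range(len(target)):
        for j in range(len(target)):
            if target[i] > target[j]:
                hinge_sum += max(0.0, margin - (pred[i] - pred[j]))
                n_pairs += 1
    rank = hinge_sum / n_pairs if n_pairs else 0.0

    # Weighted sum: alpha trades off score accuracy against ordering quality.
    return alpha * mse + (1 - alpha) * rank
```

With perfectly ordered, well-separated predictions both terms vanish; swapping the best and worst songs drives the loss up through both the MSE and the ranking penalty.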
Similar Papers
A Survey on Evaluation Metrics for Music Generation
Sound
Helps judge if computer-made music sounds good.
Art2Music: Generating Music for Art Images with Multi-modal Feeling Alignment
Sound
Creates music from pictures and words.
Improving Perceptual Audio Aesthetic Assessment via Triplet Loss and Self-Supervised Embeddings
Audio and Speech Processing
Rates how good computer-made sounds are.