Phonological Representation Learning for Isolated Signs Improves Out-of-Vocabulary Generalization
By: Lee Kezar, Zed Sehyr, Jesse Thomason
Potential Business Impact:
Helps computers understand new sign language words.
Sign language datasets are often unrepresentative in terms of vocabulary, underscoring the need for models that generalize to unseen signs. Vector quantization is a promising approach for learning discrete, token-like representations, but it has not been evaluated whether the learned units capture spurious correlations that hinder out-of-vocabulary performance. This work investigates two phonological inductive biases with a vector-quantized autoencoder: Parameter Disentanglement, an architectural bias, and Phonological Semi-Supervision, a regularization technique, aiming to improve both isolated recognition of known signs and reconstruction quality of unseen signs. The primary finding is that the representations learned by the proposed model are more effective for one-shot reconstruction of unseen signs and more discriminative for sign identification than those of a controlled baseline. This work provides a quantitative analysis of how explicit, linguistically motivated biases can improve the generalization of learned sign language representations.
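The vector-quantization step at the heart of the model described above can be sketched as follows. This is a minimal, illustrative example of nearest-codebook lookup; the codebook size, vector dimensionality, and function name are assumptions for illustration, not details from the paper.

```python
import numpy as np

def quantize(z, codebook):
    """Map each continuous encoder output to its nearest codebook entry.

    z:        (batch, dim) continuous encoder outputs
    codebook: (K, dim) discrete code vectors (learned in a real model)
    Returns the quantized vectors and their integer code indices.
    """
    # Squared Euclidean distance from every z to every code: (batch, K)
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)          # discrete, token-like indices
    return codebook[idx], idx

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))  # K=8 codes of dimension 4 (assumed)
z = rng.normal(size=(3, 4))         # 3 encoder outputs
zq, idx = quantize(z, codebook)
print(idx.shape, zq.shape)          # each input maps to one discrete code
```

The discrete indices are what make the representation "token-like": downstream components see only the chosen code, which is why the paper asks whether those units encode phonological structure rather than spurious correlations.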
Similar Papers
Explainable Disentanglement on Discrete Speech Representations for Noise-Robust ASR
Computation and Language
Cleans noisy speech for better understanding.
Graph is a Natural Regularization: Revisiting Vector Quantization for Graph Representation Learning
Machine Learning (CS)
Helps computers understand complex data better.
VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction
CV and Pattern Recognition
Lets computers understand and create images.