Using Context to Improve Word Segmentation
By: Stephanie Hu, Xiaolu Guo
Potential Business Impact:
Helps explain how babies learn words by listening to patterns in speech.
An important step toward understanding how children acquire language is studying how infants learn to segment words. Previous research has established that infants may use statistical regularities in speech to learn word segmentation. Goldwater et al. demonstrated that incorporating context into segmentation models improves their ability to learn word boundaries. We implemented two of their models, a unigram model and a bigram model, to examine how context improves statistical word segmentation. The results are consistent with our hypothesis that the bigram model outperforms the unigram model at predicting word boundaries. Extending the work of Goldwater et al., we also explored simple ways to model how young children might use previously learned words to segment new utterances.
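To make the unigram/bigram contrast concrete, the sketch below scores every candidate segmentation of a short unsegmented utterance under add-alpha smoothed unigram and bigram models estimated from a tiny hand-made corpus. This is only an illustrative toy under assumed data and smoothing choices: the actual models in Goldwater et al. are Dirichlet-process (unigram) and hierarchical Dirichlet-process (bigram) models fit with Gibbs sampling, and the exhaustive search here is practical only for very short utterances.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus of already-segmented utterances (not from the paper).
corpus = [["you", "want", "the", "doggie"],
          ["you", "want", "it"],
          ["the", "doggie", "is", "here"]]

unigram_counts = Counter(w for utt in corpus for w in utt)
bigram_counts = Counter()
for utt in corpus:
    for prev, cur in zip(["<s>"] + utt, utt + ["</s>"]):
        bigram_counts[(prev, cur)] += 1

total_words = sum(unigram_counts.values())
vocab_size = len(unigram_counts) + 1  # +1 for unseen words

def unigram_logprob(words, alpha=1.0):
    """Score a segmentation assuming each word is generated independently."""
    return sum(math.log((unigram_counts[w] + alpha) /
                        (total_words + alpha * vocab_size))
               for w in words)

def bigram_logprob(words, alpha=1.0):
    """Score a segmentation where each word is conditioned on the previous word."""
    score = 0.0
    for prev, cur in zip(["<s>"] + words, words + ["</s>"]):
        prev_total = sum(c for (p, _), c in bigram_counts.items() if p == prev)
        score += math.log((bigram_counts[(prev, cur)] + alpha) /
                          (prev_total + alpha * vocab_size))
    return score

def segmentations(chars):
    """Yield every way of splitting an unsegmented string into words."""
    n = len(chars)
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = [0, *cuts, n]
            yield [chars[i:j] for i, j in zip(bounds, bounds[1:])]

utterance = "youwantthedoggie"  # unsegmented input
print("unigram:", max(segmentations(utterance), key=unigram_logprob))
print("bigram: ", max(segmentations(utterance), key=bigram_logprob))
```

The point of the contrast is that the bigram scorer uses the preceding word as context when evaluating each candidate word, so segmentations that reuse familiar word-to-word transitions are rewarded, which is the intuition behind the context effect studied in the paper.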
Similar Papers
BabyLM's First Words: Word Segmentation as a Phonological Probing Task
Computation and Language
Teaches computers to understand word sounds in many languages.
Segment First or Comprehend First? Explore the Limit of Unsupervised Word Segmentation with Large Language Models
Computation and Language
Helps computers understand words in any language.
Exploring the Effect of Segmentation and Vocabulary Size on Speech Tokenization for Speech Language Models
Computation and Language
Helps talking computers understand speech better and faster.