# Hidden Markov Models

In this exercise, we’ll build an isolated word recognizer using Hidden Markov Models. We will use the following classes from the package com.github.sikoried.jstk.stat:

• Mixture: This will serve as the implementation of the emission densities. There is a constructor that takes an InputStream, as well as a write method to save models to file.
• hmm.HMM: The basic implementation of hidden Markov models. Like Mixture, it has a constructor that takes an InputStream and a write method to save models to file.
• hmm.SCState and hmm.CState as implementations of states with semi-continuous and continuous emission densities, respectively.
• hmm.Alignment to compute and store HMM state alignments.

### Preliminaries

We’ll be using JSTK as well as our own code, so make sure to set up your CLASSPATH:

```
cd path/to/sl-examples
$(gradle -q env)  # will exec export CLASSPATH=...
```

# Data Prep

• Clone the Free Spoken Digit Dataset.
• Create file lists for training and test; we’ll use the *_?.wav files for test, the *_??.wav files for train.
• Compute MFCC features, using first derivatives and per-file normalization.
• Train a Gaussian mixture model (128 diagonal densities), which we will later use as the codebook.

```
# clone data set
git clone https://github.com/Jakobovski/free-spoken-digit-dataset.git
cd free-spoken-digit-dataset/recordings

# make lists
/bin/ls [0-9]_{jackson,theo,nicolas,yweweler}_??.wav > list.train
/bin/ls [0-9]_{jackson,theo,nicolas,yweweler}_?.wav > list.test
cat list.{train,test} > list.all

# compute features
mkdir ft
java com.github.sikoried.jstk.app.Mfcc \
  -f t:wav/8 -w hamm,25,10 \
  -b 0,4000,-1,24 -d 5:1 \
  --turn-wise-mvn \
  --in-list list.all ft

# init and train GMM
N=128
mkdir mdl
java com.github.sikoried.jstk.app.Initializer \
  --list list.train --dir ft \
  --gmm mdl/init${N}.mdl -n $N -s g-ev
java com.github.sikoried.jstk.app.GaussEM \
  -i mdl/init${N}.mdl \
  -o mdl/em${N}.mdl \
  -l list.train -d ft
```


# Training

To keep things simple, we’ll make a few assumptions:

• The classes (words) are the numbers 0 through 9.
• Filenames always follow the scheme {class}_{speaker}_{rec-id}.wav, i.e. the first part is the class label.
• Feature files and models are stored in ft/ and mdl/, respectively (hard-code path names).
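
Under these assumptions, the class label can be read straight off the filename; a minimal self-contained sketch (class and method names here are illustrative, not part of JSTK):

```java
public class LabelFromFilename {
    // Filenames follow {class}_{speaker}_{rec-id}.wav, so the class label
    // is everything before the first underscore.
    static String classLabel(String filename) {
        return filename.substring(0, filename.indexOf('_'));
    }

    public static void main(String[] args) {
        System.out.println(classLabel("7_jackson_12.wav"));  // prints: 7
    }
}
```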

The overall training routine will be:

1. Align all examples using a linear alignment
2. Accumulate the statistics and re-estimate
3. Force-align the training data
4. Accumulate the statistics and re-estimate; optionally repeat this step
5. Repeat from step 3.
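
For step 1, a linear alignment simply spreads the T feature frames uniformly over the S states of the HMM. JSTK's hmm.Alignment computes this for you; the self-contained sketch below only illustrates the mapping:

```java
public class LinearAlignment {
    // Map each of T frames to one of S states so that every state receives
    // a contiguous, (nearly) equally sized share of the frames.
    static int[] align(int T, int S) {
        int[] states = new int[T];
        for (int t = 0; t < T; t++)
            states[t] = t * S / T;  // integer division: non-decreasing, in 0..S-1
        return states;
    }

    public static void main(String[] args) {
        // 10 frames distributed over a 4-state HMM
        System.out.println(java.util.Arrays.toString(align(10, 4)));
        // prints: [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]
    }
}
```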

Complete the binary iw.Trainer, which accepts a properties file, read via Commons Configuration, containing the following variables (defaults in parentheses):

• list that contains the training files
• number of states per class/HMM (4)
• classes (0, 1, 2, 3, 4, 5, 6, 7, 8, 9; retrieve with getStringArray); models will be stored as <name>.mdl[.<iter>]
• directory where to store the model files (mdl/)
• directory where to find the feature files (ft/)
• number of overall iterations (10)
• iterations when to re-align (1, 2, 5, 8; retrieve with getIntArray)
• CState: number of densities (1)
• SCState: codebook to use (null; note: copy will be written to model directory)

An example properties file could look like this:

```
iw.list = list.train
iw.states = 4
iw.classes = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
iw.mdldir = mdl-batch1
iw.ftdir = ft
iw.iterations = 10
iw.realign = 1, 2, 5, 8
iw.codebook = mdl/em10-128.mdl
```
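
Commons Configuration provides getStringArray and getIntArray for the comma-separated values directly. As a self-contained illustration of the same parsing, here is a stand-in using only java.util.Properties (the iw.* keys and defaults follow the list above):

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.Properties;

public class TrainerConfig {
    // Mimic Commons Configuration's getIntArray: split a comma-separated
    // value (with optional whitespace) into ints, falling back to a default.
    static int[] getIntArray(Properties p, String key, String def) {
        String[] parts = p.getProperty(key, def).split(",");
        int[] out = new int[parts.length];
        for (int i = 0; i < parts.length; i++)
            out[i] = Integer.parseInt(parts[i].trim());
        return out;
    }

    public static void main(String[] args) throws Exception {
        Properties p = new Properties();
        p.load(new StringReader("iw.states = 4\niw.realign = 1, 2, 5, 8\n"));
        int states = Integer.parseInt(p.getProperty("iw.states", "4").trim());
        int[] realign = getIntArray(p, "iw.realign", "1, 2, 5, 8");
        System.out.println(states + " " + Arrays.toString(realign));
        // prints: 4 [1, 2, 5, 8]
    }
}
```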


### Implement the Training Routine

1. for each class, allocate an HMM; use SCState if a codebook is specified, CState otherwise
2. compute an initial estimate by creating linear alignments for each file and class; save the initial estimates as <name>.mdl.0
3. reset the accumulators
4. if (cur_iter in realign): compute the forced alignment
5. accumulate according to the alignments (e.g. accumulateVT)
6. re-estimate the parameters (.reestimate()), save the current estimate
7. if (cur_iter < num_iters): goto 3
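
The control flow of steps 3–7, in particular which iterations trigger a forced re-alignment, can be separated from the JSTK calls. A sketch of that loop (the action labels are placeholders; accumulation and re-estimation are only noted in comments since they depend on the actual HMM instances):

```java
import java.util.*;

public class TrainingSchedule {
    // Return the per-iteration actions of steps 3-7: every iteration resets
    // the accumulators, accumulates, and re-estimates; iterations listed in
    // `realign` additionally compute a forced alignment first.
    static List<String> schedule(int numIters, Set<Integer> realign) {
        List<String> actions = new ArrayList<>();
        for (int iter = 1; iter <= numIters; iter++) {
            // step 3: reset the accumulators for every HMM
            if (realign.contains(iter))
                actions.add(iter + ":realign");   // step 4: forced alignment
            actions.add(iter + ":accumulate");    // step 5: e.g. accumulateVT
            actions.add(iter + ":reestimate");    // step 6: save <name>.mdl.<iter>
        }
        return actions;
    }

    public static void main(String[] args) {
        System.out.println(schedule(3, new HashSet<>(Arrays.asList(1, 2))));
    }
}
```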

# Classification

Complete the binary iw.Classifier, which accepts a properties file with the following settings:

• file list containing the test data
• list of classes to load
• directory where to find the model files (mdl/)
• directory where to find the feature files (ft/)
• if required, the codebook (relative to the model directory)

An example properties file:

```
iw.list = list.test
iw.classes = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
iw.mdldir = mdl-batch1
iw.ftdir = ft
iw.codebook = em10-128.mdl
```


### Implement the Classification Routine

1. Load the model files (and codebook)
2. For each feature file, align each of the models
3. Normalize the scores with a softmax (to obtain a probability for each class)
4. Output lines of <file> <best-class> <class-scores ...>
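
Step 3 is a standard softmax over the per-model scores. Since alignment scores are log-likelihoods, subtract the maximum before exponentiating to stay numerically stable; a minimal sketch:

```java
public class Softmax {
    // Turn raw (log-)scores into probabilities that sum to 1.
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) max = Math.max(max, s);
        double sum = 0.0;
        double[] p = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            p[i] = Math.exp(scores[i] - max);  // shift by max for stability
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) p[i] /= sum;
        return p;
    }

    public static void main(String[] args) {
        // typical log-likelihood magnitudes: naive exp() would underflow to 0
        double[] p = softmax(new double[]{-1200.0, -1205.0, -1203.0});
        System.out.println(java.util.Arrays.toString(p));
    }
}
```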

### Evaluation

• How does your classifier perform?
• Run experiments with different settings (iterations, states, classes)
• Can you see patterns of classes that get mixed up?
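
To spot which classes get mixed up, tally a confusion matrix from the classifier output lines (`<file> <best-class> ...`); a self-contained sketch, assuming the ten digit classes and that the true label is taken from the filename prefix:

```java
public class Confusion {
    // counts[i][j] = number of test files of true class i classified as j;
    // each input pair is {true-label, hypothesized-label}.
    static int[][] tally(String[][] pairs, int numClasses) {
        int[][] counts = new int[numClasses][numClasses];
        for (String[] pair : pairs) {
            int truth = Integer.parseInt(pair[0]);  // from the filename prefix
            int hyp = Integer.parseInt(pair[1]);    // best-class column
            counts[truth][hyp]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // hypothetical results: one "7" correct, one "7" confused with "1"
        String[][] results = { {"7", "7"}, {"7", "1"}, {"1", "1"} };
        int[][] c = tally(results, 10);
        System.out.println("7 -> 1 confusions: " + c[7][1]);  // prints: 7 -> 1 confusions: 1
    }
}
```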