Part 1: Comparing Toolkits
Here’s a list of currently “hot” machine learning toolkits:
Exercise
Pick two toolkits.
- What are the pros and cons of each?
- Which one would you choose, and why?
- Work out a basic example that gets you started. admission.asc is a tiny classification task; the last column is the label.
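Whichever toolkit you pick, the first step is getting the data into feature rows and labels. The sketch below is a toolkit-independent starting point, not part of the handout. It assumes admission.asc holds one example per line as whitespace-separated numeric columns with an integer class label in the last column; the function name load_dataset and the demo rows are made up for illustration, so check them against the actual file.

```python
import io


def load_dataset(file):
    """Read whitespace-separated numeric rows; the last column is the label.

    Assumption: every non-empty line holds only numbers, and the label
    is an integer class ID.
    """
    features, labels = [], []
    for line in file:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        values = [float(x) for x in fields]
        features.append(values[:-1])    # all columns but the last
        labels.append(int(values[-1]))  # last column is the label
    return features, labels


# Synthetic stand-in rows (NOT the real contents of admission.asc).
demo = io.StringIO("3.1 520 1\n2.4 410 0\n3.8 640 1\n")
X, y = load_dataset(demo)
print(len(X), "examples,", len(X[0]), "features, classes:", sorted(set(y)))
```

With the real file, open("admission.asc") would replace the StringIO object; the resulting lists can then be converted to whatever array or tensor type the chosen toolkit expects.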
For your reference, Wikipedia maintains a comparison of deep learning software, and there’s a recent arXiv article that compares numerous toolkits.
Part 2: Practical Considerations
See the slide deck.
Exercise
For sequence-to-sequence learning, it is common practice to use training examples that have a fixed number of output frames and a variable but similar number of input frames. As an example problem, you are given the following alignments, which are to be used to partition your data:
# <file-id> [<symbol> <length> (; <symbol> <length>)*]
AAA_m159dxx0_010_AAA 1 5 ; 23 13 ; 35 5 ; 9 4 ; 41 7 ; 32 6 ; 35 6 ; 17 10 ; 40 11 ; 1 3
where symbol is an integer ID of the output symbol, and length is the number of input samples corresponding to this segment.
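To make the format concrete, here is a minimal sketch that parses one alignment line into its file ID and a list of (symbol, length) pairs. The helper name parse_alignment is hypothetical, and the code assumes the file ID contains no whitespace and that segments are separated by semicolons, as in the sample line above.

```python
def parse_alignment(line):
    """Split '<file-id> <symbol> <length> ; <symbol> <length> ; ...'."""
    parts = line.strip().split(None, 1)
    file_id = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    segments = []
    for chunk in rest.split(";"):
        if chunk.strip():  # tolerate empty chunks, e.g. a trailing ';'
            symbol, length = (int(x) for x in chunk.split())
            segments.append((symbol, length))
    return file_id, segments


line = ("AAA_m159dxx0_010_AAA 1 5 ; 23 13 ; 35 5 ; 9 4 ; 41 7 ; "
        "32 6 ; 35 6 ; 17 10 ; 40 11 ; 1 3")
file_id, segments = parse_alignment(line)
total_frames = sum(length for _, length in segments)
print(file_id, len(segments), total_frames)
# prints: AAA_m159dxx0_010_AAA 10 70
```

The sample file therefore has 10 output symbols spread over 70 input samples, which is what the windowing in the exercise has to partition.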
- Write a function that reads in the alignments file and outputs a list of examples, similar to the lines below
- Files with fewer output symbols than the requested number of outputs should be ignored
# <example-id> <start-input-index> <end-input-index> <output labels...>
AAA_m159dxx0_018_AAA_0042 368 422 40 12 10 9 14 10 24 14 42 9
def make_examples(file, num_outputs=8, stride=1):
    """
    Given per-file alignments, compute output labels and corresponding
    input frame indices for the given parameters:

    file: file pointer to read alignments from
    num_outputs: how many output labels per example
    stride: number of output frames by which the window is advanced
    returns: list of tuples (egs-id, input-start, input-end, [labels...])
    """
    # ...
Solution: examples.py
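For orientation, here is one possible way to fill in the stub. It is a sketch under stated assumptions and is not taken from examples.py. It assumes that input indices count samples from 0 at the start of each file, that input-end is exclusive, and that the example ID is the file ID plus the zero-padded index of the first output symbol in the window. The handout does not spell out these conventions, so adjust them if your data says otherwise.

```python
import io


def make_examples(file, num_outputs=8, stride=1):
    """Slide a window of num_outputs segments over each file's alignment."""
    examples = []
    for line in file:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the format comment
        parts = line.split(None, 1)
        file_id = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        segments = [tuple(int(x) for x in chunk.split())
                    for chunk in rest.split(";") if chunk.strip()]
        # offsets[i] is the first input sample of segment i (assumed 0-based)
        offsets = [0]
        for _, length in segments:
            offsets.append(offsets[-1] + length)
        # Files with too few segments yield an empty range and are ignored.
        for i in range(0, len(segments) - num_outputs + 1, stride):
            labels = [symbol for symbol, _ in segments[i:i + num_outputs]]
            egs_id = f"{file_id}_{i:04d}"  # assumed ID scheme
            # input-end is exclusive here (assumption)
            examples.append((egs_id, offsets[i], offsets[i + num_outputs], labels))
    return examples


alignments = io.StringIO(
    "# <file-id> [<symbol> <length> (; <symbol> <length>)*]\n"
    "AAA_m159dxx0_010_AAA 1 5 ; 23 13 ; 35 5 ; 9 4 ; 41 7 ; "
    "32 6 ; 35 6 ; 17 10 ; 40 11 ; 1 3\n"
)
examples = make_examples(alignments, num_outputs=8, stride=1)
for egs_id, start, end, labels in examples:
    print(egs_id, start, end, *labels)
```

Precomputing the cumulative offsets keeps each window lookup constant-time, and a file with fewer than num_outputs segments produces no examples without needing a separate check.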
Part 3: Deploying Machine Learning Models
See the slide deck.