Sequence Learning
Elective for CS grad students at the Technical University of Applied Sciences Nuremberg.
Class Schedule and Credits
Time and Location: Mondays at 9.45, HQ.104
Announcements and Discussions: Moodle course 5312.
Format:
Each week, we will discuss algorithms and their theory before implementing them to gain a better hands-on understanding. Java is suggested, pair programming is encouraged, and BYOD is strongly recommended!
Credits:
We’ll adopt a common research routine: identify a problem, research prior work, engineer a solution, write it up in a paper, review other papers, present your work. Credits are earned through
 your 6-page paper, submitted by June 24 (60%)
 reviewing 3 other papers by July 1 (20%)
 presenting your work on July 8 (tentative date) (20%)
For more details see these slides.
Note: Materials will be in English, the lectures/tutorials will be taught in German; class project in language of choice.
Recommended Textbooks
 Niemann, H: Klassifikation von Mustern. 2nd revised edition, 2003 (available online)
 Huang, Acero, Hon: Spoken Language Processing: A Guide to Theory, Algorithm and System Development. (ISBN13: 9780130226167)
 Jurafsky, D and Martin, J: Speech and Language Processing. 2017 (available online)
 Manning, C, Raghavan P and Schütze, H: Introduction to Information Retrieval, Cambridge University Press. 2008. (available online)
 Goodfellow, I and Bengio, Y and Courville, A: Deep Learning. 2016 (available online)
Syllabus

March 18: Introduction. (slides, exercise)
We’ll start with the general concepts of supervised vs. unsupervised learning and classification of independent observations vs. sequences of observations. To get you motivated, we’ll look at a list of recent “AI products” that utilize sequence learning.

March 25: AutoCorrect. (slides by Ben Langmead, exercise)
We’ll start with a classic implementation of autocorrection for misspelled words to refresh your memory of dynamic programming. We’ll also look at how the approach scales in terms of computation time and memory.
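
As a rough illustration of the dynamic programming idea, here is a minimal sketch of the Levenshtein edit distance that keeps only two rows of the cost table in memory. The function names `edit_distance` and `suggest` are made up for this example and are not taken from the course material; unit costs for insertion, deletion and substitution are an assumption.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance with unit costs, using two rows of the DP table."""
    # previous[j] holds the cost of turning the empty prefix of source
    # into the first j characters of target (j insertions).
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        # Turning the first i characters of source into "" takes i deletions.
        current = [i]
        for j, t in enumerate(target, start=1):
            substitution = previous[j - 1] + (s != t)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def suggest(word: str, vocabulary: list[str]) -> str:
    """Return the vocabulary entry closest to word (first one wins on ties)."""
    return min(vocabulary, key=lambda candidate: edit_distance(word, candidate))
```

Comparing a word against every vocabulary entry costs time proportional to the vocabulary size times the product of the word lengths, which is where the scalability question comes from.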

April 1: States and Cost Functions. (slides, exercise)
Understand how DP can be applied to an abstraction of distances and states. We’ll build a smarter, keyboard-layout-aware autocorrect and start looking into some applications in signal processing (isolated word and DTMF sequence classification).
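
One way to make the distance keyboard-layout aware is to replace the fixed substitution cost with a cost function. The sketch below assumes a QWERTY layout and an arbitrary cost of 0.5 for neighboring keys; the layout, the cost values and the names `substitution_cost` and `weighted_distance` are all assumptions made for illustration.

```python
# Assumed QWERTY letter rows; a QWERTZ layout would only change this table.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
POSITION = {key: (r, c) for r, row in enumerate(ROWS) for c, key in enumerate(row)}


def substitution_cost(a: str, b: str) -> float:
    """0 for identical keys, 0.5 for neighboring keys, 1 otherwise."""
    if a == b:
        return 0.0
    if a in POSITION and b in POSITION:
        (r1, c1), (r2, c2) = POSITION[a], POSITION[b]
        if abs(r1 - r2) <= 1 and abs(c1 - c2) <= 1:
            return 0.5
    return 1.0


def weighted_distance(source: str, target: str, cost=substitution_cost) -> float:
    """Edit distance where the substitution cost depends on the key pair."""
    previous = [float(j) for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        current = [float(i)]
        for j, t in enumerate(target, start=1):
            current.append(min(
                previous[j - 1] + cost(s, t),  # substitute (or match)
                previous[j] + 1.0,             # delete s
                current[j - 1] + 1.0,          # insert t
            ))
        previous = current
    return previous[-1]
```

With these assumed costs, a slip onto a neighboring key ("cst" for "cat") is penalized less than a substitution with a distant key ("cpt").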

April 8: Modeling Sequences. (slides, exercise)
Learn about n-grams, a simple yet effective approach to learning the contexts of discrete symbols. We’ll use n-grams to improve our autocorrect by incorporating context and suggesting the next word.
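
A minimal sketch of the idea, assuming a bigram model with plain maximum-likelihood estimates and no smoothing: count which word follows which, then suggest the most frequent followers. The toy corpus and the function names are invented for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows another; <s> and </s> mark boundaries."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def probability(counts, prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev); 0.0 for unseen contexts."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0


def suggest_next(counts, prev, k=3):
    """Return up to k words that most often followed prev in the training data."""
    return [word for word, _ in counts[prev].most_common(k)]


# Toy corpus, made up for illustration.
model = train_bigrams(["the cat sat", "the cat ran", "the dog sat"])
```

Unseen word pairs get probability zero here, which is the practical reason smoothing is usually added on top of raw counts.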

April 15: Hidden Markov Models. (slides, exercise)
We’ll take a close look at hidden Markov models and how to (efficiently) evaluate and train them. The Viterbi decoding algorithm tells us the most likely state sequence, that is, the path through the model that best explains the observations. We’ll use HMMs to build a proof-of-concept isolated word recognizer.
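
The following is a small sketch of Viterbi decoding for a discrete HMM, working in the log domain to avoid numerical underflow. It assumes that all start, transition and emission probabilities are strictly positive; the dictionary-based model layout and the toy numbers are chosen for this example only.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state path and its probability."""
    # best[t][s]: log probability of the best path that ends in state s at time t.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = []  # back[t - 1][s]: best predecessor of state s at time t
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + math.log(trans_p[p][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, math.exp(best[-1][last])


# Toy two-state model, made up for illustration.
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}
path, prob = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
```

For this toy model the decoded path is Healthy, Healthy, Fever with a probability of about 0.01512.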

April 22: no class (Easter)

April 29: Higher-Level Sequence Modeling with HMMs. (slides, exercise)
Learn how to model complex sequences of arbitrary length that prohibit explicit modeling, such as speech recognition or choreographies in sports. Here we will combine what we’ve discussed so far: prefix trees, n-gram models and efficient search.
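
To recall what a prefix tree looks like, here is a minimal sketch that stores a small lexicon and lists all entries sharing a prefix. The class name `PrefixTree` and its methods are hypothetical and chosen for this example; a recognizer would attach model states or scores to the nodes rather than just a word flag.

```python
class PrefixTree:
    """A node of a prefix tree (trie); the root represents the empty prefix."""

    def __init__(self):
        self.children = {}    # maps a symbol to the child node
        self.is_word = False  # True if the path to this node spells a word

    def insert(self, word):
        node = self
        for symbol in word:
            node = node.children.setdefault(symbol, PrefixTree())
        node.is_word = True

    def complete(self, prefix):
        """Return all stored words that start with prefix, sorted."""
        node = self
        for symbol in prefix:
            node = node.children.get(symbol)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for symbol, child in current.children.items():
                stack.append((child, text + symbol))
        return sorted(results)


lexicon = PrefixTree()
for entry in ["speech", "speed", "spell", "sport"]:
    lexicon.insert(entry)
```

Words with a common prefix share nodes, so a search that extends hypotheses symbol by symbol evaluates each shared prefix only once.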

May 6: Feed-Forward Neural Networks. (slides perceptron and nnets, fizzbuzz.py, exercise)
A brief introduction to neural networks: fundamentals, topologies and training. We’ll skip implementing the details and use TensorFlow for the examples. Did you know that you could program fizzbuzz as a neural network? Please have Python with NumPy and TensorFlow installed and operational on your machine!
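
To see how fizzbuzz can be posed as a classification task at all, the sketch below prepares the data only: numbers are encoded as bit vectors and mapped to one of four classes. This is not the linked fizzbuzz.py; the encoding width, the training range and all names are assumptions, and no network is defined or trained here.

```python
NUM_BITS = 10  # assumed input width; enough for numbers below 1024
CLASSES = ["number", "fizz", "buzz", "fizzbuzz"]


def binary_encode(number, num_bits=NUM_BITS):
    """Encode a number as a list of bits, least significant bit first."""
    return [(number >> bit) & 1 for bit in range(num_bits)]


def label(number):
    """Class index the network is supposed to predict for this number."""
    if number % 15 == 0:
        return 3
    if number % 5 == 0:
        return 2
    if number % 3 == 0:
        return 1
    return 0


def decode(number, class_index):
    """Turn a predicted class index back into the fizzbuzz output string."""
    return str(number) if class_index == 0 else CLASSES[class_index]


# Train on 101..1023 so that 1..100 remain unseen for evaluation.
training_set = [(binary_encode(n), label(n)) for n in range(101, 1024)]
```

A feed-forward network would then map each 10-bit input to a distribution over the four classes, and `decode` would turn the predicted class back into the printed output.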

May 13: Recurrent Neural Networks. (slides cs231n: RNNs, exercise)
Recurrent neural networks use feedback loops to introduce temporal context or “memory” into the network. We’ll study them using two examples: language modeling and drawing classification.
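
The feedback loop can be sketched in a few lines. The example assumes a simple Elman-style recurrence, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b), written in plain Python without any framework; the weights below are arbitrary values for illustration, not trained parameters.

```python
import math


def rnn_forward(inputs, w_xh, w_hh, bias):
    """Run a simple recurrent layer over a sequence and return all hidden states."""
    hidden = [0.0] * len(bias)  # initial hidden state h_0
    states = []
    for x in inputs:
        # Each new hidden unit sees the current input and the previous hidden state.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_xh[j], x))
                + sum(w * hi for w, hi in zip(w_hh[j], hidden))
                + bias[j]
            )
            for j in range(len(bias))
        ]
        states.append(hidden)
    return states


# One input dimension and one hidden unit, with arbitrary weights.
states = rnn_forward([[1.0], [0.0]], w_xh=[[1.0]], w_hh=[[0.5]], bias=[0.0])
```

Although the second input is zero, the second hidden state is not: it still carries information from the first time step, which is the "memory" referred to above.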

May 20: Project Proposals
No class; teams will meet individually with the instructor to discuss their project proposals. Plan for a 10-15 minute discussion and bring 5 slides: Team, Task and a brief summary of 3 pieces of related work (textbook, paper, GitHub). Book your time slot: https://terminplaner4.dfn.de/SfHJAnhklDDhiWhB

May 27: Sequence to Sequence Learning. (slides, literature and exercise)
Previous algorithms explicitly modeled the sequence, either as a graph-like structure such as an HMM or by concatenating observations into a single data point. Encoder-decoder networks are a special topology of recurrent neural networks that can be used to model sequence-to-sequence mappings, as found in end-to-end speech recognition, machine translation or automatic summarization, all without explicitly modeling states! We’ll also talk about the concept of attention, which lets the network weight the relevant parts of the input context at each output step.
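
As a rough sketch of the attention idea, the example below computes dot-product attention weights over a list of encoder states and the resulting context vector. Dot-product scoring is only one of several variants and is assumed here for brevity; the lecture may use a different scoring function. The vectors are toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """Return attention weights and the weighted sum of the encoder states."""
    # Score each encoder state by its dot product with the decoder query.
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context


# The query matches the first encoder state more closely than the second.
weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

The weights sum to one, so the context vector is a convex combination of the encoder states that leans toward the states most similar to the query.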

June 3: Project Check-In
No class; teams will meet individually with the instructor to discuss their projects. Plan for 10-15 minutes to talk about your related work, the implementation and performance of the chosen baseline, and a rough outline (bullet points) of the method and experiments sections. Book your time slot: https://terminplaner4.dfn.de/coDapAxvNNILyKkg

June 10: no class (Whit Monday)

June 17: SVM, Sequence Kernels and Embeddings (slides, SVM slides, seq. kernels, assignment)
In many cases, classifying a sequence into a discrete class does not quite work with recurrent networks. We’ll learn about support vector machines, sequence kernels and methods to map sequences into a single observation of a continuous space. Embeddings are learned feature representations that can incorporate large quantities of unlabeled data.
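
One simple example of a sequence kernel is the k-spectrum kernel, which compares two sequences by the substrings of length k they have in common. It is picked here purely as an illustration and may not be the kernel covered in the lecture; the function names are made up.

```python
from collections import Counter


def spectrum(sequence, k):
    """Count all contiguous substrings of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def spectrum_kernel(a, b, k=2):
    """Inner product of the k-gram count vectors of two sequences."""
    counts_a, counts_b = spectrum(a, k), spectrum(b, k)
    # A Counter returns 0 for k-grams that never occurred in b.
    return sum(count * counts_b[gram] for gram, count in counts_a.items())
```

Because the kernel is an inner product of count vectors, sequences of different lengths are implicitly mapped to a single fixed-size observation, which is what an SVM needs.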

June 24: Papers due! Peer Reviews, ML in Production
We’ll go through the process of reviewing and presenting a (scientific) paper. We’ll then look at architectural challenges when training and deploying machine learning models for production.

July 1: Reviews due! Project Feedback and Presentation Check-In
No class; teams will meet individually with the instructor to get feedback on their paper and discuss the outline of their presentations. Plan for 10-15 minutes total, and bring a rough outline of your presentation. Book your time slot: https://terminplaner4.dfn.de/lvG45ebLuBSFG1u4

July 8: Present your work!
Subscribe to the https://github.com/sikoried/sequencelearning/ repository to follow updates.