layout: true

---

# Sequence Learning

## Practical Considerations

Korbinian Riedhammer

---

# Choice of Language

- R: the language of statisticians
  + good file i/o for medium-sized data
  + statistical tests built-in
  + ...a math scripting language
- python: the new Perl...
  + incredible library choices
  + numpy for numeric computation
  + scipy/sklearn for basic ML
  + pandas for data analysis/science tasks
  + TF, Keras, ...

---

# Choice of Language

- c++: if performance matters
  + actual implementation of most toolkits
  + high-performance computing
  + "real" programming language
- Scala: language of microservices
  + functional language
  + scalable frameworks (akka)
  + full Java support

...dependent on your architectural considerations!

---

# Data IO

## Surprisingly the hardest part...

Examples for python.

---

# Python Pickle

- Compare to basic Java serialization (`java.io.Serializable`)
- Inefficient (slow, disk hungry)
- Security concern (unpickling can execute arbitrary code)
- Easy to use :-)

```python
import pickle

data = {'some': 'key', 'value': 123}

# use context managers so file handles are closed properly
with open('data.pkl', 'wb') as f:
    pickle.dump(data, f)

with open('data.pkl', 'rb') as f:
    data = pickle.load(f)
```

---

# Pandas

- Python data analytics toolkit ([tutorial](http://pandas.pydata.org/pandas-docs/version/0.15/10min.html), [indexing](https://pandas.pydata.org/pandas-docs/stable/indexing.html))
- `Series` and `DataFrame`

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 3, 7])

dates = pd.date_range('20240101', periods=6)  # index for the frame
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
```

---

# Tensorflow TFRecord

- Uses [Google Protocol Buffers](https://developers.google.com/protocol-buffers/)
- Inefficient, particularly for sequence data
- Helps with partitioning and shuffling
- A [bit complicated to use](https://medium.com/mostly-ai/tensorflow-records-what-they-are-and-how-to-use-them-c46bc4bbb564)
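---

# TFRecord: Sketch

A minimal round-trip sketch, assuming TensorFlow 2.x (`tf.io` / `tf.data` APIs); the feature names `label` and `x` are illustrative:

```python
import tensorflow as tf

# serialize one example with an int label and a float vector
example = tf.train.Example(features=tf.train.Features(feature={
    'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[1])),
    'x': tf.train.Feature(float_list=tf.train.FloatList(value=[0.1, 0.2])),
}))

with tf.io.TFRecordWriter('data.tfrecord') as writer:
    writer.write(example.SerializeToString())

# read it back as a tf.data pipeline
spec = {
    'label': tf.io.FixedLenFeature([], tf.int64),
    'x': tf.io.FixedLenFeature([2], tf.float32),
}
for rec in tf.data.TFRecordDataset('data.tfrecord'):
    parsed = tf.io.parse_single_example(rec, spec)
```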
---

# Weight Initialization

![weights](/sequence-learning/09-toolkits/weights.png)

---

# Weight Initialization: 0

- identical gradients for all neurons in a layer
- effectively a linear model

---

# Weight Initialization: Random

Usually the best choice (set a seed for reproducibility!)

## Vanishing Gradient

- Gradient becomes very small (and thus numerically unstable)
- Sigmoid and tanh functions prone to VG for large weights

## Exploding Gradient

- Opposite: gradient becomes too large for large weights and small activations
- Oscillating cost function

---

# Best Practices
- Use ReLU: gradient is 0 for negative, 1 for positive inputs
- Use heuristics for weight initialization (e.g., Xavier/Glorot or He)
- Apply gradient clipping
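---

# Best Practices: Sketch

The last two best practices can be sketched in plain numpy; the helper names below are ours, not a library API:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed for reproducibility

def he_init(fan_in, fan_out):
    """He initialization: variance 2/fan_in, suited to ReLU layers."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def clip_by_norm(grad, max_norm=5.0):
    """Rescale the gradient if its L2 norm exceeds max_norm."""
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad

W = he_init(256, 128)                                   # sensible starting weights
g = clip_by_norm(rng.normal(0.0, 100.0, size=W.shape))  # tamed exploding gradient
```

Frameworks ship these as e.g. `tf.keras.initializers.HeNormal` and the `clipnorm` option on Keras optimizers; the sketch just shows what they do.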