layout: true

---

# Sequence Learning

## Practical Considerations

Korbinian Riedhammer

---

# Choice of Language

- R: the language of statisticians
  + good file i/o for medium-sized data
  + statistical tests built-in
  + ...a math scripting language
- python: the new Perl...
  + incredible library choices
  + numpy for numeric computation
  + scipy/sklearn for basic ML
  + pandas for data analysis/science tasks
  + TF, Keras, ...

---

# Choice of Language

- c++: if performance matters
  + actual implementation of most toolkits
  + high-performance computing
  + "real" programming language
- Scala: language of microservices
  + functional language
  + scalable frameworks (akka)
  + full Java support

...dependent on your architectural considerations!

---

# Data IO

## Surprisingly the hardest part...

Examples for python.

---

# Python Pickle

- Compare to basic Java serialization (`java.io.Serializable`)
- Inefficient (slow, disk hungry)
- Security concern (unpickling can execute arbitrary code)
- Easy to use :-)

```python
import pickle

data = {'some': 'key', 'value': 123}

# use context managers so file handles are closed properly
with open('data.pkl', 'wb') as f:
    pickle.dump(data, f)

with open('data.pkl', 'rb') as f:
    data = pickle.load(f)
```

---

# Pandas

- Python data analytics toolkit ([tutorial](http://pandas.pydata.org/pandas-docs/version/0.15/10min.html), [indexing](https://pandas.pydata.org/pandas-docs/stable/indexing.html))
- `Series` and `DataFrame`

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 3, 7])

dates = pd.date_range('20240101', periods=6)  # index for the frame
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
```

---

# Tensorflow TFRecord

- Uses [Google Protocol Buffers](https://developers.google.com/protocol-buffers/)
- Inefficient, particularly for sequence data
- Helps with partitioning and shuffling
- A [bit complicated to use](https://medium.com/mostly-ai/tensorflow-records-what-they-are-and-how-to-use-them-c46bc4bbb564)
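---

# TFRecord: Sketch

A minimal round-trip sketch, assuming TensorFlow 2.x (`tf.io` / `tf.data` APIs); the feature names `label` and `x` are illustrative:

```python
import tensorflow as tf

# serialize one example with an int label and a float vector
example = tf.train.Example(features=tf.train.Features(feature={
    'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[1])),
    'x': tf.train.Feature(float_list=tf.train.FloatList(value=[0.1, 0.2])),
}))

with tf.io.TFRecordWriter('data.tfrecord') as writer:
    writer.write(example.SerializeToString())

# read it back as a tf.data pipeline
spec = {
    'label': tf.io.FixedLenFeature([], tf.int64),
    'x': tf.io.FixedLenFeature([2], tf.float32),
}
for rec in tf.data.TFRecordDataset('data.tfrecord'):
    parsed = tf.io.parse_single_example(rec, spec)
```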
---

# Weight Initialization

![weights](/sequence-learning/09-toolkits/weights.png)

---

# Weight Initialization: 0

- identical gradients for all neurons in a layer
- effectively a linear model

---

# Weight Initialization: Random

Usually the best choice (set a seed for reproducibility!)

## Vanishing Gradient

- Gradient becomes very small (and thus numerically unstable)
- Sigmoid and tanh functions prone to VG for large weights

## Exploding Gradient

- Opposite: gradient becomes too large for large weights and small activations
- Oscillating cost function

---

# Best Practices
- Use ReLU: gradient is 0 for negative, 1 for positive inputs
- Use heuristics for weight initialization (e.g., Xavier/Glorot or He)
- Apply gradient clipping
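---

# Best Practices: Sketch

The last two best practices can be sketched in plain numpy; the helper names below are ours, not a library API:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed for reproducibility

def he_init(fan_in, fan_out):
    """He initialization: variance 2/fan_in, suited to ReLU layers."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def clip_by_norm(grad, max_norm=5.0):
    """Rescale the gradient if its L2 norm exceeds max_norm."""
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad

W = he_init(256, 128)                                   # sensible starting weights
g = clip_by_norm(rng.normal(0.0, 100.0, size=W.shape))  # tamed exploding gradient
```

Frameworks ship these as e.g. `tf.keras.initializers.HeNormal` and the `clipnorm` option on Keras optimizers; the sketch just shows what they do.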