layout: true

---

# Sequence Learning

## Deployment

Korbinian Riedhammer

---

# Brainstorming

### What are key aspects for deploying ML models?

---

# Key Questions

- Responsiveness and real-time performance
  + Critical: user interactions
  + Flexible: batch processing
- Scalability
  + Cost structure
  + Startup times
  + Sensitivity

---

# Architectural Choices

- "Thin Client": embedded in web app
  + Java Spring Boot
  + Python Flask
  + Scala Scalatra
  + ...
- Microservices
- Instances (deployment) vs. jobs (scheduling)
- Scalability
  + Kubernetes
  + Docker Swarm
  + Queueing systems (Slurm, Sun Grid Engine, ...)

---

# Resource Constraints

- CPU or GPU, and how many
- RAM, shared memory?
- Scratch disk space
- Image sizes?
  + Binaries
  + Model files
- Deployment vs. jobs
  + Startup times?
  + Pre-warming?
  + Model caching?

---

# Versioning

- Consistency!
- Binary files (e.g. CUDA drivers, utilities)
- Script files (e.g. TF workflows, Python "glue")
- Model files (e.g. nightly/monthly/recurrent builds)
- Customer-specific models
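
---

# Versioning: Consistency Check (Sketch)

One possible way to enforce the "Consistency!" point is to pin every artifact (binaries, scripts, model files) in a manifest and compare digests before serving. This is a minimal sketch, not a prescribed workflow: `Artifact`, `sha256_of`, `check_manifest`, and the toy byte strings are hypothetical names and data introduced only for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """One pinned entry of a (hypothetical) deployment manifest."""
    name: str     # e.g. "model", "glue-script", "cuda-driver"
    version: str  # human-readable version label
    sha256: str   # expected SHA-256 digest of the artifact's bytes


def sha256_of(data: bytes) -> str:
    """Hex digest used to identify an artifact's exact content."""
    return hashlib.sha256(data).hexdigest()


def check_manifest(manifest: list[Artifact], deployed: dict[str, bytes]) -> list[str]:
    """Return the names of artifacts that are missing or differ from the manifest."""
    mismatches = []
    for artifact in manifest:
        data = deployed.get(artifact.name)
        if data is None or sha256_of(data) != artifact.sha256:
            mismatches.append(artifact.name)
    return mismatches


# Toy data standing in for real files (assumption: in practice these
# bytes would be read from disk or from the container image).
model_bytes = b"weights-nightly-build"
script_bytes = b"print('glue')"

manifest = [
    Artifact("model", "nightly-001", sha256_of(model_bytes)),
    Artifact("glue-script", "1.2.0", sha256_of(script_bytes)),
]

# Everything matches the manifest: no mismatches reported.
ok = check_manifest(manifest, {"model": model_bytes, "glue-script": script_bytes})

# A stale model file is detected by its digest.
stale = check_manifest(manifest, {"model": b"older weights", "glue-script": script_bytes})
```

Customer-specific models would fit the same scheme with one manifest per customer; a non-empty result from `check_manifest` could block startup of an instance or job.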