In episode 89 of The Gradient Podcast, Daniel Bashir speaks to Shreya Shankar.

Shreya is a computer scientist pursuing her PhD in databases at UC Berkeley. Her research interest is in building end-to-end systems for people to develop production-grade machine learning applications. She was previously the first ML engineer at Viaduct, did research at Google Brain, and worked as a software engineer at Facebook. She graduated from Stanford with a B.S. and M.S. in computer science with concentrations in systems and artificial intelligence. At Stanford, she helped run SHE++, an organization that helps empower underrepresented minorities in technology.

Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at editor@thegradient.pub

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (02:22) Shreya’s background and journey into ML / MLOps
* (04:51) ML advances in 2013-2016
* (05:45) Shift in Stanford undergrad class ecosystems, accessibility of deep learning research
* (09:10) Why Shreya left her job as an ML engineer
* (13:30) How Shreya became interested in databases, data quality in ML
* (14:50) Daniel complains about things
* (16:00) What makes ML engineering uniquely difficult
* (16:50) Being a “historian of the craft” of ML engineering
* (22:25) Levels of abstraction, what ML engineers do/don’t have to think about
* (24:16) Observability for Production ML Pipelines
* (28:30) Metrics for real-time ML systems
* (31:20) Proposed solutions
* (34:00) Moving Fast with Broken Data
* (34:25) Existing data validation measures and where they fall short
* (36:31) Partition summarization for data validation
* (38:30) Small data and quantitative statistics for data cleaning
* (40:25) Streaming ML Evaluation
* (40:45) What makes a metric actionable
* (42:15) Differences in streaming ML vs. batch ML
* (45:45) Delayed and incomplete labels
* (49:23) Operationalizing Machine Learning
* (49:55) The difficult life of an ML engineer
* (53:00) Best practices, tools, pain points
* (55:56) Pitfalls in current MLOps tools
* (1:00:30) LLMOps / FMOps
* (1:07:10) Thoughts on ML Engineering, MLE through the lens of data engineering
* (1:10:42) Building products, user expectations for AI products
* (1:15:50) Outro

Links:

* Papers
  * Towards Observability for Production Machine Learning Pipelines
  * Rethinking Streaming ML Evaluation
  * Operationalizing Machine Learning
  * Moving Fast With Broken Data
* Blog posts
  * The Modern ML Monitoring Mess
  * Thoughts on ML Engineering After a Year of my PhD
Get full access to The Gradient at thegradientpub.substack.com/subscribe