Podcast: Machine Learning Guide
Episode:

19. Natural Language Processing 2

Category: Technology
Duration: 01:05:33
Publish Date: 2017-07-10 20:40:58
Description:

Natural Language Processing classical/shallow algorithms.

## Resources

- Speech and Language Processing (http://amzn.to/2uZaNyg) `book:hard` comprehensive classical-NLP bible
- Stanford NLP YouTube (https://www.youtube.com/playlist?list=PL6397E4B26D00A269) `course|audio:medium`
- NLTK Book (http://www.nltk.org/book) `book:medium`
- Convert video to audio:
** mp4 => mp3: `for f in *.mp4; do ffmpeg -i "$f" "${f%.mp4}.mp3" && rm "$f"; done`
** youtube => mp3: setup youtube-dl (https://github.com/rg3/youtube-dl) and run `youtube-dl -x youtube.com/playlist?list=`

## Episode

- Edit distance: Levenshtein distance
- Stemming/lemmatization: Porter Stemmer
- N-grams, Tokens: regex
- Language models
** Machine translation, spelling correction, speech recognition
- Classification / Sentiment Analysis: SVM, Naive Bayes
- Information Extraction (POS, NER): Models: MaxEnt, Hidden Markov Models (HMM), Conditional Random Fields (CRF)
- Generative vs Discriminative models
** Generative: HMM, Bayes, LDA
** Discriminative: SVMs, MaxEnt / LogReg, ANNs
** Pros/Cons
** Generative models need less data (NLP datasets tend to be small)
** MaxEnt vs Naive Bayes: Independence assumption of Bayes, etc ("Hong" "Kong")
- Topic Modeling and keyword extraction: Latent Dirichlet Allocation (LDA)
** LDA ~= LSA ~= LSI: Latent Dirichlet allocation, latent semantic analysis, latent semantic indexing
- Search / relevance / document-similarity: Bag-of-words, TF-IDF
- Similarity: Jaccard, Cosine, Euclidean
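
A few of the items above (Levenshtein edit distance, bag-of-words, Jaccard and cosine similarity) are small enough to sketch in plain Python. This is a minimal illustration, not code from the episode: the function names are made up here, tokenization is a naive lowercase whitespace split, and a real project would more likely reach for NLTK or a similar library.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, or substitutions needed to turn `a` into `b`."""
    # prev[j] holds the distance between the processed prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def bag_of_words(text: str) -> Counter:
    """Word counts, ignoring word order (assumed tokenizer: lowercase + split)."""
    return Counter(text.lower().split())


def jaccard(a: Counter, b: Counter) -> float:
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    doc1 = bag_of_words("Hong Kong harbour")
    doc2 = bag_of_words("Hong Kong island")
    print(jaccard(doc1, doc2), cosine(doc1, doc2))
```

Edit distance works at the character level and suits tasks like spelling correction, while Jaccard and cosine compare whole documents by their word counts, which is the usual starting point for search and document similarity.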

