Introduction to Natural Language Processing (NLP) topics.

## Resources
- Speech and Language Processing (http://amzn.to/2uZaNyg) `book:hard` comprehensive classical-NLP bible
- Stanford NLP YouTube (https://www.youtube.com/playlist?list=PL6397E4B26D00A269) `course|audio:medium`
- NLTK Book (http://www.nltk.org/book) `book:medium`
- Convert video to audio:
** mp4 => mp3: `for f in *.mp4; do ffmpeg -i "$f" "${f%.mp4}.mp3" && rm "$f"; done`
** youtube => mp3: setup youtube-dl (https://github.com/rg3/youtube-dl) and run `youtube-dl -x youtube.com/playlist?list=`

## Errata
22:21 "cat & car different by one word" should be "different by one letter"

## Episode
Syntax vs Semantics

Parts
- Corpus
- Lexicon
- Morphology
** Lemmas & Stems (reduce morphological variation; lemmatization more sophisticated)
** Tokens
** Stop words
** Edit-distance
** Word sense disambiguation

Syntax / Tasks
- Info Extraction (POS, NER, Relationship extraction)
- Parsing

Goals
- Spell check
- Classification
** Tagging (topic modeling / keyword extraction)
** Sentiment analysis
- Search / relevance, document similarity
- Natural language understanding
** Question answering
** Textual entailment
** Machine Translation (AI-complete)
** NLU vs NLP
- Natural language generation
** Image captioning
** Chatbots
** Automatic summarization
- Won't cover
** Optical character recognition (OCR)
** Speech (TTS, STT, Segmentation, Diarization)
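
Edit distance sketch: the errata's "cat" vs "car" example (different by one letter) can be made concrete with Levenshtein distance, which counts single-character insertions, deletions, and substitutions. This is a minimal illustration, not something taken from the episode. The function names `edit_distance` and `suggest` and the toy lexicon are hypothetical, chosen only for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b.

    Counts the minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b.
    """
    # prev[j] = distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (or match)
            ))
        prev = curr
    return prev[-1]


def suggest(word: str, lexicon: list[str], max_distance: int = 1) -> list[str]:
    """Hypothetical spell-check helper: lexicon entries within max_distance of word."""
    return sorted(w for w in lexicon if edit_distance(word, w) <= max_distance)


# Toy lexicon, assumed for illustration only.
lexicon = ["cat", "car", "cart", "dog"]

print(edit_distance("cat", "car"))  # 1
print(suggest("cae", lexicon))      # ['car', 'cat']
```

A real spell checker would likely also weigh word frequency and context; this sketch only ranks by raw letter edits.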