Python 3 text processing with NLTK 3 cookbook : over 80 practical recipes on natural language processing techniques using Python's NLTK 3.0 / Jacob Perkins ; cover image by Faiz Fattohi.

Author
Perkins, Jacob
Format
Book
Language
English
Edition
Second edition.
Published/Created
  • Birmingham, England : Packt Publishing Ltd, 2014.
  • ©2014
Description
1 online resource (304 p.)

Details

Subject(s)
Cover designer
Faiz Fattohi
Summary note
Over 80 practical recipes on natural language processing techniques using Python's NLTK 3.0.

In Detail
This book will show you the essential techniques of text and language processing. Starting with tokenization, stemming, and the WordNet dictionary, you'll progress to part-of-speech tagging, phrase chunking, and named entity recognition. You'll learn how various text corpora are organized, as well as how to create your own custom corpus. Then, you'll move on to text classification, with a focus on sentiment analysis. Because NLP can be computationally expensive on large bodies of text, you'll also try a few methods for distributed text processing. Finally, you'll be introduced to a number of other small but complementary Python libraries for text analysis, cleaning, and parsing. This cookbook provides simple, straightforward examples so you can quickly learn text processing with Python and NLTK.

What You Will Learn
  • Tokenize text into sentences, and sentences into words
  • Look up words in the WordNet dictionary
  • Apply spelling correction and word replacement
  • Access the built-in text corpora and create your own custom corpus
  • Tag words with parts of speech
  • Chunk phrases and recognize named entities
  • Grammatically transform phrases and chunks
  • Classify text and perform sentiment analysis
Notes
  • "Quick answers to common problems"--Cover.
  • Includes index.
Source of description
Description based on online resource; title from PDF title page (ebrary, viewed September 2, 2014).
Language note
English
Contents
  • Intro
  • Python 3 Text Processing with NLTK 3 Cookbook
  • Table of Contents
  • Credits
  • About the Author
  • About the Reviewers
  • www.PacktPub.com
  • Support files, eBooks, discount offers, and more
  • Why Subscribe?
  • Free Access for Packt account holders
  • Preface
  • What this book covers
  • What you need for this book
  • Who this book is for
  • Conventions
  • Reader feedback
  • Customer support
  • Downloading the example code
  • Errata
  • Piracy
  • Questions
  • 1. Tokenizing Text and WordNet Basics
  • Introduction
  • Tokenizing text into sentences
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Tokenizing sentences in other languages
  • See also
  • Tokenizing sentences into words
  • Separating contractions
  • PunktWordTokenizer
  • WordPunctTokenizer
  • Tokenizing sentences using regular expressions
  • Simple whitespace tokenizer
  • Training a sentence tokenizer
  • Filtering stopwords in a tokenized sentence
  • Looking up Synsets for a word in WordNet
  • Working with hypernyms
  • Part of speech (POS)
  • Looking up lemmas and synonyms in WordNet
  • All possible synonyms
  • Antonyms
  • Calculating WordNet Synset similarity
  • Comparing verbs
  • Path and Leacock Chodorow (LCH) similarity
  • Discovering word collocations
  • Scoring functions
  • Scoring ngrams
  • 2. Replacing and Correcting Words
  • Stemming words
  • The LancasterStemmer class
  • The RegexpStemmer class
  • The SnowballStemmer class
  • Lemmatizing words with WordNet
  • Combining stemming with lemmatization
  • Replacing words matching regular expressions
  • Replacement before tokenization
  • Removing repeating characters
  • Spelling correction with Enchant
  • The en_GB dictionary
  • Personal word lists
  • Replacing synonyms
  • CSV synonym replacement
  • YAML synonym replacement
  • Replacing negations with antonyms
  • 3. Creating Custom Corpora
  • Setting up a custom corpus
  • Loading a YAML file
  • Creating a wordlist corpus
  • Names wordlist corpus
  • English words corpus
  • Creating a part-of-speech tagged word corpus
  • Customizing the word tokenizer
  • Customizing the sentence tokenizer
  • Customizing the paragraph block reader
  • Customizing the tag separator
  • Converting tags to a universal tagset
  • Creating a chunked phrase corpus
  • Tree leaves
  • Treebank chunk corpus
  • CoNLL2000 corpus
  • Creating a categorized text corpus
  • Category file
  • Categorized tagged corpus reader
  • Categorized corpora
  • Creating a categorized chunk corpus reader
  • Categorized CoNLL chunk corpus reader
  • Lazy corpus loading
  • Creating a custom corpus view
  • Block reader functions
  • Pickle corpus view
  • Concatenated corpus view
  • Creating a MongoDB-backed corpus reader
  • Corpus editing with file locking
  • 4. Part-of-speech Tagging
  • Default tagging
  • Evaluating accuracy
  • Tagging sentences
  • Untagging a tagged sentence
  • Training a unigram part-of-speech tagger
  • Overriding the context model
  • Minimum frequency cutoff
  • Combining taggers with backoff tagging
  • Saving and loading a trained tagger with pickle
  • Training and combining ngram taggers
  • Quadgram tagger
  • Creating a model of likely word tags
  • How it works...
  • Tagging with regular expressions
  • Affix tagging
  • Working with min_stem_length
  • Training a Brill tagger
  • Tracing
  • Training the TnT tagger
  • Controlling the beam search
  • Significance of capitalization
  • Using WordNet for tagging
  • Tagging proper names
  • Classifier-based tagging
  • Detecting features with a custom feature detector
  • Setting a cutoff probability
  • Using a pre-trained classifier
  • Training a tagger with NLTK-Trainer
  • Saving a pickled tagger
  • Training on a custom corpus
  • Training with universal tags
  • Analyzing a tagger against a tagged corpus
  • Analyzing a tagged corpus
  • 5. Extracting Chunks
  • Chunking and chinking with regular expressions
  • Parsing different chunk types
  • Parsing alternative patterns
  • Chunk rule with context
  • Merging and splitting chunks with regular expressions
  • Specifying rule descriptions
  • Expanding and removing chunks with regular expressions
  • Partial parsing with regular expressions
  • The ChunkScore metrics
  • Looping and tracing chunk rules
  • Training a tagger-based chunker
  • Using different taggers
  • Classification-based chunking
  • Using a different classifier builder
  • Extracting named entities
  • Binary named entity extraction
  • Extracting proper noun chunks
  • Extracting location chunks
  • Training a named entity chunker
  • Training a chunker with NLTK-Trainer
  • Saving a pickled chunker
  • Training on parse trees
  • Analyzing a chunker against a chunked corpus
  • Analyzing a chunked corpus
  • 6. Transforming Chunks and Trees
  • Filtering insignificant words from a sentence
  • Correcting verb forms
  • Swapping verb phrases
  • Swapping noun cardinals
  • Swapping infinitive phrases
  • Singularizing plural nouns
  • Chaining chunk transformations
  • Converting a chunk tree to text
  • There's more...
ISBN
  • 9781782167860
  • 1782167862
OCLC
  • 891381366
  • 891786402