Princeton University Library Catalog
Python 3 text processing with NLTK 3 cookbook : over 80 practical recipes on natural language processing techniques using Python's NLTK 3.0 / Jacob Perkins ; cover image by Faiz Fattohi.
Author
Perkins, Jacob
Format
Book
Language
English
Edition
Second edition.
Published/Created
Birmingham, England : Packt Publishing Ltd, 2014.
©2014
Description
1 online resource (304 p.)
Availability
Available Online
O'Reilly Online Learning: Academic/Public Library Edition
Details
Subject(s)
Python (Computer program language)
Natural language processing (Computer science)—Research
Cover designer
Fattohi, Faiz
Summary note
Over 80 practical recipes on natural language processing techniques using Python's NLTK 3.0

In Detail
This book will show you the essential techniques of text and language processing. Starting with tokenization, stemming, and the WordNet dictionary, you'll progress to part-of-speech tagging, phrase chunking, and named entity recognition. You'll learn how various text corpora are organized, as well as how to create your own custom corpus. Then, you'll move on to text classification with a focus on sentiment analysis. And because NLP can be computationally expensive on large bodies of text, you'll try a few methods for distributed text processing. Finally, you'll be introduced to a number of other small but complementary Python libraries for text analysis, cleaning, and parsing. This cookbook provides simple, straightforward examples so you can quickly learn text processing with Python and NLTK.

What You Will Learn
Tokenize text into sentences, and sentences into words
Look up words in the WordNet dictionary
Apply spelling correction and word replacement
Access the built-in text corpora and create your own custom corpus
Tag words with parts of speech
Chunk phrases and recognize named entities
Grammatically transform phrases and chunks
Classify text and perform sentiment analysis
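The first skill in the list above, tokenizing text into sentences and sentences into words, can be sketched with Python's standard library alone. NLTK's trained Punkt sentence tokenizer and its word tokenizers (covered in the book's Chapter 1) are far more robust; this regex version is only an illustration of the idea, and both helper names here are our own, not the book's:

```python
import re

def sent_tokenize(text):
    # Naive sentence splitter: break after ., !, or ? followed by whitespace.
    # NLTK's Punkt tokenizer also handles abbreviations; this sketch does not.
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def word_tokenize(sentence):
    # Naive word tokenizer: runs of word characters and individual punctuation
    # marks become separate tokens, in the spirit of NLTK's WordPunctTokenizer.
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "Hello World. It's good to see you. Thanks for buying this book."
print(sent_tokenize(text))
# ['Hello World.', "It's good to see you.", 'Thanks for buying this book.']
print(word_tokenize(sent_tokenize(text)[1]))
# ['It', "'", 's', 'good', 'to', 'see', 'you', '.']
```

With NLTK installed, `nltk.sent_tokenize` and `nltk.word_tokenize` replace both helpers and handle the edge cases this sketch ignores.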
Notes
"Quick answers to common problems"--Cover.
Includes index.
Source of description
Description based on online resource; title from PDF title page (ebrary, viewed September 2, 2014).
Language note
English
Contents
Intro
Python 3 Text Processing with NLTK 3 Cookbook
Table of Contents
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Tokenizing Text and WordNet Basics
Introduction
Tokenizing text into sentences
Getting ready
How to do it...
How it works...
There's more...
Tokenizing sentences in other languages
See also
Tokenizing sentences into words
Separating contractions
PunktWordTokenizer
WordPunctTokenizer
Tokenizing sentences using regular expressions
Simple whitespace tokenizer
Training a sentence tokenizer
Filtering stopwords in a tokenized sentence
Looking up Synsets for a word in WordNet
Working with hypernyms
Part of speech (POS)
Looking up lemmas and synonyms in WordNet
All possible synonyms
Antonyms
Calculating WordNet Synset similarity
Comparing verbs
Path and Leacock Chodorow (LCH) similarity
Discovering word collocations
Scoring functions
Scoring ngrams
2. Replacing and Correcting Words
Stemming words
The LancasterStemmer class
The RegexpStemmer class
The SnowballStemmer class
Lemmatizing words with WordNet
Combining stemming with lemmatization
Replacing words matching regular expressions
Replacement before tokenization
Removing repeating characters
Spelling correction with Enchant
The en_GB dictionary
Personal word lists
Replacing synonyms
CSV synonym replacement
YAML synonym replacement
Replacing negations with antonyms
3. Creating Custom Corpora
Setting up a custom corpus
Loading a YAML file
Creating a wordlist corpus
Names wordlist corpus
English words corpus
Creating a part-of-speech tagged word corpus
Customizing the word tokenizer
Customizing the sentence tokenizer
Customizing the paragraph block reader
Customizing the tag separator
Converting tags to a universal tagset
Creating a chunked phrase corpus
Tree leaves
Treebank chunk corpus
CoNLL2000 corpus
Creating a categorized text corpus
Category file
Categorized tagged corpus reader
Categorized corpora
Creating a categorized chunk corpus reader
Categorized CoNLL chunk corpus reader
Lazy corpus loading
Creating a custom corpus view
Block reader functions
Pickle corpus view
Concatenated corpus view
Creating a MongoDB-backed corpus reader
Corpus editing with file locking
4. Part-of-speech Tagging
Default tagging
Evaluating accuracy
Tagging sentences
Untagging a tagged sentence
Training a unigram part-of-speech tagger
Overriding the context model
Minimum frequency cutoff
Combining taggers with backoff tagging
Saving and loading a trained tagger with pickle
Training and combining ngram taggers
Quadgram tagger
Creating a model of likely word tags
How it works...
Tagging with regular expressions
Affix tagging
Working with min_stem_length
Training a Brill tagger
Tracing
Training the TnT tagger
Controlling the beam search
Significance of capitalization
Using WordNet for tagging
Tagging proper names
Classifier-based tagging
Detecting features with a custom feature detector
Setting a cutoff probability
Using a pre-trained classifier
Training a tagger with NLTK-Trainer
Saving a pickled tagger
Training on a custom corpus
Training with universal tags
Analyzing a tagger against a tagged corpus
Analyzing a tagged corpus
5. Extracting Chunks
Chunking and chinking with regular expressions
Parsing different chunk types
Parsing alternative patterns
Chunk rule with context
Merging and splitting chunks with regular expressions
Specifying rule descriptions
Expanding and removing chunks with regular expressions
Partial parsing with regular expressions
The ChunkScore metrics
Looping and tracing chunk rules
Training a tagger-based chunker
Using different taggers
Classification-based chunking
Using a different classifier builder
Extracting named entities
Binary named entity extraction
Extracting proper noun chunks
Extracting location chunks
Training a named entity chunker
Training a chunker with NLTK-Trainer
Saving a pickled chunker
Training on parse trees
Analyzing a chunker against a chunked corpus
Analyzing a chunked corpus
6. Transforming Chunks and Trees
Filtering insignificant words from a sentence
Correcting verb forms
Swapping verb phrases
Swapping noun cardinals
Swapping infinitive phrases
Singularizing plural nouns
Chaining chunk transformations
Converting a chunk tree to text
There's more...
(162 more contents entries not shown)
ISBN
9781782167860
1782167862
OCLC
891381366
891786402
Statement on responsible collection description
Princeton University Library aims to describe library materials in a manner that is respectful to the individuals and communities who create, use, and are represented in the collections we manage.