Counting Tokens in Paragraphs Using Python

  • Writer: Ajay Sharma
  • Jan 16, 2021
  • 3 min read

In this article, we will count the words in a paragraph using two different approaches: the nltk tokenizer and Python's built-in split() method.

Let’s take a text file containing a passage about screenplay formatting, taken from Wikipedia, and see how we can perform the task.


Reading the File

file_name = "script_file.txt"

# Open the file and read its entire contents into one string
with open(file_name, 'r') as file:
    lines = file.read()
    print(lines)

Output:

The format is structured so that one page equates to
roughly one minute of screen time, though this is only used as a ballpark
estimate and often bears little resemblance to the running time of the final
movie.[1] The standard font is 12 point, 10 pitch Courier Typeface.[2]

The major components are action (sometimes called
"screen direction") and dialogue. The action is written in the
present tense and is limited to what can be heard or seen by the audience, for
example descriptions of settings, character movements, or sound effects. The
dialogue is the words the characters speak, and is written in a center column.

Unique to the screenplay (as opposed to a stage play) is
the use of slug lines. A slug line, also called a master scene heading, occurs
at the start of every scene and typically contains three pieces of information:
whether the scene is set inside (interior/INT.) or outside (exterior/EXT.), the
specific location, and the time of day. Each slug line begins a new scene. In a
"shooting script" the slug lines are numbered consecutively for ease
of reference.
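
The reading step above can be wrapped in a small helper. This is a minimal sketch, not the article's own code: the function name read_text_file, the explicit utf-8 encoding, and the temporary demo file are all assumptions made so that the example runs without script_file.txt being present.

```python
import tempfile
from pathlib import Path


def read_text_file(path, encoding="utf-8"):
    """Return the full contents of a text file as one string.

    Hypothetical helper; the utf-8 default is an assumption about the file.
    """
    with open(path, "r", encoding=encoding) as file:
        return file.read()


# Demo with a temporary file so the sketch does not depend on script_file.txt
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo.txt"
    demo_path.write_text("Each slug line begins a new scene.\n", encoding="utf-8")
    demo_text = read_text_file(demo_path)

print(demo_text)
```

Passing the encoding explicitly avoids relying on the platform default, which can differ between systems.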

Method 1

How to count words using nltk?

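Here, we will see how to use the nltk package to count the words in the given text file. Note that nltk's word_tokenize() treats punctuation marks as separate tokens, so they are included in the count.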

Let’s take an example.

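import nltk

# word_tokenize needs the Punkt tokenizer data. If it is missing, download it
# once with nltk.download('punkt') (newer NLTK releases ask for 'punkt_tab').

file_name = "script_file.txt"

with open(file_name, 'r') as file:
    lines = file.read()

# Split the text into tokens; punctuation marks become separate tokens
nltk_tokens = nltk.word_tokenize(lines)
print(nltk_tokens)
print("\n")
print("Number of Words:", len(nltk_tokens))

Output:
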
['The', 'format', 'is', 'structured', 'so', 'that',
'one', 'page', 'equates', 'to', 'roughly', 'one', 'minute', 'of', 'screen', 'time',
',', 'though', 'this', 'is', 'only', 'used', 'as', 'a', 'ballpark', 'estimate',
'and', 'often', 'bears', 'little', 'resemblance', 'to', 'the', 'running',
'time', 'of', 'the', 'final', 'movie', '.', '[', '1', ']', 'The', 'standard',
'font', 'is', '12', 'point', ',', '10', 'pitch', 'Courier', 'Typeface', '.',
'[', '2', ']', 'The', 'major', 'components', 'are', 'action', '(', 'sometimes',
'called', '``', 'screen', 'direction', "''", ')', 'and', 'dialogue',
'.', 'The', 'action', 'is', 'written', 'in', 'the', 'present', 'tense', 'and',
'is', 'limited', 'to', 'what', 'can', 'be', 'heard', 'or', 'seen', 'by', 'the',
'audience', ',', 'for', 'example', 'descriptions', 'of', 'settings', ',',
'character', 'movements', ',', 'or', 'sound', 'effects', '.', 'The', 'dialogue',
'is', 'the', 'words', 'the', 'characters', 'speak', ',', 'and', 'is',
'written', 'in', 'a', 'center', 'column', '.', 'Unique', 'to', 'the',
'screenplay', '(', 'as', 'opposed', 'to', 'a', 'stage', 'play', ')', 'is',
'the', 'use', 'of', 'slug', 'lines', '.', 'A', 'slug', 'line', ',', 'also',
'called', 'a', 'master', 'scene', 'heading', ',', 'occurs', 'at', 'the',
'start', 'of', 'every', 'scene', 'and', 'typically', 'contains', 'three',
'pieces', 'of', 'information', ':', 'whether', 'the', 'scene', 'is', 'set',
'inside', '(', 'interior/INT', '.', ')', 'or', 'outside', '(', 'exterior/EXT',
'.', ')', ',', 'the', 'specific', 'location', ',', 'and', 'the', 'time', 'of',
'day', '.', 'Each', 'slug', 'line', 'begins', 'a', 'new', 'scene', '.', 'In',
'a', '``', 'shooting', 'script', "''", 'the', 'slug', 'lines', 'are',
'numbered', 'consecutively', 'for', 'ease', 'of', 'reference', '.']

Number of Words: 223
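
The token list above includes punctuation such as '(', '.', and the quote markers '``' and "''", so the total is higher than the number of actual words. As a rough sketch of one way to count only word-like tokens, the snippet below keeps tokens that contain at least one letter or digit. The token list is a short slice copied from the output above, hard-coded so the example runs without nltk; the filtering rule itself is an assumption, not part of the original article.

```python
# A short slice of the nltk output shown above, hard-coded for illustration
tokens = ['action', '(', 'sometimes', 'called', '``', 'screen',
          'direction', "''", ')', 'and', 'dialogue', '.']

# Keep only tokens that contain at least one alphanumeric character
# (assumed rule: anything made purely of punctuation is not a word)
words = [token for token in tokens if any(ch.isalnum() for ch in token)]

print(words)
print("Tokens:", len(tokens))
print("Words only:", len(words))
```

Here 12 tokens reduce to 7 words once the punctuation-only tokens are dropped.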

Method 2

How to count words using the split function of Python?

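This is another approach: counting the words in the given text file using Python's built-in split() method. Called without arguments, split() breaks the string on any run of whitespace, so punctuation stays attached to the neighboring word.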

Let’s see it with an example.

file_name = "script_file.txt"

with open(file_name, 'r') as file:
    lines_in_file = file.read()

# Split on whitespace; punctuation stays attached to the words
words = lines_in_file.split()
print(words)
print("\n")
print("Number of Words:", len(words))

Output:

['The', 'format', 'is', 'structured', 'so', 'that',
'one', 'page', 'equates', 'to', 'roughly', 'one', 'minute', 'of', 'screen',
'time,', 'though', 'this', 'is', 'only', 'used', 'as', 'a', 'ballpark',
'estimate', 'and', 'often', 'bears', 'little', 'resemblance', 'to', 'the',
'running', 'time', 'of', 'the', 'final', 'movie.[1]', 'The', 'standard',
'font', 'is', '12', 'point,', '10', 'pitch', 'Courier', 'Typeface.[2]', 'The',
'major', 'components', 'are', 'action', '(sometimes', 'called', '"screen',
'direction")', 'and', 'dialogue.', 'The', 'action', 'is', 'written', 'in',
'the', 'present', 'tense', 'and', 'is', 'limited', 'to', 'what', 'can', 'be',
'heard', 'or', 'seen', 'by', 'the', 'audience,', 'for', 'example',
'descriptions', 'of', 'settings,', 'character', 'movements,', 'or', 'sound',
'effects.', 'The', 'dialogue', 'is', 'the', 'words', 'the', 'characters',
'speak,', 'and', 'is', 'written', 'in', 'a', 'center', 'column.', 'Unique',
'to', 'the', 'screenplay', '(as', 'opposed', 'to', 'a', 'stage', 'play)', 'is',
'the', 'use', 'of', 'slug', 'lines.', 'A', 'slug', 'line,', 'also', 'called',
'a', 'master', 'scene', 'heading,', 'occurs', 'at', 'the', 'start', 'of', 'every',
'scene', 'and', 'typically', 'contains', 'three', 'pieces', 'of',
'information:', 'whether', 'the', 'scene', 'is', 'set', 'inside',
'(interior/INT.)', 'or', 'outside', '(exterior/EXT.),', 'the', 'specific',
'location,', 'and', 'the', 'time', 'of', 'day.', 'Each', 'slug', 'line',
'begins', 'a', 'new', 'scene.', 'In', 'a', '"shooting', 'script"',
'the', 'slug', 'lines', 'are', 'numbered', 'consecutively', 'for', 'ease',
'of', 'reference.']

Number of Words: 183
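
To see how split() behaves on a small scale, here is a minimal sketch using a few words from the text above. The sample string is an assumption put together for illustration (it includes a blank line on purpose); it is not the full script_file.txt.

```python
# Sample built from the text above, with a blank line between paragraphs
sample = (
    "The standard font is 12 point, 10 pitch Courier Typeface.[2]\n"
    "\n"
    "The major components are action"
)

# With no arguments, split() treats spaces and newlines alike and
# never produces empty strings for the blank line
split_tokens = sample.split()

print(split_tokens)
print("Number of Words:", len(split_tokens))
```

Tokens such as 'point,' and 'Typeface.[2]' keep their punctuation, which is why split() reports fewer tokens than a tokenizer that separates punctuation.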

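The two counts differ (223 versus 183) because nltk treats punctuation marks as separate tokens, while split() leaves them attached to the words.

I hope you enjoyed this article and now know how to count tokens in a paragraph using Python.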

For more blogs and courses on data science, machine learning, artificial intelligence, and emerging technologies, visit us at InsideAIML.

Thanks for reading…

Happy Learning…

 
 
 
