How to Find the Most Repeated Word in a Text File using Python?


In this article, we will show you how to find the most repeated word in a given text file using Python.

Assume we have a text file named ExampleTextFile.txt containing some random text. We will return the most repeated word in that file.

ExampleTextFile.txt

Good Morning TutorialsPoint
This is TutorialsPoint sample File
Consisting of Specific
source codes in Python,Seaborn,Scala
Summary and Explanation
Welcome TutorialsPoint
Learn with a joy
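To follow along, the sample file can be created from Python. The snippet below is a minimal sketch that assumes the listing above is the complete file content and that writing to the current working directory is acceptable; the variable names sample_text and sample_path are illustrative only.

```python
from pathlib import Path

# Sample text copied from the listing above.
sample_text = """Good Morning TutorialsPoint
This is TutorialsPoint sample File
Consisting of Specific
source codes in Python,Seaborn,Scala
Summary and Explanation
Welcome TutorialsPoint
Learn with a joy
"""

# Assumption: the file is written to the current working directory.
sample_path = Path("ExampleTextFile.txt")
sample_path.write_text(sample_text, encoding="utf-8")

# Sanity check: count the lines that were written.
line_count = len(sample_path.read_text(encoding="utf-8").splitlines())
print(line_count)  # 7
```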

Algorithm (Steps)

The following steps perform the desired task:

  • Import the Counter class from the collections module. (The collections module provides specialized container datatypes as alternatives to Python's general-purpose built-ins such as dictionaries, lists, and tuples. Counter is a subclass of dict that counts hashable objects: each distinct element is stored as a key and its count as the value.)

  • Create a variable to store the path of the text file.

  • Create a list to store all the words.

  • Use the open() function (opens a file and returns a file object) to open the text file in read-only mode by passing the file name and the mode as arguments (here “r” stands for read-only mode).

with open(inputFile, 'r') as filedata:

  • Traverse each line of the file using a for loop.

  • Use the split() method (splits a string into a list; a separator can be specified, and the default is any run of whitespace) to split each line into a list of words and store it in a variable.

  • Traverse the list of words using a for loop.

  • Use the append() method (adds an element to the end of a list) to append each word to the list of all words.

  • Use the Counter class (which maps each word to its frequency as key-value pairs) to calculate the frequency (the number of times each word occurs) of all the words.

  • Create a variable to store the maximum frequency, initialized to 0.

  • Loop over the word-frequency dictionary using a for loop.

The in keyword works in two ways:
It determines whether a value exists in a sequence (list, range, string, etc.).
It is also used to iterate through a sequence in a for loop, which is how it is used in this step.

  • Using an if statement, check whether the frequency of the current word is greater than the maximum frequency.

  • If it is, update the maximum frequency to that value and store the current word in a variable that holds the most repeated word.

  • Print the most repeated word in a text file.

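  • Close the input file with the close() function (used to close an opened file). Because the file is opened with a with statement, it is already closed automatically when the block ends, so this call is optional.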
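The building blocks named in the steps above can be tried in isolation. The following is a small sketch on made-up inputs (the variable names line, words, and frequency are illustrative, not part of the program below); it shows how split() treats whitespace, what Counter stores, and the two uses of in.

```python
from collections import Counter

line = "source codes in Python,Seaborn,Scala"

# split() with no arguments splits on runs of whitespace only,
# so the comma-separated names stay together as one token.
words = line.split()
print(words)  # ['source', 'codes', 'in', 'Python,Seaborn,Scala']

# Counter maps each distinct word to the number of times it appears.
frequency = Counter(["Good", "Morning", "TutorialsPoint", "TutorialsPoint"])
print(frequency["TutorialsPoint"])  # 2
print(frequency["missing"])         # 0, absent keys count as zero

# First use of `in`: membership test.
print("Good" in frequency)  # True

# Second use of `in`: iterating over the keys in a for loop.
for word in frequency:
    print(word, frequency[word])
```

Note that split() does not strip punctuation, so "Python,Seaborn,Scala" is counted as a single word by this approach.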

Example

The following program traverses the lines of a text file, counts the frequency of every word using the Counter class from the collections module, and prints the most repeated word:

# importing Counter function
from collections import Counter

# input text file
inputFile = "ExampleTextFile.txt"

# Storing all the words
newWordsList = []

# Opening the given file in read-only mode
with open(inputFile, 'r') as filedata:
    # Traverse in each line of the file
    for textline in filedata:
        # Splitting the text file content into list of words
        wordsList = textline.split()
        # Traverse in the above list of words
        for word in wordsList:
            # Appending each word to the new list
            newWordsList.append(word)

# Using the Counter() function, calculate the frequency of all the words
wordsFrequency = Counter(newWordsList)

# Taking a variable to store the maximum frequency value
maxFrequency = 0

# Loop in the above words frequency dictionary
for textword in wordsFrequency:
    # Checking whether the frequency of the word is greater than the maximum frequency
    if(wordsFrequency[textword] > maxFrequency):
        # If it is true then set maximum frequency to the corresponding frequency value of the word
        maxFrequency = wordsFrequency[textword]
        # As this is the word with maximum frequency store this word in a variable
        mostRepeatedWord = textword

# Printing the most repeated word in a text file
print("{",mostRepeatedWord,"} is the most repeated word in a text file")

# Closing the input file
filedata.close()

Output

On executing, the above program will generate the following output −

{ TutorialsPoint } is the most repeated word in a text file

In this program, we read some random text from a text file. We traversed the file line by line, split each line into words, and appended every word to a list. We then passed that list to Counter, which returns a dictionary-like object whose keys are the words and whose values are their frequencies. Next, we iterated over the words in that object, checking whether each word's frequency was greater than the maximum frequency seen so far. If it was, we updated the maximum frequency and saved the word as the most frequent one so far. Finally, we displayed the most frequent word.
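The manual maximum search can also be replaced by Counter's built-in most_common() method. The sketch below is an alternative, not the program above: most_repeated_word and sample_lines are hypothetical names introduced here, and the lines are supplied as a list so the snippet runs without the file.

```python
from collections import Counter


def most_repeated_word(lines):
    """Return (word, count) for the most frequent whitespace-separated
    word in an iterable of lines, or None if there are no words."""
    counts = Counter()
    for line in lines:
        # update() adds the counts of the words found in this line.
        counts.update(line.split())
    top = counts.most_common(1)  # list holding at most one (word, count) pair
    return top[0] if top else None


# The sample file's content, given as a list of lines.
sample_lines = [
    "Good Morning TutorialsPoint",
    "This is TutorialsPoint sample File",
    "Consisting of Specific",
    "source codes in Python,Seaborn,Scala",
    "Summary and Explanation",
    "Welcome TutorialsPoint",
    "Learn with a joy",
]

result = most_repeated_word(sample_lines)
print(result)  # ('TutorialsPoint', 3)
```

Because iterating over an open file yields its lines, the same function should also accept a file object, for example inside with open("ExampleTextFile.txt", "r") as filedata. Returning None for empty input avoids the undefined-variable error the manual loop would hit on an empty file.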

Conclusion

This article showed us how to read a file, traverse it line by line, and retrieve all the words in each line. Once we have the words, we can process them further: reverse them, change their case, check for vowels, retrieve their lengths, and so on. We also learned how to use the Counter class to determine the frequency of each word in a list. Counter works on any iterable of hashable items, so it can count the characters of a string or the elements of a list or tuple.
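As a brief illustration of that last point, the sketch below applies Counter to a string, a list, and a tuple, and also shows how changing the case before counting merges variants of the same word. All inputs are made-up examples.

```python
from collections import Counter

# Characters of a string
letter_counts = Counter("banana")
print(letter_counts["a"])  # 3

# Items of a list
number_counts = Counter([1, 2, 2, 3, 2])
print(number_counts[2])  # 3

# Items of a tuple
color_counts = Counter(("red", "blue", "red"))
print(color_counts["red"])  # 2

# Lower-casing each word first treats "The", "the", and "THE" as one word.
folded_counts = Counter(word.lower() for word in "The the THE cat".split())
print(folded_counts.most_common(1))  # [('the', 3)]
```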

Updated on: 18-Aug-2022
