How to Find the Most Repeated Word in a Text File using Python?
In this article, we will show you how to find the most repeated word in a given text file using Python. We'll use the Counter class from the collections module to efficiently count word frequencies.
Assume we have a text file named sample.txt containing some random text:
Good Morning TutorialsPoint
This is TutorialsPoint sample File
Consisting of Specific
source codes in Python,Seaborn,Scala
Summary and Explanation
Welcome TutorialsPoint
Learn with a joy
Using Counter from Collections Module
The Counter class is a specialized dictionary that counts hashable objects, which makes it well suited to counting word frequencies:
from collections import Counter
# Create sample text content (simulating file reading)
text_content = """Good Morning TutorialsPoint
This is TutorialsPoint sample File
Consisting of Specific
source codes in Python,Seaborn,Scala
Summary and Explanation
Welcome TutorialsPoint
Learn with a joy"""
# Split text into words and store in a list
words = []
for line in text_content.split('\n'):
    # Split each line into words
    line_words = line.split()
    words.extend(line_words)
# Count frequency of each word
word_frequency = Counter(words)
# Find the word with maximum frequency
most_repeated_word = word_frequency.most_common(1)[0]
print(f"Word frequencies: {dict(word_frequency)}")
print(f"'{most_repeated_word[0]}' is the most repeated word with {most_repeated_word[1]} occurrences")
Word frequencies: {'Good': 1, 'Morning': 1, 'TutorialsPoint': 3, 'This': 1, 'is': 1, 'sample': 1, 'File': 1, 'Consisting': 1, 'of': 1, 'Specific': 1, 'source': 1, 'codes': 1, 'in': 1, 'Python,Seaborn,Scala': 1, 'Summary': 1, 'and': 1, 'Explanation': 1, 'Welcome': 1, 'Learn': 1, 'with': 1, 'a': 1, 'joy': 1}
'TutorialsPoint' is the most repeated word with 3 occurrences
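Because `str.split()` with no arguments splits on any run of whitespace, newlines included, the line-by-line loop above can be collapsed into a single call. The following is a minimal sketch of that more compact variant; `sample_text` is an illustrative string, not the contents of sample.txt. It also shows `most_common(n)` with `n` greater than 1, which returns the top `n` (word, count) pairs.

```python
from collections import Counter

# Illustrative text (an assumption for this sketch, not the article's file)
sample_text = """Good Morning TutorialsPoint
This is TutorialsPoint sample File
Welcome TutorialsPoint
This is a joy"""

# split() with no arguments splits on spaces and newlines alike
word_frequency = Counter(sample_text.split())

# most_common(n) returns up to n (word, count) pairs, highest count first
top_three = word_frequency.most_common(3)
print(top_three)
```

Note that 'Python,Seaborn,Scala' was counted as a single word in the output above for the same reason: whitespace splitting does not separate words joined by punctuation.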
Reading from Actual File
Here's how to read from an actual text file and find the most repeated word:
from collections import Counter
def find_most_repeated_word(filename):
    words = []
    # Open and read the file
    try:
        with open(filename, 'r') as file:
            for line in file:
                # Split each line into words and add to list
                line_words = line.strip().split()
                words.extend(line_words)
        # Count word frequencies
        word_frequency = Counter(words)
        # Find most common word
        if word_frequency:
            most_repeated = word_frequency.most_common(1)[0]
            return most_repeated[0], most_repeated[1]
        else:
            return None, 0
    except FileNotFoundError:
        print(f"File '{filename}' not found")
        return None, 0

# Usage
filename = "sample.txt"
word, frequency = find_most_repeated_word(filename)
if word:
    print(f"'{word}' is the most repeated word with {frequency} occurrences")
else:
    print("No words found in the file")
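For large files, the intermediate `words` list can be skipped by feeding each line straight into the counter with `Counter.update()`. The sketch below is one possible variant, not a replacement for the function above: `count_words_streaming` is a hypothetical helper name, the UTF-8 encoding is an assumption, and a temporary file stands in for sample.txt so the example runs on its own.

```python
import os
import tempfile
from collections import Counter

def count_words_streaming(filename):
    """Count words line by line without building an intermediate list."""
    word_frequency = Counter()
    # encoding="utf-8" is an assumption; adjust it to match the actual file
    with open(filename, "r", encoding="utf-8") as file:
        for line in file:
            # update() adds the counts of the words found on this line
            word_frequency.update(line.split())
    return word_frequency

# Demonstration: a temporary file stands in for sample.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("Good Morning TutorialsPoint\nWelcome TutorialsPoint\n")
    temp_path = tmp.name

counts = count_words_streaming(temp_path)
os.remove(temp_path)  # clean up the temporary file
print(counts.most_common(1))
```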
Case-Insensitive Word Counting
To treat differently cased spellings as the same word, convert the text to lowercase before counting:
from collections import Counter
import re
text_content = """Good Morning TutorialsPoint
This is TUTORIALSPOINT sample File
tutorialspoint is great
Welcome TutorialsPoint
Learn with a joy"""
# Extract words and convert to lowercase, remove punctuation
words = []
for line in text_content.split('\n'):
    # Use regex to find words (letters only)
    line_words = re.findall(r'\b[a-zA-Z]+\b', line.lower())
    words.extend(line_words)
# Count word frequencies
word_frequency = Counter(words)
# Find most repeated word
most_repeated = word_frequency.most_common(1)[0]
print(f"All words (lowercase): {words}")
print(f"'{most_repeated[0]}' appears {most_repeated[1]} times")
All words (lowercase): ['good', 'morning', 'tutorialspoint', 'this', 'is', 'tutorialspoint', 'sample', 'file', 'tutorialspoint', 'is', 'great', 'welcome', 'tutorialspoint', 'learn', 'with', 'a', 'joy']
'tutorialspoint' appears 4 times
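One side effect of the letters-only pattern is that an apostrophe splits a contraction such as "don't" into "don" and "t". The sketch below contrasts the pattern used above with an assumed alternative that keeps apostrophe-joined letter groups together; the alternative pattern and the `text` string are illustrative choices, and real text may need further rules (digits, hyphens, non-ASCII letters).

```python
import re
from collections import Counter

# Illustrative text containing contractions
text = "Don't stop. I don't know why you don't stop!"

# Letters only: the apostrophe splits "don't" into "don" and "t"
letters_only = re.findall(r"\b[a-zA-Z]+\b", text.lower())

# Assumed alternative: allow apostrophe-joined letter groups inside a word
with_apostrophes = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

print(Counter(letters_only).most_common(1))
print(Counter(with_apostrophes).most_common(1))
```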
Comparison of Methods
| Method | Advantages | Use Case |
|---|---|---|
| Manual counting with loop | Full control over logic | Custom counting requirements |
| Counter.most_common() | Built-in, efficient, simple | Standard word frequency analysis |
| Case-insensitive with regex | Handles punctuation and case | Real-world text processing |
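The "Manual counting with loop" row refers to counting without Counter. A minimal sketch of that approach, using a plain dictionary and an illustrative `text` string, might look like this:

```python
# Illustrative text (an assumption for this sketch)
text = "apple banana apple cherry banana apple"

# Manual counting: a plain dict maps each word to its running count
word_counts = {}
for word in text.split():
    word_counts[word] = word_counts.get(word, 0) + 1

# max() with key=word_counts.get picks the word with the largest count
most_repeated = max(word_counts, key=word_counts.get)
print(most_repeated, word_counts[most_repeated])
```

This gives full control over what counts as a word and how ties or filters are handled, at the cost of writing the bookkeeping that Counter otherwise provides.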
Conclusion
Use the Counter class from the collections module for efficient word counting. Its most_common() method returns a list of (word, count) pairs sorted from most to least frequent, so most_common(1)[0] gives the most repeated word and its count. For real-world text, consider case-insensitive matching and punctuation removal using regex.
