How to Search the Contents of Multiple PDF Files on Linux

pdfgrep is a command-line utility that searches for text patterns inside PDF files. It works like the standard grep command, but it extracts the text of each PDF page before matching, so a single command can search many PDF files at once.

The search pattern can be a plain text string or a regular expression (POSIX extended syntax by default), so one command can match an exact phrase or several variations of a term across your PDF collection.
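As an illustration of a regular-expression pattern, the sketch below first previews the pattern on plain text with grep -E, which uses the same extended syntax, and then runs it through pdfgrep. The pattern and the file name paper.pdf are hypothetical and chosen only for this example.

```shell
# Hypothetical pattern and file name, used only for illustration.
pattern='neural net(work)?s?'

# Preview the pattern on plain text with grep -E (same extended regex syntax).
# -o prints only the matched part of each line, -i ignores case.
preview_pattern() {
  printf '%s\n' 'Neural networks scale well' 'a small neural net' 'no match here' |
    grep -E -i -o "$pattern"
}

preview_pattern

# Run the same pattern against a PDF when pdfgrep and the file are present.
if command -v pdfgrep >/dev/null 2>&1 && [ -f paper.pdf ]; then
  pdfgrep -i -n "$pattern" paper.pdf
fi
```

Previewing on plain text is a quick way to check which variations a pattern accepts before running it over a large set of PDFs.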

Installing pdfgrep

For Ubuntu/Debian

sudo apt-get update -y
sudo apt-get install -y pdfgrep

For CentOS/RHEL (enable the EPEL repository first if the package is not found)

sudo yum install -y pdfgrep

For Fedora

sudo dnf install -y pdfgrep
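After installing, it can be useful to confirm that the tools are on your PATH. The following is a minimal sketch; the helper name have_tool is made up for this example, and pdftotext is included only because the alternative method later in this article relies on it.

```shell
# Return success if the named command is available on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Report the status of each tool used in this article.
for tool in pdfgrep pdftotext; do
  if have_tool "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: not found"
  fi
done
```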

Syntax and Common Options

pdfgrep [options...] pattern [files]

The most frequently used options include:

Option    Description
-c        Count the number of matches per input file
-h        Suppress the prefixing of file names in output
-i        Ignore case for matching
-H        Print the file name for each match
-n        Prefix each match with the page number
-r        Recursively search all files in subdirectories
-R        Same as -r, but also follows symbolic links

Examples

Search All PDF Files Recursively

To search for a pattern in all PDF files in the current directory and subdirectories, pass a directory instead of *.pdf. The shell expands *.pdf to files in the current directory only, which leaves -R nothing to recurse into:

pdfgrep -HiR "search term" .

Example searching for "machine learning" in all PDFs:

pdfgrep -HiR "machine learning" .
./research.pdf:Machine learning algorithms are essential
./tutorial.pdf:Introduction to machine learning concepts

Search with Page Numbers

To include page numbers in the output:

pdfgrep -Hn "algorithm" *.pdf
document.pdf:15:The sorting algorithm performs efficiently
paper.pdf:3:This algorithm solves the optimization problem
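When a term appears many times, a per-file list of pages can be easier to scan than the full output. The sketch below post-processes the file:page:text lines shown above with awk. The function name summarize_pages is hypothetical, and the sketch assumes that file names contain no colon.

```shell
# Reads "file:page:text" lines on stdin and prints "file: page,page,...".
# Assumes file names contain no colon and that input is grouped by file,
# which is how the matches are listed in the example output above.
summarize_pages() {
  awk -F: '
    { key = $1 SUBSEP $2 }
    seen[key]++ { next }   # skip further matches on an already listed page
    $1 != current {
      if (current != "") print current ": " list
      current = $1
      list = $2
      next
    }
    { list = list "," $2 }
    END { if (current != "") print current ": " list }
  '
}

# Usage: pdfgrep -Hn "algorithm" *.pdf | summarize_pages
```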

Count Matches Only

To count occurrences without showing the actual matches:

pdfgrep -c "data" *.pdf
report.pdf:7
analysis.pdf:12
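The counts can be piped into standard tools to rank files by relevance. The following is a minimal sketch, with a hypothetical function name rank_counts, that drops files with zero matches and sorts the rest by count in descending order.

```shell
# Reads "file:count" lines on stdin, drops zero counts, and prints
# "count<TAB>file" sorted by count, highest first.
rank_counts() {
  awk -F: '$NF > 0 {
    count = $NF
    sub(/:[0-9]+$/, "")    # strip the trailing ":count" to keep the file name
    print count "\t" $0
  }' | sort -k1,1nr
}

# Usage: pdfgrep -c "data" *.pdf | rank_counts
```

Taking the count from the last field rather than the second keeps the ranking correct even if a file name happens to contain a colon.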

Alternative Method Using find and pdftotext

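For more complex searches or when pdfgrep is unavailable, you can combine find, pdftotext (part of the poppler-utils package), and grep. The file name is passed to the inner shell as $1 rather than pasted into the command string, so names containing quotes or other special characters are handled safely:

find /path/to/directory -name '*.pdf' -exec sh -c 'pdftotext "$1" - | grep --with-filename --label="$1" --color "search term"' sh {} \;

Example searching for "neural network", ignoring case, in all PDFs under /home/user/documents:

find /home/user/documents -name '*.pdf' -exec sh -c 'pdftotext "$1" - | grep -i --with-filename --label="$1" "neural network"' sh {} \;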
/home/user/documents/ai_research.pdf:Neural networks have revolutionized
/home/user/documents/thesis.pdf:The neural network architecture consists
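If you run this kind of search often, the one-liner can be wrapped in a small shell function. The sketch below is one possible way to do that, not a replacement for pdfgrep. The function name search_pdfs is hypothetical, and the loop assumes that file names contain no newline characters.

```shell
# search_pdfs DIR PATTERN
# Prints "file:matching line" for every PDF under DIR whose text matches
# PATTERN, ignoring case.
search_pdfs() {
  dir=$1
  pattern=$2
  # find prints one path per line, so names with newlines are not supported.
  find "$dir" -type f -name '*.pdf' -print |
    while IFS= read -r file; do
      # "|| true" keeps the exit status zero for PDFs without a match.
      pdftotext "$file" - 2>/dev/null |
        grep -i --with-filename --label="$file" -e "$pattern" || true
    done
}

# Usage: search_pdfs /home/user/documents "neural network"
```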

Conclusion

pdfgrep searches the text of multiple PDF files with a single command. Its options control case sensitivity, recursion, and whether the output shows file names, page numbers, or match counts. When pdfgrep is not installed, find combined with pdftotext and grep gives similar results.

Updated on: 2026-03-17T09:01:38+05:30
