How to search the contents of multiple PDF files on Linux?
The pdfgrep command in Linux is used to search one or more PDF files for a particular pattern of characters. It works much like grep: it displays the lines of text that contain the pattern we are searching for.
The pattern that we search for is normally written as a regular expression.
Installing pdfgrep
For Ubuntu/Debian
sudo apt-get update -y
sudo apt-get install -y pdfgrep
For Fedora
sudo dnf install -y pdfgrep
For CentOS
sudo yum install -y pdfgrep
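If the same setup has to work on several distributions, the installation step can be wrapped in a small script. The following is a minimal sketch, not an official installer: the function names detect_pkg_manager and install_pdfgrep are hypothetical, and it assumes the package is called pdfgrep in the configured repositories.

```shell
#!/bin/sh
# Sketch only: detect_pkg_manager and install_pdfgrep are hypothetical helpers.

# Print the first supported package manager found on this system.
detect_pkg_manager() {
    for pm in apt-get dnf yum; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "$pm"
            return 0
        fi
    done
    echo "none"
    return 1
}

# Install pdfgrep unless it is already available.
install_pdfgrep() {
    if command -v pdfgrep >/dev/null 2>&1; then
        echo "pdfgrep is already installed"
        return 0
    fi
    pm=$(detect_pkg_manager) || {
        echo "no supported package manager found" >&2
        return 1
    }
    # Assumption: the package name is "pdfgrep" on every distribution.
    sudo "$pm" install -y pdfgrep
}
```

Calling install_pdfgrep then either reports that the tool is already present or runs the matching install command.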
Syntax
pdfgrep [options...] pattern [files]
While there are plenty of different options available to us, some of the most used are −
-c : counts the number of matches per input file.
-h : suppresses the prefixing of the file name on output.
-i : ignores case when matching.
-H : prints the file name for each match.
-n : prefixes each match with the number of the page where it is found.
-r : recursively searches all files in the given directories.
-R : same as -r, but it also follows all symlinks.
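To see how these options combine, here is a small sketch that wraps two common combinations in shell functions. The function names pdf_find_pages and pdf_count_matches are hypothetical; only the pdfgrep flags come from the list above.

```shell
#!/bin/sh
# Sketch only: the helper names are hypothetical, the flags are pdfgrep's own.

# Print every match with its file name (-H) and page number (-n),
# ignoring case (-i).
pdf_find_pages() {
    pattern=$1
    shift
    pdfgrep -H -n -i "$pattern" "$@"
}

# Print only the number of matches per file (-c), ignoring case (-i).
pdf_count_matches() {
    pattern=$1
    shift
    pdfgrep -c -i "$pattern" "$@"
}
```

For example, pdf_find_pages "linux" *.pdf lists the matches in every PDF in the current directory together with their page numbers, while pdf_count_matches "linux" *.pdf prints one count per file.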
Now, let’s consider a case where we want to find a particular pattern in all the pdf files in a particular directory, say dir1.
Syntax
pdfgrep -HiR "word" *
In the above command, replace the “word” placeholder with the pattern that you want to search for.
For example, to search for the string “func main()”, we make use of the command shown below:
pdfgrep -HiR "func main()" *
The above command searches for the string “func main()” in all the PDF files in the current directory and its subdirectories. It follows symlinks (-R), ignores case (-i), and prints the file name for each match (-H).
Output
main.pdf:func main() {}
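One detail to keep in mind is that pdfgrep treats the pattern as an extended regular expression by default, so the parentheses in “func main()” form a group instead of matching literal characters. When the literal text is what we want, the -F (fixed strings) option can be added. A minimal sketch, using a hypothetical helper named pdf_find_literal:

```shell
#!/bin/sh
# Sketch only: pdf_find_literal is a hypothetical helper name.

# Recursively search for literal text, so that characters such as
# "(" and ")" are not interpreted as regular expression syntax.
pdf_find_literal() {
    text=$1
    shift
    # Default to the current directory when no path is given.
    [ "$#" -gt 0 ] || set -- .
    pdfgrep -H -i -R -F "$text" "$@"
}
```

Running pdf_find_literal "func main()" searches the current directory tree, and pdf_find_literal "func main()" dir1 restricts the search to dir1.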
In case we only want to search the PDF files in a single directory and not its subdirectories, then we need to use the command shown below:
pdfgrep -Hi "func main()" *.pdf
In the above command we dropped the -R flag, so pdfgrep searches only the files that the shell expands *.pdf to and does not descend into subdirectories. Using *.pdf instead of * also keeps subdirectories and non-PDF files, which pdfgrep cannot read as PDFs, off the command line.
Output
main.pdf:func main() {}
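Another approach is to combine the find command with pdftotext (part of the poppler-utils package) and grep. Here find locates the PDF files, pdftotext converts each one to plain text, and grep searches that text. The file name is passed to the inner shell as an argument ("$1") so that names containing spaces or quotes are handled safely.
Command
find /path -name '*.pdf' -exec sh -c 'pdftotext "$1" - | grep --with-filename --label="$1" --color "func main()"' sh {} \;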
Output
/path/main.pdf:func main() {
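The two approaches can also be combined into one helper that prefers pdfgrep and falls back to pdftotext with grep when pdfgrep is not installed. This is only a sketch under a few assumptions: the function name search_pdfs is hypothetical, the fallback relies on GNU grep for the --label option, and file names are assumed not to contain newline characters.

```shell
#!/bin/sh
# Sketch only: search_pdfs is a hypothetical helper name.
# Usage: search_pdfs PATTERN [DIRECTORY]

search_pdfs() {
    pattern=$1
    dir=${2:-.}
    if command -v pdfgrep >/dev/null 2>&1; then
        # Preferred path: let pdfgrep walk the directory tree itself.
        pdfgrep -H -i -r -F "$pattern" "$dir"
    else
        # Fallback: convert each PDF to text and search it with grep.
        # Assumption: GNU grep (for --label); no newlines in file names.
        find "$dir" -type f -name '*.pdf' | while IFS= read -r pdf; do
            pdftotext "$pdf" - 2>/dev/null |
                grep -i -F --with-filename --label="$pdf" -- "$pattern"
        done
    fi
}
```

For example, search_pdfs "func main()" /path prints every matching line prefixed with the name of the PDF file it came from. Both branches use -F, so the pattern is matched as literal text.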