How to find all the distinct file extensions in a folder hierarchy (Linux)?
Finding all distinct file extensions in a folder hierarchy is a common task in Linux system administration and file management. This operation requires combining several powerful Linux utilities to recursively traverse directories and extract unique file extensions.
Required Linux Commands
The essential Linux utilities for this task are −
find − Locates files and directories recursively based on specified criteria
sed − Stream editor used for text manipulation, searching, and pattern replacement
sort − Orders lines of text; with the -u option it also drops duplicate lines
Finding Extensions in a Single Directory
For a single directory, you can iterate through files without using the find command. This approach uses shell parameter expansion to extract extensions −
for file in *.*; do printf "%s\n" "${file##*.}"; done | sort -u
The ${file##*.} syntax removes everything up to and including the last dot, leaving only the extension.
Sample Output
app c dmg doc docx epub go h htm jnlp jpeg jpg json mp4 o odt pdf png srt torrent txt webm xlsx zip
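To see the expansion in isolation, the following minimal sketch applies it to a single made-up file name (archive.tar.gz is only an illustration) and contrasts it with the shorter-match form ${file#*.}, which stops at the first dot instead of the last one.

```shell
# Hypothetical file name, used only for illustration.
file="archive.tar.gz"

# '##' removes the longest prefix matching '*.', i.e. everything through the LAST dot.
last_ext="${file##*.}"

# '#' removes the shortest prefix matching '*.', i.e. everything through the FIRST dot.
after_first_dot="${file#*.}"

printf '%s\n' "$last_ext"         # prints: gz
printf '%s\n' "$after_first_dot"  # prints: tar.gz
```

Keep in mind that the *.* pattern also matches directories whose names contain a dot, and that in a default shell configuration the loop runs once with the literal string *.* when nothing matches.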
Finding Extensions in Directory Hierarchy
To find distinct file extensions recursively across an entire directory hierarchy, combine find with sed for pattern extraction −
find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort -u
This command works by −
find . -type f − Finds all regular files recursively, starting from the current directory
sed -e 's/.*\.//' − Removes everything up to and including the last dot in each path
sed -e 's/.*\///' − Removes any directory path still left over, which happens when the file itself has no dot in its name
sort -u − Sorts the results and removes duplicates
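The steps above can be tried safely on a throwaway tree. The sketch below is only an illustration: the directory and file names are invented, and it assumes mktemp -d is available (it is on typical Linux systems).

```shell
# Build a small temporary tree; every name below is made up for the demo.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/src" "$demo_dir/docs.v2"
touch "$demo_dir/src/main.c" "$demo_dir/src/util.c" \
      "$demo_dir/backup.tar.gz" "$demo_dir/docs.v2/README"

# Run the pipeline from inside the tree so that every path starts with "./".
result=$(cd "$demo_dir" && find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort -u)

# Three lines are printed: c, gz and README (their order depends on the locale).
printf '%s\n' "$result"

# Clean up the temporary tree.
rm -rf "$demo_dir"
```

README shows up because it has no extension: the first sed leaves "v2/README" (the last dot was in the directory name), and the second sed strips the directory part, leaving the bare file name.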
Alternative Approaches
Here are additional methods to achieve the same result −
Using awk
find . -type f -name "*.*" | awk -F. '{print $NF}' | sort -u
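For repeated use, the awk variant could be wrapped in a small shell function. This is a sketch, not a standard tool: the name list_extensions and its optional directory argument are assumptions made for the example, and it does not handle file names that contain newlines.

```shell
# list_extensions is a hypothetical helper name, not a standard command.
# Usage: list_extensions [directory]   (defaults to the current directory)
list_extensions() {
    # -name '*.*' keeps only files whose own name contains a dot,
    # so extensionless files such as README are skipped.
    # awk splits each path on dots and prints the last field.
    find "${1:-.}" -type f -name '*.*' | awk -F. '{print $NF}' | sort -u
}
```

Calling list_extensions with no argument scans the current directory, while list_extensions /some/path scans that tree instead. Extensions are compared case-sensitively, so txt and TXT are reported as two separate entries.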
Using basename and sed
find . -type f -name "*.*" -exec basename {} \; | sed 's/.*\.//' | sort -u
Key Points
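The find command with -type f ensures only regular files are processed
The greedy .* in the sed expressions matches up to the last dot, so a file with multiple dots such as archive.tar.gz yields only gz
The sort -u combination provides sorted, unique extensions
Files without extensions are excluded only by the variants that filter with -name "*.*"; the plain find | sed pipeline lists them under their full file name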
Conclusion
Finding distinct file extensions in Linux comes down to combining the find, sed, and sort commands. Because find traverses directories recursively, the same pipeline covers an entire directory tree, which makes it useful for system administration and file organization tasks.
