How to find all the distinct file extensions in a folder hierarchy (Linux)?


There are plenty of ways to find the extension of a single file in Linux. To find all the distinct file extensions in a folder hierarchy, we first need to understand the find and sed commands, since these are what we will use to print the distinct extensions in a folder or a folder hierarchy.

The two Linux utility commands that we must be aware of are −

  • find − used to locate files and directories in a directory tree

  • sed − short for stream editor; used to search, edit and replace text in a stream of lines
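
As a rough illustration of what each tool contributes, the sketch below runs them separately. The scratch directory and the file names in it are assumptions made up for this example; they are not part of the setup used in the rest of the article.

```shell
# Scratch directory with made-up files (assumption, for illustration only).
demo=$(mktemp -d)
mkdir -p "$demo/docs"
touch "$demo/notes.txt" "$demo/docs/report.pdf"

# find: print every regular file (-type f) below the starting directory.
found=$(cd "$demo" && find . -type f | sort)
printf '%s\n' "$found"

# sed: delete everything up to and including the last dot on each line.
stripped=$(printf '%s\n' "./docs/report.pdf" | sed -e 's/.*\.//')
printf '%s\n' "$stripped"   # pdf

# Clean up the scratch directory.
rm -rf "$demo"
```

The first command only lists paths and the second only rewrites text; joining them with a pipe is what produces a list of extensions.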

For a single folder we do not even need the find command: we can iterate over the files with a shell loop, print each extension, and pass the result to sort -u to drop the duplicates.

Let's say that we have a directory called dir1 and want to list the distinct file extensions in it.

To do that, run the command shown below inside that directory.

Command

for file in *.*; do printf "%s\n" "${file##*.}"; done | sort -u

Output

immukul@192 dir1 % for file in *.*; do printf "%s\n" "${file##*.}"; done | sort -u
app
c
dmg
doc
docx
epub
go
h
htm
jnlp
jpeg
jpg
json
mp4
o
odt
pdf
png
srt
torrent
txt
webm
xlsx
zip
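
The loop above relies on the shell parameter expansion ${file##*.}, which removes the longest prefix that matches the pattern *. and so keeps only the text after the last dot. The sketch below isolates that step; ext_of is a hypothetical helper name introduced here for illustration, and the file names are made up.

```shell
# Hypothetical helper (not from the original command): print the extension
# of one file name using the same ${var##*.} expansion as the loop.
ext_of() {
  # ${1##*.} strips the longest prefix matching "*.", that is,
  # everything up to and including the last dot.
  printf '%s\n' "${1##*.}"
}

ext_of "report.pdf"       # pdf
ext_of "archive.tar.gz"   # gz
ext_of "README"           # README (no dot, so nothing is removed)
```

The last call shows why the loop uses the glob *.* rather than *: a name without a dot would otherwise be printed unchanged as if it were an extension.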

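As you can see, each extension in the output above appears only once. To list the distinct file extensions across a whole folder hierarchy, we need to recurse into the subdirectories, which is what find does. In the command below, find . -type f prints the path of every regular file, the first sed removes everything up to and including the last dot, the second sed removes any directory part that is still left (this happens for files whose names contain no dot, so such files show up under their own name), and sort -u drops the duplicates.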

Command

find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort -u

Output

immukul@192 dir1 % find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort -u
app
c
dmg
bz2
callgrind
case-hosts
cc
cfg
cgi
comments
conf
config
contention
cov
cpu
crash
crt
css
csv
dat
debug_rnglists
demangle-expected
dep
description
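
If files without any extension should be left out of the list, one possible variant is to let find keep only names that contain a dot. This is a sketch under stated assumptions rather than part of the original command: list_extensions is a hypothetical helper name, hidden files such as .bashrc are still reported (as bashrc), and file names containing newlines are not handled.

```shell
# Hypothetical helper: list the distinct extensions below a directory,
# skipping files whose names contain no dot.
list_extensions() {
  # Run in a subshell so that paths start with "./" and the caller's
  # working directory is left untouched.
  (
    cd "${1:-.}" || exit 1
    # -name '*.*' keeps only files whose base name contains a dot,
    # so a single sed is enough to strip the path and the file stem.
    find . -type f -name '*.*' | sed -e 's/.*\.//' | sort -u
  )
}

# Usage: pass a directory, or no argument for the current directory.
list_extensions .
```

Because -name is matched against the base name only, a dot in a directory name does not cause an extensionless file inside it to be listed.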

Updated on: 29-Jul-2021
