Process Multiple Input Files Using Awk


Introduction

Awk is a powerful text processing tool that is widely used by developers, system administrators, and analysts to manipulate data. It can process text files, extract data, and transform it into other formats. One of the key features of Awk is its ability to process multiple input files in a single run, making it well suited to batch processing tasks. In this article, we will explore how to use Awk to process multiple input files and examine several examples of how it can be used in practice.

Using Awk with Multiple Input Files

Awk is a command-line tool designed to process text. When no file names are given, Awk reads from standard input (stdin), which can be redirected from a file or piped from another command. Awk can also take several input files in a single run, which is useful for batch processing tasks.

When given multiple input files, Awk reads them in sequence, in the order they appear on the command line, and applies the same program to every line. The built-in variable NR keeps counting across all files, FNR restarts at 1 for each new file, and FILENAME holds the name of the file currently being read. This means that if you have multiple files that contain the same type of data, you can process them all in one go rather than running Awk on each file individually.
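The sequential reading described above can be observed with a minimal sketch. It assumes two small throwaway files (a.txt and b.txt are hypothetical names, not part of the article's examples) and prints Awk's built-in variables FILENAME (the current file), FNR (the line number within that file), and NR (the line number across all files) in front of each line:

```shell
# Create two throwaway sample files (names are illustrative only).
tmpdir=$(mktemp -d)
printf 'apple\nbanana\n' > "$tmpdir/a.txt"
printf 'carrot\n' > "$tmpdir/b.txt"

# FILENAME = current file, FNR = line number within that file,
# NR = line number counted across all files read so far.
counters=$(cd "$tmpdir" && awk '{print FILENAME, FNR, NR, $0}' a.txt b.txt)
printf '%s\n' "$counters"
# a.txt 1 1 apple
# a.txt 2 2 banana
# b.txt 1 3 carrot
```

FNR drops back to 1 when b.txt starts, while NR continues from 2 to 3.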

Reading Data From Multiple Input Files

To read data from multiple input files, specify the file names as arguments to Awk when you run it. For example, suppose you have two files, file1.txt and file2.txt, that contain the following data −

file1.txt

apple
banana
orange

file2.txt

carrot
potato

You can use Awk to read data from both files and print it to the screen using the following command −

awk '{print}' file1.txt file2.txt

This command tells Awk to read file1.txt and then file2.txt and print every line to the screen. The '{print}' action specifies what Awk should do for each line of input, in this case print it unchanged.
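As a small variation on the command above, the following sketch prints a banner line whenever a new file begins, so the output shows which lines came from which file. It recreates the two sample files in a temporary directory; the banner format is only an illustration. FNR is Awk's built-in line number within the current file and FILENAME is the built-in name of that file.

```shell
# Recreate the sample files from the article in a temporary directory.
tmpdir=$(mktemp -d)
printf 'apple\nbanana\norange\n' > "$tmpdir/file1.txt"
printf 'carrot\npotato\n' > "$tmpdir/file2.txt"

# FNR == 1 is true on the first line of every file, so the banner
# is printed once per file; the second rule prints every line.
labelled=$(cd "$tmpdir" && awk 'FNR == 1 {print "== " FILENAME " =="} {print}' file1.txt file2.txt)
printf '%s\n' "$labelled"
# == file1.txt ==
# apple
# banana
# orange
# == file2.txt ==
# carrot
# potato
```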

Combining Data From Multiple Input Files

Another way to use Awk with multiple input files is to combine the data from the files into a single stream before Awk reads it. This can be useful if you have multiple files that contain related data and you want to treat them as a single data set. To do this, you can use the cat command to concatenate the files and then pipe the output to Awk.

For example, suppose you have two files, file1.txt and file2.txt, that contain the following data −

file1.txt

1,apple
2,banana
3,orange

file2.txt

1,carrot
2,potato

You can use the following command to combine the data from both files into a single output stream −

cat file1.txt file2.txt | awk -F ',' '{print $1,$2}'

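This command uses the cat command to concatenate file1.txt and file2.txt into a single stream. The stream is then piped to Awk, which is instructed by the -F option to use a comma (',') as the field separator. The '{print $1,$2}' action prints the first and second fields of each line, separated by a space. Because Awk receives one stream on standard input, it cannot tell where one file ends and the next begins; passing the file names directly to Awk produces the same output here and avoids the extra process.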
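To illustrate the difference between piping through cat and naming the files directly, here is a hedged sketch using the same sample data. The field output is identical in both cases, but FNR (the built-in per-file line counter) only restarts when Awk opens the files itself. The variable names are illustrative.

```shell
# Recreate the sample files from the article in a temporary directory.
tmpdir=$(mktemp -d)
printf '1,apple\n2,banana\n3,orange\n' > "$tmpdir/file1.txt"
printf '1,carrot\n2,potato\n' > "$tmpdir/file2.txt"

# Same field output either way.
piped=$(cat "$tmpdir/file1.txt" "$tmpdir/file2.txt" | awk -F ',' '{print $1, $2}')
direct=$(awk -F ',' '{print $1, $2}' "$tmpdir/file1.txt" "$tmpdir/file2.txt")

# Collect FNR for every line on one row. Through the pipe Awk sees a
# single stream, so the per-file counter never restarts.
piped_fnr=$(cat "$tmpdir/file1.txt" "$tmpdir/file2.txt" |
    awk '{printf "%s%s", sep, FNR; sep = " "} END {print ""}')
direct_fnr=$(awk '{printf "%s%s", sep, FNR; sep = " "} END {print ""}' \
    "$tmpdir/file1.txt" "$tmpdir/file2.txt")

printf 'piped : %s\ndirect: %s\n' "$piped_fnr" "$direct_fnr"
# piped : 1 2 3 4 5
# direct: 1 2 3 1 2
```

If a program relies on FNR or FILENAME, the files need to be passed to Awk as arguments rather than through cat.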

Processing Data From Multiple Input Files

Processing data from multiple input files can involve a variety of operations, such as filtering, transformation, and aggregation. Awk's patterns, associative arrays, and built-in functions cover most of these directly; sorting is usually delegated to the sort command, because standard Awk has no built-in sort function.

For example, suppose you have two files, file1.txt and file2.txt, that contain the following data −

file1.txt

1,apple,0.5
2,banana,0.4
3,orange,0.3

file2.txt

1,carrot,0.2
2,potato,0.1

You can use Awk to process the data from both files and calculate the total weight of the fruits, ignoring the vegetables, using the following command −

awk -F ',' '$2 == "apple" || $2 == "banana" || $2 == "orange" {sum += $3} END {print sum}' file1.txt file2.txt

This command uses the -F option to specify a comma (',') as the field separator. The pattern in front of the action matches only lines whose second field is apple, banana, or orange, so the vegetable lines in file2.txt are skipped. For each matching line, the third field is added to sum; Awk treats an unset variable as zero in a numeric context, so sum needs no explicit initialization. The END block prints the final value of sum, which is 1.2 for the sample data.
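Aggregation across files often needs a subtotal per file as well as a grand total. The following is a minimal sketch of one way to do that with an associative array keyed by the built-in FILENAME variable; the array name sum, the variable total, and the "total" label are illustrative choices. Because the order of a for-in loop over an Awk array is unspecified, the output is piped through sort to make it predictable.

```shell
# Recreate the sample files from the article in a temporary directory.
tmpdir=$(mktemp -d)
printf '1,apple,0.5\n2,banana,0.4\n3,orange,0.3\n' > "$tmpdir/file1.txt"
printf '1,carrot,0.2\n2,potato,0.1\n' > "$tmpdir/file2.txt"

# sum[FILENAME] accumulates field 3 per input file; total accumulates
# it across all files. sort fixes the otherwise unspecified loop order.
per_file=$(cd "$tmpdir" && awk -F ',' '
    { sum[FILENAME] += $3; total += $3 }
    END {
        for (name in sum) print name, sum[name]
        print "total", total
    }' file1.txt file2.txt | sort)
printf '%s\n' "$per_file"
# file1.txt 1.2
# file2.txt 0.3
# total 1.5
```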

Examples of Using Awk with Multiple Input Files

Calculating the total size of all files in a directory

Suppose you want to calculate the total size of all files in a directory and its subdirectories. You can use the following command to generate a list of all files in the directory and its subdirectories, and then pipe the output to Awk to calculate the total size −

find /path/to/directory -type f -print0 | xargs -0 ls -l | awk '{sum += $5} END {print sum}'

This command uses the find command to generate a list of all files in the directory and its subdirectories, and the xargs command to pass the list to ls -l, which produces a detailed listing of each file. The -print0 and -0 options separate file names with a null character so that names containing spaces or newlines are handled safely. The output of ls -l is then piped to Awk, which adds up the fifth field ($5), the file size in bytes. The END block prints the final value of sum.
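A variant of the same idea, sketched here on a throwaway directory so that the result can be checked by hand, lets find run ls -l itself through -exec ... {} + instead of going through xargs. The directory layout and file contents are made up for the demonstration, and the sketch assumes the usual ls -l layout in which the size is the fifth field.

```shell
# Build a small directory tree with files of known size.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/sub"
printf 'abc' > "$tmpdir/a.txt"        # 3 bytes
printf 'hello' > "$tmpdir/sub/b.txt"  # 5 bytes

# find passes all regular files to ls -l in batches; Awk sums field 5.
# "sum + 0" makes Awk print 0 instead of an empty line when no files match.
total_bytes=$(find "$tmpdir" -type f -exec ls -l {} + |
    awk '{sum += $5} END {print sum + 0}')
printf '%s\n' "$total_bytes"
# 8
```

With -exec ... {} +, find does not run ls at all when no files match, so an empty directory yields 0.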

Merging two CSV files

Suppose you have two CSV files with the same header row and columns, and you want to merge them into a single CSV file. You can use the following command to merge the files and write the result to a new file:

awk 'FNR==1 && NR!=1 {next;}{print}' file1.csv file2.csv > merged.csv

This command uses the FNR and NR built-in variables to ensure that the header row is included only once in the output file. FNR is the line number within the current file, and NR is the line number across all files. Every line of the first file is printed, but the first line of the second file, where FNR is 1 and NR is not 1, is skipped by next. The resulting output is redirected to a new file, merged.csv.
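The merge can be tried end to end with a short sketch. The CSV contents below (an id,name header and three rows) are invented for the demonstration; only the Awk program itself comes from the command above.

```shell
# Two sample CSV files that share the same header row (made-up data).
tmpdir=$(mktemp -d)
printf 'id,name\n1,apple\n2,banana\n' > "$tmpdir/file1.csv"
printf 'id,name\n3,carrot\n' > "$tmpdir/file2.csv"

# Skip the first line of every file except the very first line read.
awk 'FNR == 1 && NR != 1 {next} {print}' \
    "$tmpdir/file1.csv" "$tmpdir/file2.csv" > "$tmpdir/merged.csv"

merged=$(cat "$tmpdir/merged.csv")
printf '%s\n' "$merged"
# id,name
# 1,apple
# 2,banana
# 3,carrot
```

The same program works for more than two files, since FNR is 1 at the start of each of them. If the first file happens to be empty, NR is still 1 on the first line of the next file, so the header is kept rather than lost.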

Conclusion

Awk is a powerful text processing tool that can process multiple input files in a single run. It provides built-in variables and functions that can be used to manipulate data in a variety of ways, making it well suited to batch processing tasks. By handling several input files in one command, Awk can help you streamline your data processing workflows and improve the efficiency of your data analysis tasks.

Updated on: 24-Mar-2023
