Split a File at Given Line Number


Introduction

Sometimes, it may be necessary to split a large file into smaller chunks for easier manipulation or for transfer to other systems. In Linux, the split command can be used to split a file into smaller files based on a specified number of lines.

The split command reads an input file and writes it out as a series of smaller files, often called chunks or pieces, each limited to a given number of lines or bytes. By default, each chunk holds 1000 lines, and the output files are named with the prefix x followed by a two-letter suffix: xaa, xab, xac, and so on.

How to Use the Split Command?

To split a file based on the number of lines, use the following syntax -

$ split -l lines file output_prefix
  • -l lines − Specifies the number of lines for each output file.

  • file − The input file that you want to split.

  • output_prefix − The prefix for the output files. The output files will be named output_prefixaa, output_prefixab, output_prefixac, and so on.

For example, to split the file bigfile.txt into chunks of 1000 lines each, with the output files having the prefix splitfile, use the following command

$ split -l 1000 bigfile.txt splitfile

This will create the following files: splitfileaa, splitfileab, splitfileac, and so on.
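
The following is a minimal sketch of that command run end to end. The temporary directory and the 2500-line sample file generated with seq are assumptions made for the demonstration; they are not part of the original example.

```shell
#!/bin/sh
# Work in a throwaway directory so the demo does not touch real files.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical sample input: 2500 numbered lines.
seq 1 2500 > bigfile.txt

# Split into chunks of at most 1000 lines each.
split -l 1000 bigfile.txt splitfile

# Show the line count of every chunk; the last one holds the remainder.
wc -l splitfile*

# Concatenating the chunks in name order should reproduce the original.
cat splitfile* | cmp - bigfile.txt && echo "chunks reassemble to the original"
```

Because the suffixes sort alphabetically, the shell glob splitfile* lists the chunks in the order they were written, which is why a plain cat is enough to rebuild the file.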

Examples of splitting files

Here are some examples of using the split command to split a file at specific line numbers

Split a File into Chunks of 1000 Lines Each

To split the file bigfile.txt into chunks of 1000 lines each, with the output files having the prefix splitfile, use the following command

$ split -l 1000 bigfile.txt splitfile

This will create the following files − splitfileaa, splitfileab, splitfileac, and so on.

Split a File into Chunks of 500 Lines Each, Starting at Line 100

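The split command has no option for choosing a starting line; it always begins at the first line of its input. To split the file bigfile.txt into chunks of 500 lines each, starting at line 100, skip the first 99 lines with tail and pipe the rest to split. The - argument tells split to read from standard input, and the -d option (a GNU extension) selects numeric suffixes

$ tail -n +100 bigfile.txt | split -l 500 -d - splitfile

This will create the following files − splitfile00, splitfile01, splitfile02, and so on. The first file, splitfile00, begins with line 100 of the original file.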
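
Since split itself cannot skip lines, one way to start partway through a file is to let tail drop the leading lines first. The sketch below assumes GNU split (for -d) and a made-up 1200-line sample file in which line N contains the number N, so the starting line of each chunk is easy to check.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical sample input: line N contains the number N.
seq 1 1200 > bigfile.txt

# tail -n +100 prints from line 100 onward; "-" makes split read standard input.
# -d (GNU extension) gives numeric suffixes: 00, 01, 02, ...
tail -n +100 bigfile.txt | split -l 500 -d - splitfile

# The first chunk should begin with original line 100.
head -n 1 splitfile00

# Line counts per chunk; the last chunk holds whatever is left over.
wc -l splitfile0*
```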

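Split a File into Chunks using Numeric Suffixes

To split the file bigfile.txt into chunks of 100 lines each, starting at line 1000, with the output files having the prefix splitfile and three-digit numeric suffixes, use the following command. The -d option (a GNU extension) switches the suffix from letters to digits, and -a 3 sets the suffix length to three characters. Because split cannot skip lines itself, tail -n +1000 supplies the input from line 1000 onward and the - argument makes split read it from standard input

$ tail -n +1000 bigfile.txt | split -l 100 -d -a 3 - splitfile

This will create the following files − splitfile000, splitfile001, splitfile002, and so on.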

Split a File into Chunks using a Different Suffix

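To split the file bigfile.txt into chunks of 2000 lines each, with the output files having the prefix splitfile, a four-digit numeric suffix, and the extension .txt, use the following command. The --suffix-length option only sets the number of suffix characters; the digits come from -d and the extension from --additional-suffix, both of which are GNU extensions

$ split -l 2000 -d --suffix-length=4 --additional-suffix=.txt bigfile.txt splitfile

This will create the following files − splitfile0000.txt, splitfile0001.txt, splitfile0002.txt, and so on.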
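
As a rough check of how these naming options combine, the sketch below produces four-digit numeric suffixes with a .txt extension. It assumes GNU split, since -d and --additional-suffix are not part of POSIX, and uses a hypothetical 5000-line sample file.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical sample input: 5000 numbered lines.
seq 1 5000 > bigfile.txt

# -d                       numeric suffixes instead of letters
# --suffix-length=4        four suffix characters (0000, 0001, ...)
# --additional-suffix=.txt extension appended after the generated suffix
split -l 2000 -d --suffix-length=4 --additional-suffix=.txt bigfile.txt splitfile

# List the chunks that were created.
ls splitfile*
```

Without -d, the same --suffix-length=4 setting would yield four-letter suffixes such as aaaa and aaab rather than digits.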

Split a File into Chunks and specify the Output Directory

To split the file bigfile.txt into chunks of 1000 lines each, with the output files having the prefix splitfile and stored in the output directory, use the following command. The output directory must already exist, because split does not create it

$ split -l 1000 bigfile.txt output/splitfile

This will create the following files in the output directory − splitfileaa, splitfileab, splitfileac, and so on.
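
A small sketch of writing chunks into a directory follows. The mkdir -p step is an addition made here because split writes to the path given in the prefix and does not create missing directories; the sample file is again a hypothetical one generated with seq.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical sample input: 2500 numbered lines.
seq 1 2500 > bigfile.txt

# Create the target directory first; -p makes this a no-op if it already exists.
mkdir -p output

# The directory is simply part of the output prefix.
split -l 1000 bigfile.txt output/splitfile

# List the chunks written to the directory.
ls output
```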

Split a File into Chunks of 500 Lines Each, and Add a .txt Extension to the Output Filenames

To split the file bigfile.txt into chunks of 500 lines each, with the extension .txt appended to each output filename, use the following command

$ split -l 500 --additional-suffix=.txt bigfile.txt splitfile

This will create the following files: splitfileaa.txt, splitfileab.txt, splitfileac.txt, and so on. The filenames do not contain line numbers, but because every chunk except possibly the last holds exactly 500 lines, the range in each file is easy to work out. For example, splitfileaa.txt will contain lines 1-500, splitfileab.txt will contain lines 501-1000, and so on.
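
To confirm which lines ended up in which chunk, one option is to print the first and last line of each output file. The sketch below assumes GNU split (for --additional-suffix) and a hypothetical 1200-line sample file in which line N contains the number N, so the printed values are the original line numbers.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical sample input: line N contains the number N.
seq 1 1200 > bigfile.txt

split -l 500 --additional-suffix=.txt bigfile.txt splitfile

# Report the first and last line of every chunk.
for f in splitfile*.txt; do
    printf '%s: first line %s, last line %s\n' \
        "$f" "$(head -n 1 "$f")" "$(tail -n 1 "$f")"
done
```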

Alternative Commands

There are a few other commands that can be used to split a file in Linux, although they may not have all the options and functionality of the split command. Some alternatives to the split command include

  • csplit − The csplit command is similar to split, but it allows you to specify the point at which to split the file using a pattern or a line number. A pattern on its own is applied only once, so a repeat count is needed to split at every match. For example, to split a file at every line that contains the pattern "---", use the following command with GNU csplit: csplit file '/---/' '{*}'

  • awk − The awk command is a powerful text-processing tool that can be used to split a file based on a given pattern or field. For example, to split a CSV file into separate files named after the value of the first field, so that all lines sharing that value end up in the same file, use the following command − awk -F, '{print > ($1 ".txt")}' file

  • sed − The sed command is a stream editor that can write selected line ranges to named files, which makes it suitable for splitting a file at a known line number. For example, to put the first 100 lines of a file in part1.txt and the remaining lines in part2.txt, use the following command − sed -n -e '1,100w part1.txt' -e '101,$w part2.txt' file. Because sed cannot generate output filenames on the fly, csplit or awk is usually a better fit for splitting at every occurrence of a pattern.

It's worth noting that these alternatives may not be as efficient or as easy to use as the split command, and may require more advanced knowledge of text processing in Linux.
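
For the simple case of cutting a file in two at one known line number, the sketch below compares csplit with a head and tail pair. The sample file, the split point of line 101, and the part1.txt and part2.txt names are assumptions chosen for illustration; xx00 and xx01 are the default output names used by csplit.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical sample input: line N contains the number N.
seq 1 250 > file.txt

# csplit with a line number: lines before 101 go to xx00, the rest to xx01.
# -s suppresses the byte counts csplit normally prints.
csplit -s file.txt 101

# The same split expressed with head and tail.
head -n 100 file.txt > part1.txt
tail -n +101 file.txt > part2.txt

# Both approaches should produce identical pieces.
cmp xx00 part1.txt && cmp xx01 part2.txt && echo "both approaches agree"
```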

Conclusion

Overall, the split command is a useful utility for splitting a large file into smaller chunks based on a specified number of lines in Linux. It is a convenient tool for cases where you need to manipulate or transfer large files more easily. The split command has several options that allow you to customize the output files, including the prefix, the suffix type and length, and an additional suffix such as a file extension. It always starts at the first line of its input, so to begin at a later line, feed it the output of a command such as tail.

There are also several other commands that can be used to split a file in Linux, such as csplit, awk, and sed. These alternatives may offer more advanced functionality or the ability to split based on patterns, but they may not be as efficient or easy to use as the split command.

Updated on: 04-Jan-2023
