Linux Command: Split Half of Large File by Size/Lines Into Alternate Directory

~/Storage02$ wc -l ~/Storage01/Myspace.com.txt
360213049

~/Storage02$ calc 360213049/2
    180106524.5

~/Storage02$ tail -n +180106524 ~/Storage01/Myspace.com.txt | split -C 1G

This command takes the second half of Myspace.com.txt (the MySpace data dump) from the Storage01 folder and splits it into files of about 1GB each in the Storage02 folder. Each file comes out just under the limit because the -C switch breaks the output on line boundaries: split packs as many complete lines as it can into each piece without going over and without cutting a line apart. Note that GNU split reads 1G as 1024^3 bytes; write 1GB if you want 1000^3 bytes.
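To see the line-aware behaviour of -C on something smaller than a 33GB dump, here is a minimal sketch. It assumes GNU coreutils split; the temporary directory, sample.txt and the piece_ prefix are made-up names for illustration only.

```shell
# Throwaway working area so nothing lands in a real directory.
demo_dir=$(mktemp -d)

# Ten lines of 9 characters plus a newline: 10 bytes each, 100 bytes total.
for i in 0 1 2 3 4 5 6 7 8 9; do
    printf 'line-000%s\n' "$i"
done > "$demo_dir/sample.txt"

# Ask for at most 25 bytes per piece. Only two whole lines (20 bytes) fit,
# so split stops there instead of cutting the third line in half.
split -C 25 "$demo_dir/sample.txt" "$demo_dir/piece_"

# Show the size of every piece, then check that the pieces still add up
# to the original file when concatenated in name order.
wc -c "$demo_dir"/piece_*
cat "$demo_dir"/piece_* | cmp - "$demo_dir/sample.txt" && echo "pieces reassemble to the original"
```

The same reasoning scales up: with -C 1G each piece holds as many full lines as fit under the limit, which is why the real files come out slightly smaller than the limit rather than exactly at it.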

Storage02 is on a flash drive with a max file size of 1GB, and Storage01 does not have enough room to fit another copy of the 33GB file. To use this command, first cd into the directory you want the pieces written to, because split writes its output files (xaa, xab, and so on) into the current directory when no prefix is given. Count the lines using wc -l, then quickly get the half-way point using calc. If you don’t have calc, run sudo apt-get install calc. Finally, still in the directory you want to write to, use tail -n with a leading + to print every line from the half-way point onward, and pipe that into split.
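If calc is not available, the shell's own integer arithmetic is one possible substitute. The sketch below also shows, on a five-line sample, that tail -n +K starts printing at line K itself. The variable names total and half are only illustrative.

```shell
# Integer arithmetic in the shell, as an alternative to calc.
total=360213049              # line count reported by wc -l above
half=$(( total / 2 ))        # integer division drops the .5
echo "$half"                 # prints 180106524

# tail -n +K prints from line K onward (line K included),
# so +3 on a five-line input prints: three, four, five.
printf '%s\n' one two three four five | tail -n +3
```

Because line K is included in the output, the matching first half would be lines 1 through K-1.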

This is what we accomplish in order:

  1. Count the lines in our large file
  2. Divide the line count by 2 to get the half-way point
  3. Use tail to spit out the last half of the file
  4. Pipe the last half into split
  5. Write the output as 1GB pieces in a different directory
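The steps above can be strung together in a small script. This is a scaled-down sketch, not the exact commands used on the real dump: the two temporary directories stand in for Storage01 and Storage02, big.txt and the part_ prefix are hypothetical names, and the 1000-byte limit takes the place of 1G so the example runs in an instant.

```shell
# Stand-ins for the source and destination folders (hypothetical paths).
src_dir=$(mktemp -d)
dest_dir=$(mktemp -d)
src_file="$src_dir/big.txt"

# Build a small 1000-line sample in place of the real dump.
seq 1 1000 > "$src_file"

# 1. Count the lines (reading from stdin makes wc print only the number).
total=$(wc -l < "$src_file")

# 2. Find the half-way line with integer division.
half=$(( total / 2 ))

# 3-5. Stream everything from the half-way line onward into split, which
#      reads stdin ("-") and writes line-aligned pieces of at most
#      1000 bytes under the destination prefix.
tail -n "+$half" "$src_file" | split -C 1000 - "$dest_dir/part_"

ls -l "$dest_dir"
```

Giving split an explicit prefix with a directory in it is an alternative to changing into the destination directory first; either way the second half never has to be stored as one large intermediate file.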

From here I plan to load the data into a MySQL database so that I can use it in a project I’m working on (http://www.kubisec.com/ http://kubisec.mywire.org).