How to split a text file into several files with a certain format

Time:05-05

I have data like this:

POW076956-1 CC1=CC=C(C=C1)C(=O)N1N=C(CC1C1=CC=CO1)C1=CC=C(NS(C)(=O)=O)C=C1
POW000136-2 CCCCOC1=CC=C(OCCCC)C2=C1NC1(N2)C(=O)NC2=CC=C(C=C12)[N+]([O-])=O
POW192689-1 CC(C)(C)C1=CC=C(C=C1)C1N(CCN2C=CC=C12)S(=O)(=O)C1=CC=C2C=CC=CC2=C1
POW005144-1 CC1=CC=C2N=C(OC2=C1)C1=CC=C(NC(=O)C2=CC=C(I)C=C2)C=C1
POW146687-1 O=S(=O)(C1=CC=CC=C1)C1=CC=C(COC2=CC=CC3=CC=CN=C23)C=C1
POW008940-2 OC(CNC1=CC=CC=C1)CN1C2=CC=C(I)C=C2C2=C1C=CC(I)=C2

I want to take the second part of each row and write it to a file named after the first part, with a .txt extension.

For instance, take this:

CC1=CC=C(C=C1)C(=O)N1N=C(CC1C1=CC=CO1)C1=CC=C(NS(C)(=O)=O)C=C1

put it in a file,

and save the file with the name POW076956-1.txt

CodePudding user response:

awk '{print $2 > ($1 ".txt")}' input_file

find . -name "*.txt"
./POW000136-2.txt
./POW005144-1.txt
./POW008940-2.txt
./POW076956-1.txt
./POW146687-1.txt
./POW192689-1.txt

cat ./POW000136-2.txt
CCCCOC1=CC=C(OCCCC)C2=C1NC1(N2)C(=O)NC2=CC=C(C=C12)[N+]([O-])=O
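
If the input has many distinct IDs, some awk implementations can run out of file descriptors, because every output file stays open until awk exits. Below is a minimal sketch that closes each file right after writing it. The helper name split_rows and the shortened sample rows are made up for illustration, and the sketch assumes each ID appears only once (a repeated ID would overwrite the earlier file, since > truncates when the file is reopened).

```shell
#!/usr/bin/env bash
# split_rows is a hypothetical helper name, not part of the original answer.
# Assumes whitespace-separated "ID DATA" rows, each ID appearing once.
split_rows() {
  awk 'NF >= 2 {
    out = $1 ".txt"      # build the output filename from the first field
    print $2 > out       # write the second field to that file
    close(out)           # release the file descriptor before the next row
  }' "$1"
}

# Demo in a scratch directory so the current directory is left untouched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf '%s\n' 'POW076956-1 CC1=CC=CO1' 'POW005144-1 CC1=CC=C2N' > input_file
split_rows input_file
cat POW076956-1.txt
```

The NF >= 2 guard simply skips blank or one-field rows instead of creating oddly named files.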

CodePudding user response:

Something like the following should work:

#!/usr/bin/env bash

# "|| [ -n "$file" ]" also handles a last line with no trailing newline
while read -r file data || [ -n "$file" ]; do
    printf '%s\n' "$data" > "$file.txt"
done < 'input'

CodePudding user response:

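Using GNU sed (the e flag on the s command is a GNU extension that runs the result of the substitution as a shell command):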

$ sed 's/\([^ ]*\) \(.*\)/echo "\2" > \1.txt/e' input_file
$ cat POW000136-2.txt
CCCCOC1=CC=C(OCCCC)C2=C1NC1(N2)C(=O)NC2=CC=C(C=C12)[N+]([O-])=O

CodePudding user response:

Personally, I would use awk, but you have tagged the question [bash], so here is a bash solution. It reads one line at a time and splits the line in two at the first space, using parameter expansion with substring removal.

Presuming you provide the filename to read as the first argument to the program, it can be done with:

## loop reading each line
while read -r line || [ -n "$line" ]; do
  ## separate with parameter expansion & redirect to file
  printf "%s\n" "${line#* }" > "${line%% *}"
done < "$1"

The parameter expansions that trim from the left and right (front and back) are summarized as follows:

${var#pattern}      # Strip shortest match of pattern from front of $var
${var##pattern}     # Strip longest match of pattern from front of $var
${var%pattern}      # Strip shortest match of pattern from back of $var
${var%%pattern}     # Strip longest match of pattern from back of $var
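
To see the difference between the shortest and longest match, here is a small sketch on a made-up line that contains two spaces (the sample value is an assumption for illustration only, not taken from the question's data):

```shell
#!/usr/bin/env bash
# Hypothetical sample line with two spaces, so "shortest" and "longest" differ.
line='ID-1 alpha beta'

front_short=${line#* }    # strips "ID-1 "        leaving: alpha beta
front_long=${line##* }    # strips "ID-1 alpha "  leaving: beta
back_short=${line% *}     # strips " beta"        leaving: ID-1 alpha
back_long=${line%% *}     # strips " alpha beta"  leaving: ID-1

printf '%s\n' "$front_short" "$front_long" "$back_short" "$back_long"
```

The loop above pairs ${line#* } (everything after the first space) with ${line%% *} (everything before the first space), so the line is always cut at its first space.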

There are a couple of validations you will want to perform on the filename argument supplied to the program. First, check that at least one argument was provided; second, check that the argument names an existing, non-empty file. Putting it all together, you can do:

#!/bin/bash

[ -z "$1" ] && {  ## validate 1 argument given for filename
  printf "error: filename required.\nusage: %s file\n" "./${0##*/}" >&2
  exit 1
}

[ -s "$1" ] || {  ## validate file exists and is non-empty
  printf "error: file doesn't exist or is empty.\n" >&2
  exit 1
}

## loop reading each line
while read -r line || [ -n "$line" ]; do
  ## separate with parameter expansion & redirect to file
  printf "%s\n" "${line#* }" > "${line%% *}"
done < "$1"

Example Use/Output

With your sample input in a file named file and the script in splitfile.sh, you can do:

$ bash splitfile.sh file

Resulting files created:

$ ls -al POW*
-rw-r--r-- 1 david david 64 May  4 19:53 POW000136-2
-rw-r--r-- 1 david david 54 May  4 19:53 POW005144-1
-rw-r--r-- 1 david david 50 May  4 19:53 POW008940-2
-rw-r--r-- 1 david david 63 May  4 19:53 POW076956-1
-rw-r--r-- 1 david david 55 May  4 19:53 POW146687-1
-rw-r--r-- 1 david david 67 May  4 19:53 POW192689-1

Example content for first file listed:

$ cat POW000136-2
CCCCOC1=CC=C(OCCCC)C2=C1NC1(N2)C(=O)NC2=CC=C(C=C12)[N+]([O-])=O
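
The script as written names each file after the ID alone, while the question asks for a .txt extension. One possible variant, sketched below, appends the suffix in the redirection. The function name split_to_txt, the scratch directory, and the shortened sample rows are assumptions for illustration.

```shell
#!/usr/bin/env bash
# split_to_txt is a hypothetical function name used for illustration.
split_to_txt() {
  local line
  while read -r line || [ -n "$line" ]; do
    [ -n "$line" ] || continue                        # skip blank lines
    printf '%s\n' "${line#* }" > "${line%% *}.txt"    # data goes to ID.txt
  done < "$1"
}

# Demo in a scratch directory; the last row deliberately has no trailing
# newline, to exercise the "|| [ -n "$line" ]" part of the loop condition.
work=$(mktemp -d)
printf 'POW146687-1 O=S(=O)C1\nPOW008940-2 OC(CNC1)I' > "$work/rows"
( cd "$work" && split_to_txt rows && ls )
```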

Note: for an input file of a few thousand or even ten thousand lines, the bash script is fine. For a million or more lines, use awk (or sed). A shell loop pays a per-line overhead that a dedicated utility avoids, so the efficiency gap widens as the file grows and can reach orders of magnitude on large files.

Tags: bash