Create a new column and rename the files in the second column to the names in the new column in Linux?

Time:03-31

Example data:

cat lookup.tsv
SRR7015874_1.fastq
SRR7015874_2.fastq
SRR7015875_1.fastq
SRR7015875_2.fastq
SRR7015876_1.fastq
SRR7015876_2.fastq
SRR7015877_1.fastq
SRR7015877_2.fastq

Using this command:

awk '{print $1 "\t" "SRR_" NR ".fastq"}' lookup.tsv > lookup_table.tsv

I get two columns:

SRR7015874_1.fastq   SRR_1.fastq
SRR7015874_2.fastq   SRR_2.fastq
SRR7015875_1.fastq   SRR_3.fastq
SRR7015875_2.fastq   SRR_4.fastq
SRR7015876_1.fastq   SRR_5.fastq
SRR7015876_2.fastq   SRR_6.fastq
SRR7015877_1.fastq   SRR_7.fastq
SRR7015877_2.fastq   SRR_8.fastq

Now I want to create a third column, like this:

SRR1_1.fastq
SRR1_2.fastq
SRR2_1.fastq
SRR2_2.fastq
SRR3_1.fastq
SRR3_2.fastq
SRR4_1.fastq
SRR4_2.fastq

And I want to use the second and third columns to rename files (i.e. if the filename = $2, change it to $3)

I tried:

cat lookup_table.tsv | while read c1 c2; do mv $c1 $c2 ; done
SRR1_1.fastq
SRR1_2.fastq
SRR2_1.fastq
SRR2_2.fastq
SRR3_1.fastq
SRR3_2.fastq

But this was not successful. Is there an error in my code/approach?

CodePudding user response:

Does this solve your problem?

awk '{print $1 "\t" "SRR_" NR ".fastq"}' lookup.tsv > tmp
awk 'END{for (i=1; i<=4; i++) for (j=1; j<=2; j++) print "SRR" i "_" j ".fastq"}' tmp > third_column.txt
paste tmp third_column.txt > lookup_table.txt
cat lookup_table.txt
SRR7015874_1.fastq  SRR_1.fastq SRR1_1.fastq
SRR7015874_2.fastq  SRR_2.fastq SRR1_2.fastq
SRR7015875_1.fastq  SRR_3.fastq SRR2_1.fastq
SRR7015875_2.fastq  SRR_4.fastq SRR2_2.fastq
SRR7015876_1.fastq  SRR_5.fastq SRR3_1.fastq
SRR7015876_2.fastq  SRR_6.fastq SRR3_2.fastq
SRR7015877_1.fastq  SRR_7.fastq SRR4_1.fastq
SRR7015877_2.fastq  SRR_8.fastq SRR4_2.fastq

while read -r c1 c2 c3; do mv "$c2" "$c3"; done < lookup_table.txt
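
If some of the files might be missing, or a target name might already exist, a more defensive version of the loop above could look like the sketch below. The function name rename_from_table is hypothetical, and the sketch assumes the table is tab-separated and that the files are in the current directory.

```shell
# Hypothetical helper: rename each file named in column 2 to the name in column 3.
# Assumes a tab-separated table and files located in the current directory.
rename_from_table() {
  tab=$(printf '\t')
  while IFS="$tab" read -r c1 c2 c3; do
    # Skip incomplete rows (fewer than three columns).
    if [ -z "$c2" ] || [ -z "$c3" ]; then
      continue
    fi
    # Skip rows whose source file is missing.
    if [ ! -e "$c2" ]; then
      echo "skip: $c2 not found" >&2
      continue
    fi
    # Never overwrite an existing target.
    if [ -e "$c3" ]; then
      echo "skip: $c3 already exists" >&2
      continue
    fi
    mv -- "$c2" "$c3"
  done < "$1"
}
```

It would be called as rename_from_table lookup_table.txt. Splitting on tabs only keeps filenames with spaces intact, and the -- stops mv from treating a name that starts with a dash as an option.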

CodePudding user response:

You could build the third column from NR: use the modulo operator to increment i on every odd line (that is, once per pair of lines), and a second variable j that is 1 on odd lines and 2 on even lines.

awk '{
  if (NR % 2 == 1) {i++; j=1} else {j=2}
  print $1 "\tSRR_" NR ".fastq\tSRR" i "_" j ".fastq"
}' lookup.tsv > lookup_table.tsv

The content in the file lookup_table.tsv is

SRR7015874_1.fastq  SRR_1.fastq SRR1_1.fastq
SRR7015874_2.fastq  SRR_2.fastq SRR1_2.fastq
SRR7015875_1.fastq  SRR_3.fastq SRR2_1.fastq
SRR7015875_2.fastq  SRR_4.fastq SRR2_2.fastq
SRR7015876_1.fastq  SRR_5.fastq SRR3_1.fastq
SRR7015876_2.fastq  SRR_6.fastq SRR3_2.fastq
SRR7015877_1.fastq  SRR_7.fastq SRR4_1.fastq
SRR7015877_2.fastq  SRR_8.fastq SRR4_2.fastq
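
As a variation, both indices can be computed directly from NR with integer arithmetic instead of keeping counters. This is only a sketch: the wrapper name make_lookup and the variable names sample and mate are made up for illustration, and it assumes the input always lists the two files of a pair on consecutive lines.

```shell
# Hypothetical wrapper: reads file names (one per line) from a file or stdin
# and prints three tab-separated columns.
make_lookup() {
  awk '{
    sample = int((NR + 1) / 2)   # 1,1,2,2,3,3,...
    mate   = (NR - 1) % 2 + 1    # 1,2,1,2,1,2,...
    print $1 "\tSRR_" NR ".fastq\tSRR" sample "_" mate ".fastq"
  }' "$@"
}
```

It could be run as make_lookup lookup.tsv > lookup_table.tsv.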

To rename the files:

while read -r c1 c2 c3; do mv "$c2" "$c3"; done < lookup_table.tsv