I have a .txt file and I am using three sed commands to manipulate it. First I convert it to a .csv by replacing tabs with commas (A), then I remove lines 1-8 (B), and then I remove the '# ' at the beginning of line 9 (C).
(A) sed 's/\t/,/g' individuals/$message/$message.txt > individuals/$message/$message.csv
(B) sed -i 1,8d individuals/$message/$message.csv
(C) sed -i 's/.\{2\}//' individuals/$message/$message.csv
Is there a better way to do it, maybe integrating these three commands into a single one? It doesn't need to be done using sed but it does need to be done via bash commands.
Here is a sample of my data:
# This data file generated by PLINK at: Mon Jul 11 16:18:56 2022
#
# Below is a text version of your data. Fields are TAB-separated.
# Each line corresponds to a single SNP. For each SNP, we provide its
# identifier, its location on a reference human genome, and the genotype call.
# For further information (e.g. which reference build was used), consult the
# original source of your data.
#
# rsid chromosome position genotype
22:16050607G-A 22 16050607 GG
I deeply appreciate the help!
PS: Line 9 is the "# rsid chromosome..." one and it should be kept in the file, just without the "# ".
CodePudding user response:
Use multiple -e options to execute multiple sed commands in one call.
sed -e '1,8d' -e '9s/^# //' -e '9,$s/\t/,/g' "individuals/$message/$message.txt" > "individuals/$message/$message.csv"
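Since the question allows tools other than sed, the same three steps can also be sketched as a single awk pass. This is only an illustration under stated assumptions: the temporary directory, the file names sample.txt and sample.csv, and the placeholder comment lines are made up for the demo and stand in for the real PLINK file. One reason to consider awk here is that \t in a sed pattern is understood by GNU sed but may not be by other sed implementations, whereas awk accepts "\t" as a field separator.

```shell
# Hypothetical sample file: 8 comment lines, a "# "-prefixed header on line 9,
# and one TAB-separated data row (names and contents are assumptions).
tmpdir=$(mktemp -d)
{
  for i in 1 2 3 4 5 6 7 8; do printf '# comment line %s\n' "$i"; done
  printf '# rsid\tchromosome\tposition\tgenotype\n'
  printf '22:16050607G-A\t22\t16050607\tGG\n'
} > "$tmpdir/sample.txt"

# One pass: skip lines 1-8, strip the leading "# " from line 9,
# and rewrite every remaining line with commas instead of tabs.
awk 'BEGIN   { FS = "\t"; OFS = "," }
     NR < 9  { next }
     NR == 9 { sub(/^# /, "") }
             { $1 = $1; print }' "$tmpdir/sample.txt" > "$tmpdir/sample.csv"

cat "$tmpdir/sample.csv"
# rsid,chromosome,position,genotype
# 22:16050607G-A,22,16050607,GG
```

The assignment $1 = $1 forces awk to rebuild the line using OFS, which is what turns the tabs into commas. For the real data, the input and output paths would be "individuals/$message/$message.txt" and "individuals/$message/$message.csv", quoted as in the sed command above.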