I have a very large text file (tab-delimited, first line is header) like this:
A1 A2 A3 A4 A5 A6 A7
FA1 AB 234 231 0.02 456 I
FA2 ACE 241 2154 0.1 324 O
FA3 AD AC 150 2367 0.02 123 I
FA AFQ ASB 123 2473 0.4 630 I
As you can see, lines 3 and 4 of the data have an extra string in column 3 (A3). Could you please help me delete these strings and shift the remaining cells left, using awk, sed or any other Linux tool, so that the corrected file looks like:
A1 A2 A3 A4 A5 A6 A7
FA1 AB 234 231 0.02 456 I
FA2 ACE 241 2154 0.1 324 O
FA3 AD 150 2367 0.02 123 I
FA AFQ 123 2473 0.4 630 I
I tried:
awk '{if($3!~/[0-9]+/) $3=$4}1' file
It removes any strings in column 3 and replaces them with column 4, but without shifting cells left.
CodePudding user response:
Using sed
$ sed '1!s/^\([^ ]* \+[^ ]* \+\)[A-Z][^ ]* \+/\1/' input_file
A1 A2 A3 A4 A5 A6 A7
FA1 AB 234 231 0.02 456 I
FA2 ACE 241 2154 0.1 324 O
FA3 AD 150 2367 0.02 123 I
FA AFQ 123 2473 0.4 630 I
1!
- Skip line 1 (the header).
^\([^ ]* \+[^ ]* \+\)
- Parentheses capture what they match so it can be back-referenced later; this group matches the first two columns, up to and including the spaces after the second one.
[A-Z][^ ]* \+
- This part is matched outside the parentheses, so it is not captured and is dropped by the replacement. If the third column starts with a capital letter, everything up to and including the following spaces is removed.
\1
- Replace the whole match with what the parentheses captured.
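The sample above is shown with spaces, while the question describes the file as tab-delimited. A minimal sketch of the same substitution with tab separators follows; it assumes real tabs in the input and avoids the `\+` repetition by treating each field as "non-tabs followed by one tab". The names `sample`, `tab`, `field` and `fixed_sed` are made up for illustration.

```shell
# Sample rows from the question, rebuilt with real tabs (printf expands \t).
sample=$(
  printf 'A1\tA2\tA3\tA4\tA5\tA6\tA7\n'
  printf 'FA1\tAB\t234\t231\t0.02\t456\tI\n'
  printf 'FA3\tAD\tAC\t150\t2367\t0.02\t123\tI\n'
)

tab=$(printf '\t')        # a literal tab character
field="[^$tab]*$tab"      # one field plus the tab that ends it

# Keep fields 1-2 (group 1); drop field 3 and its tab when it starts with A-Z.
# The address 1! is single-quoted so that ! is never treated as history expansion.
fixed_sed=$(printf '%s\n' "$sample" |
  sed '1!'"s/^\(${field}${field}\)[A-Z]${field}/\1/")

printf '%s\n' "$fixed_sed"
```

Rows whose third field starts with a digit do not match the pattern and pass through unchanged.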
CodePudding user response:
You may use this awk:
awk 'BEGIN{FS=OFS="\t"} NR > 1 && $3+0 != $3 {
$3 = ""; sub(FS FS, FS)} 1' file
A1 A2 A3 A4 A5 A6 A7
FA1 AB 234 231 0.02 456 I
FA2 ACE 241 2154 0.1 324 O
FA3 AD 150 2367 0.02 123 I
FA AFQ 123 2473 0.4 630 I
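For readers unfamiliar with the numeric test used here, this is a commented sketch of the same idea, assuming real tab separators and no empty fields before column 3 (otherwise the first pair of adjacent tabs would be somewhere else). The names `sample` and `fixed_awk` are illustrative only.

```shell
# Sample rows from the question, rebuilt with real tabs (printf expands \t).
sample=$(
  printf 'A1\tA2\tA3\tA4\tA5\tA6\tA7\n'
  printf 'FA1\tAB\t234\t231\t0.02\t456\tI\n'
  printf 'FA3\tAD\tAC\t150\t2367\t0.02\t123\tI\n'
)

fixed_awk=$(printf '%s\n' "$sample" | awk '
BEGIN { FS = OFS = "\t" }
# $3 + 0 != $3 is true when field 3 does not look like a number:
# a numeric-looking field compares as a number, anything else as a string.
NR > 1 && $3 + 0 != $3 {
    $3 = ""          # rebuilds $0 with OFS, leaving two adjacent tabs
    sub(FS FS, FS)   # collapse the first tab pair so later fields shift left
}
1                    # print every line, modified or not
')

printf '%s\n' "$fixed_awk"
```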
CodePudding user response:
This might work for you (GNU sed):
sed -E '1!s/^((\S+\s+){2})[A-Z]\S+\s+/\1/' file
On every line except the first, remove the third field and the whitespace following it if the field begins with a character A through Z.
CodePudding user response:
$ awk -F'\t+' -v OFS='\t' 'NF>7{$3=""; $0=$0; $1=$1} 1' file
A1 A2 A3 A4 A5 A6 A7
FA1 AB 234 231 0.02 456 I
FA2 ACE 241 2154 0.1 324 O
FA3 AD 150 2367 0.02 123 I
FA AFQ 123 2473 0.4 630 I
$ awk -v OFS='\t' '{print $1, $2, $(NF-4), $(NF-3), $(NF-2), $(NF-1), $NF}' file
A1 A2 A3 A4 A5 A6 A7
FA1 AB 234 231 0.02 456 I
FA2 ACE 241 2154 0.1 324 O
FA3 AD 150 2367 0.02 123 I
FA AFQ 123 2473 0.4 630 I
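The last command keeps only the first two and the last five whitespace-separated fields, so it relies on no field containing spaces and on every row having at least seven fields. On a very large file it may be worth checking first which rows actually have extra columns. A minimal sketch, assuming real tab separators and that the header row has the correct column count; `sample`, `want` and `bad` are illustrative names.

```shell
# Sample rows from the question, rebuilt with real tabs (printf expands \t).
sample=$(
  printf 'A1\tA2\tA3\tA4\tA5\tA6\tA7\n'
  printf 'FA1\tAB\t234\t231\t0.02\t456\tI\n'
  printf 'FA3\tAD\tAC\t150\t2367\t0.02\t123\tI\n'
)

# Remember the header field count, then report every row that differs from it.
bad=$(printf '%s\n' "$sample" | awk -F'\t' '
NR == 1    { want = NF; next }
NF != want { print NR ": " NF " fields" }
')

printf '%s\n' "$bad"
```

Running the same check on the corrected output should print nothing.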