How to shift a specific cell left in a text file

I have a very large text file (tab-delimited, first line is header) like this:

A1  A2  A3  A4  A5  A6  A7  
FA1 AB  234 231 0.02    456 I   
FA2 ACE 241 2154    0.1 324 O   
FA3 AD  AC  150 2367    0.02    123 I
FA  AFQ ASB 123 2473    0.4 630 I

As you can see, data rows 3 and 4 each have an extra string in column 3 (A3). How can I delete these strings and shift the remaining cells left, using awk, sed, or any other Linux tool, so that the corrected file looks like this:

A1  A2  A3  A4  A5  A6  A7
FA1 AB  234 231 0.02    456 I
FA2 ACE 241 2154    0.1 324 O
FA3 AD  150 2367    0.02    123 I
FA  AFQ 123 2473    0.4 630 I

I tried:

awk '{if($3!~/[0-9]+/) $3=$4}1' file

It removes any strings in column 3 and replaces them with column 4, but without shifting cells left.

CodePudding user response：

Using sed

$ sed '1!s/^\([^ ]* \+[^ ]* \+\)[A-Z][^ ]* \+/\1/' input_file
A1  A2  A3  A4  A5  A6  A7
FA1 AB  234 231 0.02    456 I
FA2 ACE 241 2154    0.1 324 O
FA3 AD  150 2367    0.02    123 I
FA  AFQ 123 2473    0.4 630 I

1! - Skip line 1 (the header)

^\([^ ]* \+[^ ]* \+\) - A capture group stores whatever the parentheses match so that it can be reused later. Here it matches the first two columns, up to and including the space(s) after the second one.

[A-Z][^ ]* \+ - This part is outside the parentheses, so it is matched but not captured, and the replacement drops it. If the third column starts with a capital letter, everything up to and including the following space(s) is removed.

\1 - Put back whatever the parentheses captured
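
Note that the expression above splits on spaces, while the question describes the file as tab-delimited. The following is a minimal sketch of the same substitution for tab-separated input. It is an adaptation, not the command from the answer. The variable names `tab` and `field` and the temporary sample file are assumptions for illustration, and `2,$` replaces `1!` only to keep `!` out of a double-quoted string.

```shell
# Build a small tab-delimited sample to try the command on (temporary file).
sample=$(mktemp)
printf 'A1\tA2\tA3\tA4\tA5\tA6\tA7\n'            >  "$sample"
printf 'FA1\tAB\t234\t231\t0.02\t456\tI\n'       >> "$sample"
printf 'FA3\tAD\tAC\t150\t2367\t0.02\t123\tI\n'  >> "$sample"

tab=$(printf '\t')        # a literal tab, without relying on \t support in sed
field="[^$tab]*$tab"      # one field (possibly empty) plus the tab that ends it

# From line 2 onward: keep fields 1-2, drop field 3 if it starts with A-Z.
fixed=$(sed "2,\$s/^\($field$field\)[A-Z]$field/\1/" "$sample")
printf '%s\n' "$fixed"

rm -f "$sample"
```

Because the pattern is built from plain bracket expressions and `*`, it does not depend on the GNU-only `\+` operator.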

CodePudding user response：

You may use this awk:

awk 'BEGIN{FS=OFS="\t"} NR > 1 && $3+0 != $3 {
$3 = ""; sub(FS FS, FS)} 1' file

A1   A2   A3   A4    A5    A6   A7
FA1  AB   234  231   0.02  456  I
FA2  ACE  241  2154  0.1   324  O
FA3  AD   150  2367  0.02  123  I
FA   AFQ  123  2473  0.4   630  I
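
The test `$3+0 != $3` is true when field 3 does not look like a number: a numeric-looking field compares equal to itself plus zero, while a string such as AC is compared as text against 0 and differs. The same idea can be sketched more explicitly by rebuilding the line without field 3. The function name `shift_col3` is hypothetical, and the sketch assumes at most one stray string per line, always in column 3.

```shell
# shift_col3 [file...]: print tab-delimited input with a non-numeric
# third field removed from every line after the header.
shift_col3() {
  awk 'BEGIN { FS = OFS = "\t" }
  NR > 1 && $3 + 0 != $3 {
      line = $1 OFS $2                               # keep the first two fields
      for (i = 4; i <= NF; i++) line = line OFS $i   # append everything after field 3
      print line
      next
  }
  { print }                                          # header and clean lines pass through
  ' "$@"
}
```

It could be used as `shift_col3 file > file.fixed`, where `file.fixed` is a placeholder name. Rebuilding the line by hand avoids relying on how a given awk handles changes to NF.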

CodePudding user response：

This might work for you (GNU sed):

sed -E '1!s/^((\S+\s+){2})[A-Z]\S+\s+/\1/' file

On every line except the first, remove the third field and the whitespace following it if the field begins with an uppercase letter A through Z.

CodePudding user response：

$ awk -F'\t+' -v OFS='\t' 'NF>7{$3=""; $0=$0; $1=$1} 1' file
A1      A2      A3      A4      A5      A6      A7
FA1     AB      234     231     0.02    456     I
FA2     ACE     241     2154    0.1     324     O
FA3     AD      150     2367    0.02    123     I
FA      AFQ     123     2473    0.4     630     I

$ awk -v OFS='\t' '{print $1, $2, $(NF-4), $(NF-3), $(NF-2), $(NF-1), $NF}' file
A1      A2      A3      A4      A5      A6      A7
FA1     AB      234     231     0.02    456     I
FA2     ACE     241     2154    0.1     324     O
FA3     AD      150     2367    0.02    123     I
FA      AFQ     123     2473    0.4     630     I
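
Since the file is very large, it may be worth checking the result rather than eyeballing it. A small sketch, assuming the corrected file is strictly tab-delimited with no trailing tabs: report every line whose field count differs from the expected one. `check_fields` is a hypothetical helper name, not part of any answer above.

```shell
# check_fields COUNT FILE: list lines whose number of tab-separated
# fields is not COUNT; exit status is non-zero if any were found.
check_fields() {
  awk -F'\t' -v want="$1" '
      NF != want { printf "line %d has %d fields\n", NR, NF; bad = 1 }
      END { exit bad + 0 }
  ' "$2"
}
```

For the data in the question this would be called as `check_fields 7 fixed_file`, where `fixed_file` stands for whatever name the corrected output was saved under.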