I've been working on a csv file.
I want to replace the the 3rd and 4th columns only.
Here is an example:
Col1 Col2 Col3 Col4 Col5 Col6 Col7
c1 f2 stack.overflow stack.overflow f5 stack.overflow stack.overflow
d1 g2 stack2.overflow2 stack2.overflow2 xyx stack2.overflow2 stack2.overflow2
I carry out the search and replace of Col3 and Col4 only.
What I get is:
Col1 Col2 Col3 Col4 Col5 Col6 Col7
c1 f2 macy.greytold macy.greytold f5 stack.overflow stack.overflow
d1 g2 prince.caballero prince.caballero xyx stack2.overflow2 stack2.overflow2
I used sed:
sed ' s!stack.overflow2!prince.caballero!g' filename.csv > test1.csv
sed ' s!stack.overflow!macy.greytold!g' test1.csv > test-2.csv
But that changes the whole lot. Is there a way to just change fields 3 and 4?
Please note that I have given an example of 2 lines. This can run on to 100 lines.
CodePudding user response:
Assuming your file is a TSV file (tab-separated values) this is easy with Awk.
awk -F '\t' '$3 == "stack.overflow2" { $3 = "prince.caballero" }
$4 == "stack.overflow2" { $4 = "prince.caballero" }
$3 == "stack.overflow" { $3 = "macy.greytold" }
$4 == "stack.overflow" { $4 = "macy.greytold" }
1' filename.csv >updated.tsv
Awk processes one line (or, more generally, one record; you can change the record separator with RS=value
) at a time. The variables $1
, $2
etc get the values of the respective fields on the line. The ==
comparison simply checks for verbatim equality (please update the question if you want to match substrings of these fields, or e.g. case-insensitive matching) and the final 1
says to print every line.
In the general case, CSV and its variations allows for quoting of fields, which means not all records are a single line and not all field separators actually separate two fields, at which point using simple line-oriented tools like Awk may no longer be feasible. If you bump into these complications, look for a dedicated CSV processing tool, or switch to a language which comes with a proper CSV parser -- Python is one popular choice.
CodePudding user response:
You ae using the global g
flag which means substitute all matches. You could filter it down by targeting the occurrence sed ' s!stack.overflow2!prince.caballero!2'
for the second etc.
You could also just simply do the substitution twice for each match.
Using sed
$ sed 's/stack.overflow/macy.greytold/;s/stack.overflow/macy.greytold/;s/stack2.overflow2/prince.caballero/;s/stack2.overflow2/prince.caballero/' input_file
Col1 Col2 Col3 Col4 Col5 Col6 Col7
c1 f2 macy.greytold macy.greytold f5 stack.overflow stack.overflow
d1 g2 prince.caballero prince.caballero xyx stack2.overflow2 stack2.overflow2