Home > Net >  search and replace a csv file
search and replace a csv file

Time:12-02

I've been working on a csv file.

I want to replace the the 3rd and 4th columns only.

Here is an example:

Col1  Col2  Col3              Col4             Col5   Col6               Col7
c1    f2    stack.overflow    stack.overflow   f5     stack.overflow     stack.overflow
d1    g2    stack2.overflow2  stack2.overflow2 xyx    stack2.overflow2   stack2.overflow2

I carry out the search and replace of Col3 and Col4 only.

What I get is:

Col1  Col2  Col3              Col4             Col5   Col6               Col7
c1    f2    macy.greytold     macy.greytold    f5     stack.overflow     stack.overflow
d1    g2    prince.caballero  prince.caballero xyx    stack2.overflow2   stack2.overflow2

I used sed:

 sed ' s!stack.overflow2!prince.caballero!g' filename.csv > test1.csv
 sed ' s!stack.overflow!macy.greytold!g' test1.csv > test-2.csv

But that changes the whole lot. Is there a way to just change fields 3 and 4?

Please note that I have given an example of 2 lines. This can run on to 100 lines.

CodePudding user response:

Assuming your file is a TSV file (tab-separated values) this is easy with Awk.

awk -F '\t' '$3 == "stack.overflow2" { $3 = "prince.caballero" }
    $4 == "stack.overflow2" { $4 = "prince.caballero" }
    $3 == "stack.overflow" { $3 = "macy.greytold" }
    $4 == "stack.overflow" { $4 = "macy.greytold" }
    1' filename.csv >updated.tsv

Awk processes one line (or, more generally, one record; you can change the record separator with RS=value) at a time. The variables $1, $2 etc get the values of the respective fields on the line. The == comparison simply checks for verbatim equality (please update the question if you want to match substrings of these fields, or e.g. case-insensitive matching) and the final 1 says to print every line.

In the general case, CSV and its variations allows for quoting of fields, which means not all records are a single line and not all field separators actually separate two fields, at which point using simple line-oriented tools like Awk may no longer be feasible. If you bump into these complications, look for a dedicated CSV processing tool, or switch to a language which comes with a proper CSV parser -- Python is one popular choice.

CodePudding user response:

You ae using the global g flag which means substitute all matches. You could filter it down by targeting the occurrence sed ' s!stack.overflow2!prince.caballero!2' for the second etc.

You could also just simply do the substitution twice for each match.

Using sed

$ sed 's/stack.overflow/macy.greytold/;s/stack.overflow/macy.greytold/;s/stack2.overflow2/prince.caballero/;s/stack2.overflow2/prince.caballero/' input_file 
Col1  Col2  Col3              Col4             Col5   Col6               Col7
c1    f2    macy.greytold    macy.greytold   f5     stack.overflow     stack.overflow
d1    g2    prince.caballero  prince.caballero xyx    stack2.overflow2   stack2.overflow2
  • Related