Replace pipe with comma except between curly braces in CSV in bash

Time:12-21

I need to replace pipes with commas in one specific column of a CSV file. That column also contains key-value entries whose values are pipe-separated strings inside curly braces (any number of them, one or more).

Only pipes outside curly braces should be replaced; a group such as {subStringX441|subStringX442|subStringX443|subStringX444} must remain untouched.

A simple sed -i -e 's/|/,/g' filename won't work, because it replaces every pipe.

Input:

column1,column2,column3,column4,column5,column6,column7

stringX1,stringX2,stringX3,stringX41|stringX42|stringX43|stringX44={subStringX441|subStringX442|subStringX443|subStringX444}|stringX45,stringX5,stringX6,stringX7

stringY1,stringY2,stringY3,stringY41|stringY42|stringY43|stringY44={subStringY441|subStringY442|subStringY443}|stringY45,stringY5,stringY6,stringY7

Desired Output:

column1,column2,column3,column4a,column4b,column4c,column4d,column4e,column5,column6,column7

stringX1,stringX2,stringX3,stringX41,stringX42,stringX43,stringX44={subStringX441|subStringX442|subStringX443|subStringX444},stringX45,stringX5,stringX6,stringX7

stringY1,stringY2,stringY3,stringY41,stringY42,stringY43,stringY44={subStringY441|subStringY442|subStringY443},stringY45,stringY5,stringY6,stringY7

CodePudding user response:

Using sed

$ sed 's/\({[^}]*\)\||/,\1/g;s/,{/{/;1s/column4/&a,&b,&c,&d,&e/' input_file
column1,column2,column3,column4a,column4b,column4c,column4d,column4e,column5,column6,column7

stringX1,stringX2,stringX3,stringX41,stringX42,stringX43,stringX44={subStringX441|subStringX442|subStringX443|subStringX444},stringX45,stringX5,stringX6,stringX7

stringY1,stringY2,stringY3,stringY41,stringY42,stringY43,stringY44={subStringY441|subStringY442|subStringY443},stringY45,stringY5,stringY6,stringY7
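
For readers who find the one-liner hard to parse, here is the same command split into commented steps. This is only a sketch: it assumes GNU sed (the \| alternation is a GNU extension to basic regular expressions), and the wrapper name fix_pipes is made up for illustration. Note that the second step removes only the first ",{" on a line, so it expects at most one {...} group per line, as in the sample data.

```shell
# fix_pipes is a hypothetical wrapper name; it reads files given as
# arguments, or standard input when called without arguments.
fix_pipes() {
  sed '
    # Step 1: match either an opening brace group up to (but excluding)
    # the closing brace, captured as \1, or a bare pipe. Both become
    # a comma followed by \1, and \1 is empty for a bare pipe.
    s/\({[^}]*\)\||/,\1/g
    # Step 2: step 1 also put a comma in front of the brace group,
    # so remove it again (first occurrence on the line only).
    s/,{/{/
    # Step 3: on the header line, expand column4 into column4a..column4e.
    1s/column4/&a,&b,&c,&d,&e/
  ' "$@"
}

# Example call on a shortened header and data row:
printf '%s\n' 'column1,column4,column5' 'a,b|c={x|y}|d,e' | fix_pipes
```

The pipes inside {x|y} survive because step 1 consumes the whole group in a single match before the bare-pipe alternative gets a chance to see them.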

CodePudding user response:

Regular expressions (in the strict sense) are not enough for dealing with balanced brackets, which require at least a Chomsky Type-2 (context-free) grammar. I would use GNU AWK for this task in the following way. Let file.txt content be

stringY1,stringY2,stringY3,stringY41|stringY42|stringY43|stringY44
{subStringY441|subStringY442|subStringY443}|stringY45,stringY5,stringY6,stringY7

then

awk 'BEGIN{FPAT=".";OFS=""}{for(i=1;i<=NF;i+=1){if($i=="{"){inside=1};if($i=="}"){inside=0};if(!inside && $i=="|"){$i=","}};print}' file.txt

output

stringY1,stringY2,stringY3,stringY41,stringY42,stringY43,stringY44
{subStringY441|subStringY442|subStringY443},stringY45,stringY5,stringY6,stringY7

Explanation: I tell GNU AWK, via the FPAT variable, that every single character is to be treated as a field, and, via the OFS variable, that the output field separator is the empty string. For every line I go through the fields (i.e. characters) with a for loop: if the character is {, I set the variable inside to 1; if it is }, I set inside to 0; then, if we are not (!) inside and (&&) the character is |, I change it to ,. After processing all characters in the line, I print it.

DISCLAIMER: this solution assumes that curly braces are never nested and that every { has a matching } on the same line.

(tested in gawk 4.2.1)
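
Since FPAT is specific to GNU AWK, the character-by-character loop described above could also be written with substr, which should work in any POSIX awk. This is a sketch under the same assumptions as the disclaimer (no nested braces, every { closed on its own line). The function name replace_outside_braces is hypothetical, and unlike the gawk version it resets the inside flag at the start of every line.

```shell
# replace_outside_braces is a hypothetical helper name; it reads files
# given as arguments, or standard input when called without arguments.
replace_outside_braces() {
  awk '{
    out = ""      # rebuilt line
    inside = 0    # 1 while between "{" and "}", reset for each line
    n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)                  # current character
      if (c == "{") inside = 1
      else if (c == "}") inside = 0
      else if (c == "|" && !inside) c = "," # pipe outside braces
      out = out c
    }
    print out
  }' "$@"
}

# Example call:
printf '%s\n' 'stringY41|stringY44={subStringY441|subStringY442}|stringY45' | replace_outside_braces
# prints: stringY41,stringY44={subStringY441|subStringY442},stringY45
```

Building the line in out instead of assigning to fields avoids any dependence on how a particular awk splits or rejoins the record.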
