I know this question has already been answered for a comma separator ("How to make awk ignore the field delimiter inside double quotes?").
But my file is pipe-separated, and when I put | in a regex it acts as a regex operator, so I don't get the proper output. I don't use awk extensively. My requirement is to add a single backslash before a pipe character when it appears inside a value.
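On the regex part: inside an awk regular expression a bare | is the ERE alternation operator, so a literal pipe has to be written as \| or as the bracket expression [|]. A minimal sketch with hypothetical helper names; note that escape_every_pipe also escapes the real column delimiters, so it only illustrates the regex syntax, not the full requirement.

```shell
# Hypothetical demo helpers: /\|/ and /[|]/ both match a literal pipe in awk,
# whereas a bare | inside /.../ is parsed as alternation.
escape_every_pipe() {
  # Escapes ALL pipes on the line, including the real delimiters (illustration only).
  awk '{ gsub(/\|/, "\\|"); print }'
}
count_pipes() {
  # gsub returns the number of substitutions made; [|] is a literal pipe.
  awk '{ print gsub(/[|]/, "|") }'
}
```

For example, `printf 'a|b|c\n' | escape_every_pipe` prints `a\|b\|c`, and `count_pipes` on the same line prints `2`.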
Since the file is almost 5 GB, I thought of selecting only the particular column and escaping the pipe there.
INPUT:
"first | last | name" |" steve | white | black"| exp | 12
school |" home | school "| year | 2016
company |" private ltd "| joining | 2019
Expected Output:
"first \| last \| name" |" steve \| white \| black"| exp | 12
school |" home \| school "| year | 2016
company |" private ltd "| joining | 2019
I tried gawk with gsub but had no luck. Is there an alternative approach?
Also, if I have to check multiple columns, how can I do that?
Answer:
Assumptions:
- a line can have more than one field with an embedded | character (such a field will be wrapped in double quotes)
- there may be more than one embedded | character in a single field
- double quotes do not show up as embedded characters within other double quotes
Setup:
$ cat pipe.dat
name |" steve | white "| exp | 12
school |" home | school "| year | 2016
company |" private ltd "| joining | 2019
food |"pipe | one"|"pipe | two and | three"| 2022 # multiple double-quoted fields, multiple pipes between double quotes
cars | camaro | chevy | 2033 # no double quotes
NOTE: the # comments are added here only to highlight the new cases; they are not part of the file.
One awk idea:
awk '
BEGIN { FS=OFS="\"" }                 # use the double quote as input and output field separator
{ for (i=2;i<=NF;i+=2)                # double-quoted data resides in the even-numbered fields
      gsub(/\|/,"\\|",$i)             # escape all pipe characters in field #i
  print
}
' pipe.dat
This generates:
name |" steve \| white "| exp | 12
school |" home \| school "| year | 2016
company |" private ltd "| joining | 2019
food |"pipe \| one"|"pipe \| two and \| three"| 2022
cars | camaro | chevy | 2033
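Because this relies on the assumptions above, a stray unmatched double quote on one line of a 5 GB file would silently flip which text is treated as quoted for the rest of that line. A minimal sketch that keeps the same FS="\"" idea and also reports such lines: with the double quote as separator, a line with an even number of quotes splits into an odd NF, so an even, non-zero NF signals an unmatched quote. The function name is hypothetical, and writing to "/dev/stderr" assumes an awk/OS combination that supports it (gawk handles it specially).

```shell
# Hypothetical helper: escape pipes inside double quotes (same idea as above)
# and warn on stderr about lines with an odd number of double quotes.
escape_quoted_pipes() {
  awk '
    BEGIN { FS = OFS = "\"" }
    # an even, non-zero NF means the line has an odd number of quote characters
    NF > 0 && NF % 2 == 0 { print "unbalanced quotes on line " NR > "/dev/stderr" }
    {
      for (i = 2; i <= NF; i += 2)   # even fields are the double-quoted text
        gsub(/\|/, "\\|", $i)        # escape every pipe inside that text
      print
    }'
}
```

Usage, writing to a new file rather than editing the large input in place: `escape_quoted_pipes < pipe.dat > pipe_escaped.dat`. It also handles the question's first sample line, where the first field is quoted.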
Assuming there are no spaces between a | delimiter and an adjacent double quote, one GNU awk idea (using the FPAT feature):
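awk -v FPAT='([^|]*)|("[^"]+")' '
BEGIN { OFS="|" }                     # rebuild the record with pipes between fields
{ for (i=1;i<=NF;i++)                 # a field is a run of non-pipes or a double-quoted string
      gsub(/\|/,"\\|",$i)             # only quoted fields can contain a pipe
  print
}
' pipe.dat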
This also generates:
name |" steve \| white "| exp | 12
school |" home \| school "| year | 2016
company |" private ltd "| joining | 2019
food |"pipe \| one"|"pipe \| two and \| three"| 2022
cars | camaro | chevy | 2033
Answer:
Using awk (sub escapes only the first pipe, and only in the first double-quoted value, $2):
$ awk 'BEGIN{FS=OFS="\""} {sub(/\|/,"\\|",$2)}1' input_file
name |" steve \| white "| exp | 12
school |" home \| school "| year | 2016
company |" private ltd "| joining | 2019
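For the follow-up about checking several columns, a minimal sketch that keeps this answer's FS=OFS="\"" split but uses gsub (every pipe, not just the first) on a caller-chosen list of fields. The function name escape_in_fields and the cols variable are hypothetical. The numbers refer to the pieces between double quotes (2 is the first quoted value, 4 the second, and so on), not to the pipe-separated columns.

```shell
# Hypothetical helper: escape every pipe inside the chosen quote-delimited fields.
# $1 is a comma-separated list of field numbers under FS='"' (2, 4, ... are the
# text between the 1st/2nd, 3rd/4th double quotes, and so on).
escape_in_fields() {
  awk -v cols="$1" '
    BEGIN { FS = OFS = "\""; n = split(cols, c, ",") }
    {
      for (k = 1; k <= n; k++)
        if (c[k] + 0 <= NF)              # skip fields this line does not have
          gsub(/\|/, "\\|", $(c[k] + 0)) # gsub, not sub: all pipes in the field
      print
    }'
}
```

For example, `escape_in_fields 2,4 < input_file` escapes pipes in the first two quoted values of each line, while `escape_in_fields 4` touches only the second one.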
Using sed (if applicable; without the g flag this escapes only the first pipe inside a double-quoted value on each line):
$ sed -E 's/("[^|]*)(\|[^"]*")/\1\\\2/' input_file
name |" steve \| white "| exp | 12
school |" home \| school "| year | 2016
company |" private ltd "| joining | 2019