I have a very large file containing mixed data: five fields in total, delimited by pipes. Here are two sample records:
"1"|"{"Address": "SomeAddress#1", "City": "Tokyo", "Country": "Japan"}"|SomeAddress#1|Tokyo|Japan
"2"|"{"Address": "SomeAddress#2", "City": "Tokyo", "Country": "Japan"}"|SomeAddress#2|Tokyo|Japan
I want to quote all the fields from 3 onwards.
The output I am looking for will be:
"1"|"{"Address": "SomeAddress#1", "City": "Tokyo", "Country": "Japan"}"|"SomeAddress#1"|"Tokyo"|"Japan"
"2"|"{"Address": "SomeAddress#2", "City": "Tokyo", "Country": "Japan"}"|"SomeAddress#2"|"Tokyo"|"Japan"
Can someone show me how to do this quickly with either sed or awk on a 2 GB data file?
Thanks
CodePudding user response:
I was waiting until you added your own attempt to the question, but since you already have 2 answers... using any awk:
awk 'BEGIN{FS=OFS="|"} {for (i=3; i<=NF; i++) $i="\"" $i "\""} 1' file
The above assumes you can't have a | inside a quoted field.
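For a quick check before running this on the full 2 GB file, here is a minimal sketch that wraps the same awk logic in a function and feeds it the two sample records. The function name quote_from_third is made up for illustration, and LC_ALL=C is an assumption on my part: it often speeds up awk on large files by forcing byte-wise processing, which should be safe here because the delimiter is plain ASCII.

```shell
#!/bin/sh
# quote_from_third: hypothetical helper name, not part of any tool.
# Reads records on stdin and wraps every field from the third onward
# in double quotes. Assumes no field contains a literal | character.
quote_from_third() {
    # LC_ALL=C is an assumption: byte-wise processing, often faster.
    LC_ALL=C awk 'BEGIN { FS = OFS = "|" }
        { for (i = 3; i <= NF; i++) $i = "\"" $i "\"" }
        1'
}

# Feed the two sample records from the question through the helper.
printf '%s\n' \
    '"1"|"{"Address": "SomeAddress#1", "City": "Tokyo", "Country": "Japan"}"|SomeAddress#1|Tokyo|Japan' \
    '"2"|"{"Address": "SomeAddress#2", "City": "Tokyo", "Country": "Japan"}"|SomeAddress#2|Tokyo|Japan' |
    quote_from_third
```

On the real file you would redirect instead, e.g. quote_from_third < file > file.new, and only replace the original once the result looks right.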
CodePudding user response:
Using GNU sed
$ sed 's/|\([^|]*\)/|"\1"/2g' input_file
"1"|"{"Address": "SomeAddress#1", "City": "Tokyo", "Country": "Japan"}"|"SomeAddress#1"|"Tokyo"|"Japan"
"2"|"{"Address": "SomeAddress#2", "City": "Tokyo", "Country": "Japan"}"|"SomeAddress#2"|"Tokyo"|"Japan"
CodePudding user response:
Instead of sed or awk, consider while read.
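#!/bin/bash
# Split each line on | into the array "line" (-r keeps backslashes literal).
while IFS='|' read -ra line ; do
    # Re-emit the five fields, wrapping fields 3 to 5 in double quotes.
    printf '%s|%s|"%s"|"%s"|"%s"\n' "${line[@]}"
done < file.txt > file.txt.new
# mv file.txt.new file.txt # uncomment to replace the original file with the modified one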
CodePudding user response:
The following awk one-liner might work for you:
awk -F '|' 'NF {printf "%s|%s|\"%s\"|\"%s\"|\"%s\"\n", $1, $2, $(NF-2), $(NF-1), $NF}' your_source_file
Split lines on pipe character: -F '|'
Skip empty lines: NF
Output in desired format: printf "%s|%s|\"%s\"|\"%s\"|\"%s\"\n", $1, $2, $(NF-2), $(NF-1), $NF
Sample source file:
"1"|"{"Address": "SomeAddress#1", "City": "Tokyo", "Country": "Japan"}"|SomeAddress#1|Tokyo|Japan
"2"|"{"Address": "SomeAddress#2", "City": "Tokyo", "Country": "Japan"}"|SomeAddress#2|Tokyo|Japan
Sample output:
$ awk -F '|' 'NF {printf "%s|%s|\"%s\"|\"%s\"|\"%s\"\n", $1, $2, $(NF-2), $(NF-1), $NF}' s.txt
"1"|"{"Address": "SomeAddress#1", "City": "Tokyo", "Country": "Japan"}"|"SomeAddress#1"|"Tokyo"|"Japan"
"2"|"{"Address": "SomeAddress#2", "City": "Tokyo", "Country": "Japan"}"|"SomeAddress#2"|"Tokyo"|"Japan"