Add quotes to delimited fields

Time:08-17

I have a very large file containing mixed data: 5 fields in total, delimited with pipes. Here are two sample records:

"1"|"{"Address": "SomeAddress#1", "City": "Tokyo", "Country": "Japan"}"|SomeAddress#1|Tokyo|Japan
"2"|"{"Address": "SomeAddress#2", "City": "Tokyo", "Country": "Japan"}"|SomeAddress#2|Tokyo|Japan

I want to quote all the fields from 3 onwards.

The output I am looking for will be:

"1"|"{"Address": "SomeAddress#1", "City": "Tokyo", "Country": "Japan"}"|"SomeAddress#1"|"Tokyo"|"Japan"
"2"|"{"Address": "SomeAddress#2", "City": "Tokyo", "Country": "Japan"}"|"SomeAddress#2"|"Tokyo"|"Japan"

Can someone show me how to do this quickly on a 2 GB data file with either sed or awk?

Thanks

CodePudding user response:

I was waiting until you provided your attempt in your question, but since you have 2 answers already... using any awk:

awk 'BEGIN{FS=OFS="|"} {for (i=3; i<=NF; i++) $i="\"" $i "\""} 1' file

The above assumes you can't have a | inside a quoted field.
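To make the loop easier to try before running it on the full file, here is a minimal sketch that wraps the same awk program in a shell function reading standard input and feeds it one record from the question. The function name quote_from_third is a hypothetical name chosen for illustration.

```shell
#!/bin/sh
# quote_from_third: hypothetical helper name; wraps the awk program shown above.
quote_from_third() {
  awk 'BEGIN { FS = OFS = "|" }
       # Wrap every field from the third onwards in double quotes.
       { for (i = 3; i <= NF; i++) $i = "\"" $i "\"" }
       # The bare 1 prints the (possibly rebuilt) record.
       1'
}

# One sample record from the question, passed on standard input.
printf '%s\n' '"1"|"{"Address": "SomeAddress#1", "City": "Tokyo", "Country": "Japan"}"|SomeAddress#1|Tokyo|Japan' | quote_from_third
```

This prints the first line of the desired output from the question. Records with fewer than three fields, including empty lines, pass through unchanged because the loop body never runs for them.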

CodePudding user response:

Using GNU sed (combining a number with g in the flags is a GNU extension)

$ sed 's/|\([^|]*\)/|"\1"/2g' input_file
"1"|"{"Address": "SomeAddress#1", "City": "Tokyo", "Country": "Japan"}"|"SomeAddress#1"|"Tokyo"|"Japan"
"2"|"{"Address": "SomeAddress#2", "City": "Tokyo", "Country": "Japan"}"|"SomeAddress#2"|"Tokyo"|"Japan"

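The 2g flags in the sed command tell GNU sed to start at the second match and replace that one and every later one; POSIX leaves the result of combining a number with g unspecified, so other sed implementations may behave differently. As a hedged alternative, the sketch below anchors at the end of the line and quotes the last three fields using only basic regular expressions. It assumes every record ends with exactly three unquoted fields, as in the question's five-field format, and the function name quote_last_three is hypothetical.

```shell
#!/bin/sh
# quote_last_three: hypothetical helper name, not part of the answer above.
# Assumes each record ends with exactly three fields that contain no | character.
quote_last_three() {
  # Match the final three pipe-delimited fields and wrap each in double quotes.
  sed 's/|\([^|]*\)|\([^|]*\)|\([^|]*\)$/|"\1"|"\2"|"\3"/'
}

# One sample record from the question, passed on standard input.
printf '%s\n' '"2"|"{"Address": "SomeAddress#2", "City": "Tokyo", "Country": "Japan"}"|SomeAddress#2|Tokyo|Japan' | quote_last_three
```

Because the pattern must end at the end of the line, it cannot match starting from the first pipe, so the first two fields are left untouched. Lines with fewer than four fields do not match and are printed as they are.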
CodePudding user response:

Instead of sed or awk, consider while read.

#!/bin/bash
while IFS='|' read -ra line ; do
  printf '%s|%s|"%s"|"%s"|"%s"\n' "${line[@]}"
done < file.txt > file.txt.new
# mv file.txt.new file.txt # uncomment to replace the original file with the modified one 
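The script above writes to a new file and leaves the final mv as a manual step. A shell read loop is typically much slower than awk or sed on a file of this size, so here is a sketch of the same write-then-rename pattern driven by awk instead. The function name safe_rewrite and the .new suffix handling are assumptions for illustration; the original is replaced only if awk exits successfully.

```shell
#!/bin/sh
# safe_rewrite: hypothetical helper; quotes fields 3..NF of the file named in $1,
# writing to "$1.new" first and renaming over the original only on success.
safe_rewrite() {
  src=$1
  tmp="$src.new"
  if awk 'BEGIN { FS = OFS = "|" }
          { for (i = 3; i <= NF; i++) $i = "\"" $i "\"" }
          1' "$src" > "$tmp"
  then
    mv "$tmp" "$src"
  else
    # Leave the original untouched if awk failed (for example, disk full).
    rm -f "$tmp"
    return 1
  fi
}
```

Usage would be safe_rewrite file.txt. The rewrite needs enough free disk space for a second copy of the file while both exist.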

CodePudding user response:

The following awk one-liner might work for you:

awk -F '|' 'NF {printf "%s|%s|\"%s\"|\"%s\"|\"%s\"\n", $1, $2, $(NF-2), $(NF-1), $NF}' your_source_file

Split lines on the pipe character: -F '|'
Skip empty lines: NF
Output in the desired format: printf "%s|%s|\"%s\"|\"%s\"|\"%s\"\n", $1, $2, $(NF-2), $(NF-1), $NF

Sample source file:

"1"|"{"Address": "SomeAddress#1", "City": "Tokyo", "Country": "Japan"}"|SomeAddress#1|Tokyo|Japan
"2"|"{"Address": "SomeAddress#2", "City": "Tokyo", "Country": "Japan"}"|SomeAddress#2|Tokyo|Japan

Sample output:

$ awk -F '|' 'NF {printf "%s|%s|\"%s\"|\"%s\"|\"%s\"\n", $1, $2, $(NF-2), $(NF-1), $NF}' s.txt
"1"|"{"Address": "SomeAddress#1", "City": "Tokyo", "Country": "Japan"}"|"SomeAddress#1"|"Tokyo"|"Japan"
"2"|"{"Address": "SomeAddress#2", "City": "Tokyo", "Country": "Japan"}"|"SomeAddress#2"|"Tokyo"|"Japan"