How to replace double quotes inside the data of a double-quote-enclosed CSV file with Unix

Time:02-01

I have a CSV file whose fields are enclosed in double quotes. Using Unix tools, how can I REPLACE the double quotes that occur inside the data of the 3rd field (an HTML string, which also has commas inside the data) with the ~ symbol? No other double quotes should be removed.

Input File:

F1,F2,F3
"11111","ABABDBDA","<div style="text-aaa: justify;"> Il MMM delinea l&rsquo;evvv dei ccccc e dei ruorrrrli, degli organi sss, alaaaala "
"22222","PPPPPPPP","<p style="text-align: justify;"> <span style="color:#ff0000;"><strong>Disponibile dal 25/03</strong></span></p> <div style="text-align: justify;"> Il manuale delinea l&rsquo;evoluzione dei , ;"> </div>"
"333333","QQQQQQQ","<p style="text-align: justify;"> Il libro analizza i singoli cicli gestionali, partendo dalle rilevazioni, contabili per giungere poi alla destinazione di "

Expected output file:

F1,F2,F3
"11111","ABABDBDA","<div style=~text-aaa: justify;> Il MMM delinea l&rsquo;evvv dei ccccc e dei ruorrrrli, degli organi sss, alaaaala" 
"22222","PPPPPPPP","<p style=~text-align: justify;~> <span style=~color:#ff0000;~><strong>Disponibile dal 25/03</strong></span></p> <div style=~text-align: justify;~> Il manual"
"333333","QQQQQQQ","<p style=~text-align: justify;~> Il libro analizza i singoli cicli gestionali, partendo dalle rilevazioni, contabili per giungere poi alla destinazione"

I tried with the awk command, but it's not giving the expected output.

CodePudding user response:

If you don't find a solution with awk, here is a very simple Go program that uses the standard library's CSV parser, and its LazyQuotes option (emphasis added for your case):

If LazyQuotes is true, a quote may appear in an unquoted field and a non-doubled quote may appear in a quoted field.

From your original statement and the sample input, it looks to me like the only issue is that the quotes around the HTML attributes were not properly escaped when the CSV was created. I think the LazyQuotes option can handle this correctly, and that seems to me more desirable than changing the HTML:

F1,F2,F3
11111,ABABDBDA,"<div style=""text-aaa: justify;""> Il MMM delinea l&rsquo;evvv dei ccccc e dei ruorrrrli, degli organi sss, alaaaala "
22222,PPPPPPPP,"<p style=""text-align: justify;""> <span style=""color:#ff0000;""><strong>Disponibile dal 25/03</strong></span></p> <div style=""text-align: justify;""> Il manuale delinea l&rsquo;evoluzione dei , ;""> </div>"
333333,QQQQQQQ,"<p style=""text-align: justify;""> Il libro analizza i singoli cicli gestionali, partendo dalle rilevazioni, contabili per giungere poi alla destinazione di "

and in table-view:

F1 F2 F3
11111 ABABDBDA <div style="text-aaa: justify;"> Il MMM delinea l’evvv dei ccccc e dei ruorrrrli, degli org...
22222 PPPPPPPP <p style="text-align: justify;"> <span style="color:#ff0000;"><strong>Disponibile dal 25/03</stro...
333333 QQQQQQQ <p style="text-align: justify;"> Il libro analizza i singoli cicli gestionali, partendo dalle ril...

Here's a smaller sample to try and highlight:

  1. creating the CSV reader
  2. setting the LazyQuotes option
  3. reading all records
  4. creating the CSV writer
  5. writing all records

No special processing is needed; the CSV parser just does the right thing for you:

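package main

import (
	"encoding/csv"
	"log"
	"os"
	"strings"
)

func main() {
	csvBlob := `F1,F2,F3
"11111","ABABDBDA","<div style="text-aaa: justify;"> Il MMM delinea l&rsquo;evvv dei ccccc e dei ruorrrrli, degli organi sss, alaaaala "
"22222","PPPPPPPP","<p style="text-align: justify;"> <span style="color:#ff0000;"><strong>Disponibile dal 25/03</strong></span></p> <div style="text-align: justify;"> Il manuale delinea l&rsquo;evoluzione dei , ;"> </div>"
"333333","QQQQQQQ","<p style="text-align: justify;"> Il libro analizza i singoli cicli gestionali, partendo dalle rilevazioni, contabili per giungere poi alla destinazione di "
`

	fIn := strings.NewReader(csvBlob)
	r := csv.NewReader(fIn) // 1. create the CSV reader
	r.LazyQuotes = true     // 2. accept the non-doubled quotes in the HTML field

	records, err := r.ReadAll() // 3. read all records
	if err != nil {
		log.Fatal("could not read CSV: ", err)
	}

	w := csv.NewWriter(os.Stdout)               // 4. create the CSV writer
	if err := w.WriteAll(records); err != nil { // 5. write all records (WriteAll also flushes)
		log.Fatal("could not write CSV: ", err)
	}
}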

You can view that in the Go Playground and run it.
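
To see what the LazyQuotes option actually changes, here is a minimal sketch that parses the same kind of malformed row twice, once with the default strict setting and once with LazyQuotes. The helper name parse and the shortened sample row are made up for illustration. As I understand the parser, the strict run should fail with an error on the first bare quote, while the lazy run returns three fields with the inner quotes kept as data:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parse is a hypothetical helper: it reads all records from s,
// with the reader's LazyQuotes option switched on or off.
func parse(s string, lazy bool) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(s))
	r.LazyQuotes = lazy
	return r.ReadAll()
}

func main() {
	// Shortened stand-in for one row of the question's input.
	row := `"11111","ABABDBDA","<div style="text-aaa: justify;"> a, b "` + "\n"

	// Default (strict) parsing: the bare quote after style= is an error.
	if _, err := parse(row, false); err != nil {
		fmt.Println("strict:", err)
	}

	// Lazy parsing: the bare quotes are treated as part of the field.
	recs, err := parse(row, true)
	if err != nil {
		fmt.Println("lazy:", err)
		return
	}
	fmt.Printf("lazy: %d fields, F3=%q\n", len(recs[0]), recs[0][2])
}
```

Note that the lazy parser still ends a quoted field when a quote is directly followed by a comma or a line end, so data containing that sequence would be split in the wrong place.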

Here's a more complete example to read from a file and write to a new file, with error checking:

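package main

import (
	"encoding/csv"
	"log"
	"os"
)

func main() {
	fIn, err := os.Open("input.csv")
	if err != nil {
		log.Fatal("could not open input.csv: ", err)
	}
	defer fIn.Close() // defer only after the error check

	r := csv.NewReader(fIn)
	r.LazyQuotes = true

	records, err := r.ReadAll()
	if err != nil {
		log.Fatal("could not read CSV: ", err)
	}

	fOut, err := os.Create("output.csv")
	if err != nil {
		log.Fatal("could not create output.csv: ", err)
	}
	defer fOut.Close()

	// WriteAll flushes the writer before returning.
	w := csv.NewWriter(fOut)
	if err := w.WriteAll(records); err != nil {
		log.Fatal("could not write CSV: ", err)
	}
}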

CodePudding user response:

Using Miller 6, and running

mlr --csv --lazy-quotes put '$F3=gsub($F3,"\"","")' input.csv >output.csv

you get

F1,F2,F3
11111,ABABDBDA,"<div style=text-aaa: justify;> Il MMM delinea l&rsquo;evvv dei ccccc e dei ruorrrrli, degli organi sss, alaaaala "
22222,PPPPPPPP,"<p style=text-align: justify;> <span style=color:#ff0000;><strong>Disponibile dal 25/03</strong></span></p> <div style=text-align: justify;> Il manuale delinea l&rsquo;evoluzione dei , ;> </div>"
333333,QQQQQQQ,"<p style=text-align: justify;> Il libro analizza i singoli cicli gestionali, partendo dalle rilevazioni, contabili per giungere poi alla destinazione di "

Some notes:

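  • --lazy-quotes accepts quotes appearing in unquoted fields, and non-doubled quotes appearing in quoted fields.
  • put '$F3=gsub($F3,"\"","")' searches F3 for " and replaces it with nothing. To replace it with ~ instead, as the question asks, pass "~" as the third argument: put '$F3=gsub($F3,"\"","~")'.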