Editing a poorly formatted CSV file

Time:07-19

I have this poorly formatted CSV file that was converted from a PDF. After some editing I got to this:

I-10, New, "BRACELET JEWELRY 11/28/03, 14KT, LDS SS, STYLE: BANGLE", 1.00, 125.00, 1000.00,
I-11, Old, "BRACELET JEWELRY 11/28/03, 14KT; AMT / PER, LDS SS, STYLE:", 1.00, 158.00, 1264.00,
I-12, New, "BRACELET JEWELRY 11/28/03, 14KT; CITRINE, LDS SS, STYLE:", 1.00, 124.00, 992.00,
I-13, New, "BRACELET JEWELRY 11/28/03, 14KT; PERIDOT, LDS SS, STYLE:", 1.00, 173.00, 1384.00,
I-14, New, "BRACELET JEWELRY 11/28/03, 14KT; CITRINE, LDS SS, STYLE:", 1.00, 155.00, 1240.00,
I-15, New, "BRACELET JEWELRY 11/28/03, 14KT; GARNET/CITRINE, LDS SS,", 1.00, 168.00, 1344.00,
I-16, New, "BRACELET JEWELRY 11/28/03, 14KT; AMETH", 1.00, 142.00, 1136.00,
I-19, New, "WEDDING BAND JEWELRY 04/12/94, 14KT; 7 Prin channel .75ctw,", 1.00, 563.00, 2252.00,
I-22, New, "WEDDING BAND JEWELRY 08/14/88, 14KT; 3 ROLLS DIA .88CTW,", 1.00, 528.00, 2112.00,
I-23, New, "SEMI-MOUNT JEWELRY 12/07/92, 14KT; 26 RBC .75CTW, LDS YG", 1.00, 437.50, 1750.00,
I-24, New, "WEDDING BAND JEWELRY 10/14/98, 14KT; 21 PRIN 1.00CTW,", 1.00, 490.00, 2799.00,
I-25, New, "BRIDAL SET JEWELRY 09/09/99, 14KT; 1 OVAL .40CT 10RBC", 1.00, 500.00, 3100.00,

The commas within the quotes cause extra columns to be created. So I am trying to write a script that picks out the part of each line that is within quotes, removes the commas from it, and saves the result to a new file.
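One likely reason the quoted commas still split the line is the space after each delimiter. Python's standard-library `csv` module only treats `"` as opening a quoted field when it is the first character of the field, so `, "BRACELET ...` is read as an unquoted field that happens to contain a quote character. The sketch below (an illustration using the first data row as a hard-coded sample, not the asker's file) compares the default reader with `skipinitialspace=True`:

```python
import csv

# First data row from the question, as it appears in the file:
# a space after every comma and a trailing comma at the end.
row_text = 'I-10, New, "BRACELET JEWELRY 11/28/03, 14KT, LDS SS, STYLE: BANGLE", 1.00, 125.00, 1000.00,'

# Default dialect: the field begins with a space, so the quote is treated as an
# ordinary character and the commas inside it split the description apart.
plain = next(csv.reader([row_text]))

# skipinitialspace=True skips the space after each delimiter, so the quote
# opens a quoted field and its commas stay inside a single value.
fixed = next(csv.reader([row_text], skipinitialspace=True))

print(len(plain))  # 10 fields
print(len(fixed))  # 7 fields; the last is empty because of the trailing comma
print(fixed[2])    # BRACELET JEWELRY 11/28/03, 14KT, LDS SS, STYLE: BANGLE
```

pandas `read_csv` also accepts a `skipinitialspace` parameter, which may be worth trying before editing the file at all.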

This is what I have come up with so far, but I haven't used Python before.

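import re

# A double-quoted field; the data has no escaped quotes, so "[^"]*" is enough
quoted = re.compile(r'"[^"]*"')

# re.findall/re.sub work on strings, not DataFrames, so read the raw text
with open("./INVLISTcopy.csv") as src:
    text = src.read()

# Remove the commas inside each quoted field; commas outside quotes are kept
fixed = quoted.sub(lambda m: m.group(0).replace(",", ""), text)

with open("INVLISTcopy1.csv", "w") as dst:
    dst.write(fixed)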
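If the commas in the descriptions are worth keeping, a possible alternative is to let the `csv` module parse the file with `skipinitialspace=True` and write it back out with `csv.writer`, which quotes any field containing a comma. This is a minimal sketch: the helper names `clean_rows` and `write_rows` are hypothetical, and it assumes every row ends with the stray trailing comma shown above. It runs on an in-memory sample so it can be tried without the real file.

```python
import csv
import io


def clean_rows(lines):
    """Parse the exported lines, keeping commas inside quoted descriptions."""
    rows = []
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        if row[-1] == "":
            row = row[:-1]  # drop the empty field created by the trailing comma
        rows.append(row)
    return rows


def write_rows(rows, out):
    # The default QUOTE_MINIMAL quoting wraps fields that contain commas in
    # quotes, and no space is written after the delimiter.
    csv.writer(out).writerows(rows)


sample = [
    'I-10, New, "BRACELET JEWELRY 11/28/03, 14KT, LDS SS, STYLE: BANGLE", 1.00, 125.00, 1000.00,\n',
    'I-16, New, "BRACELET JEWELRY 11/28/03, 14KT; AMETH", 1.00, 142.00, 1136.00,\n',
]
rows = clean_rows(sample)
buf = io.StringIO()
write_rows(rows, buf)
print(buf.getvalue())

# For the real files (open both with newline="" as the csv docs recommend):
# with open("./INVLISTcopy.csv", newline="") as src, \
#         open("INVLISTcopy1.csv", "w", newline="") as dst:
#     write_rows(clean_rows(src), dst)
```

The output file then has six columns per row and can be read back with `pd.read_csv` without `on_bad_lines='skip'` dropping any rows.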

Answer:

You can use CSV Output
