Home > front end >  Replacing Double Quotes of Specific Column Entries in CSV File?
Replacing Double Quotes of Specific Column Entries in CSV File?

Time:07-09

I have a huge CSV file with sample data that looks like so:

"Name";"Current balance";"Account";"Transfers";"Description";"Payee";"Category";"Date";"Memo";"Amount";"Currency";"Check #";"Tags"
"Capital One Quicksilver";"-119.99";"USD";"";"";"";"";";"";"";";"";""
"";"";"Capital One Quicksilver";"";"DMV";""Carfax";"";"08/19/2004";"";"-24.99";"USD";"";""
"";"";"Capital One Quicksilver";"";"DMV";""Carfax";"";"08/19/2004";"";"-24.99";"USD";"";""
"";"";"Capital One Quicksilver";"";"Gas";""USA Petroleum";"";"09/13/2004";"";"-20.43";"USD";"";""

The original CVS file had some unnecessary characters that I removed to obtain the data as shown above using the following code:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import scipy as sp

text = open("report.csv", "r")
text = ''.join([i for i in text]) \
      .replace('old', 'new')
x = open("report_mod.csv","w")
x.writelines(text)

x.close()

Where I'm stuck now is, how do I replace the double quotes ("") with single quotes (") for all the entries of the field column Payee?

In the above example, the 3 entries for the Payee is ""Carfax", ""Carfax", and ""USA Petroleum". I would like to replace the double quotes at the beginning with single quotes, i.e. "Carfax", "Carfax", and "USA Petroleum"

The new CSV file should look like so:

"Name";"Current balance";"Account";"Transfers";"Description";"Payee";"Category";"Date";"Memo";"Amount";"Currency";"Check #";"Tags"
"Capital One Quicksilver";"-119.99";"USD";"";"";"";"";";"";"";";"";""
"";"";"Capital One Quicksilver";"";"DMV";"Carfax";"";"08/19/2004";"";"-24.99";"USD";"";""
"";"";"Capital One Quicksilver";"";"DMV";"Carfax";"";"08/19/2004";"";"-24.99";"USD";"";""
"";"";"Capital One Quicksilver";"";"Gas";"USA Petroleum";"";"09/13/2004";"";"-20.43";"USD";"";""

Sample data file: report.csv

CodePudding user response:

You can use regex maybe

import re
text = re.sub('^""$', '"' ,text)

so full code must be like this:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import scipy as sp
import re

# reading CSV file
data = pd.read_csv("report.csv",delimiter=';')

for val in data['Payee']:
    val = str (val)
    newVal = re.sub(r'"', '' ,val)
    newVal = '"' newVal '"'
    print(newVal)

the output is this on my terminal:

"nan"
"Carfax"
"Carfax"
"USA Petroleum"

Edit: Add full code to create the file

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import scipy as sp
import re

# reading CSV file
data = pd.read_csv("report.csv",delimiter=';')
names           = data['Name'].tolist();
balances        = data['Current balance'].tolist();
accounts        = data['Account'].tolist();
transfers       = data['Transfers'].tolist();
descriptions    = data['Description'].tolist();
categories      = data['Category'].tolist();
dates           = data['Date'].tolist();
memos           = data['Memo'].tolist();
amount          = data['Amount'].tolist();
currency        = data['Currency'].tolist();
check           = data['Check #'].tolist();
tags            = data['Tags'].tolist();

counter = 0
f = open("report_modified.csv", "w ")
f.write('"Name";"Current balance";"Account";"Transfers";"Description";"Payee";"Category";"Date";"Memo";"Amount";"Currency";"Check #";"Tags"\n');
for val in data['Payee']:
    val = str (val)
    newVal = re.sub(r'"', '' ,val)
    newVal = '"' newVal '"'
    print(newVal)
    f.write(str(names[counter]) ';')
    f.write(str(balances[counter]) ';')
    f.write(str(accounts[counter]) ';')
    f.write(str(transfers[counter]) ';')
    f.write(str(descriptions[counter]) ';')
    f.write(str(newVal) ';')
    f.write(str(categories[counter]) ';')
    f.write(str(dates[counter]) ';')
    f.write(str(memos[counter]) ';')
    f.write(str(amount[counter]) ';')
    f.write(str(currency[counter]) ';')
    f.write(str(check[counter]) ';')
    f.write(str(tags[counter]) '\n')
f.close()
  • Related