I have a huge CSV file with sample data that looks like so:
"Name";"Current balance";"Account";"Transfers";"Description";"Payee";"Category";"Date";"Memo";"Amount";"Currency";"Check #";"Tags"
"Capital One Quicksilver";"-119.99";"USD";"";"";"";"";";"";"";";"";""
"";"";"Capital One Quicksilver";"";"DMV";""Carfax";"";"08/19/2004";"";"-24.99";"USD";"";""
"";"";"Capital One Quicksilver";"";"DMV";""Carfax";"";"08/19/2004";"";"-24.99";"USD";"";""
"";"";"Capital One Quicksilver";"";"Gas";""USA Petroleum";"";"09/13/2004";"";"-20.43";"USD";"";""
The original CVS file had some unnecessary characters that I removed to obtain the data as shown above using the following code:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import scipy as sp
text = open("report.csv", "r")
text = ''.join([i for i in text]) \
.replace('old', 'new')
x = open("report_mod.csv","w")
x.writelines(text)
x.close()
Where I'm stuck now is, how do I replace the double quotes ("") with single quotes (") for all the entries of the field column Payee
?
In the above example, the 3 entries for the Payee
is ""Carfax"
, ""Carfax"
, and ""USA Petroleum"
. I would like to replace the double quotes at the beginning with single quotes, i.e. "Carfax"
, "Carfax"
, and "USA Petroleum"
The new CSV file should look like so:
"Name";"Current balance";"Account";"Transfers";"Description";"Payee";"Category";"Date";"Memo";"Amount";"Currency";"Check #";"Tags"
"Capital One Quicksilver";"-119.99";"USD";"";"";"";"";";"";"";";"";""
"";"";"Capital One Quicksilver";"";"DMV";"Carfax";"";"08/19/2004";"";"-24.99";"USD";"";""
"";"";"Capital One Quicksilver";"";"DMV";"Carfax";"";"08/19/2004";"";"-24.99";"USD";"";""
"";"";"Capital One Quicksilver";"";"Gas";"USA Petroleum";"";"09/13/2004";"";"-20.43";"USD";"";""
Sample data file: report.csv
CodePudding user response:
You can use regex maybe
import re
text = re.sub('^""$', '"' ,text)
so full code must be like this:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import scipy as sp
import re
# reading CSV file
data = pd.read_csv("report.csv",delimiter=';')
for val in data['Payee']:
val = str (val)
newVal = re.sub(r'"', '' ,val)
newVal = '"' newVal '"'
print(newVal)
the output is this on my terminal:
"nan"
"Carfax"
"Carfax"
"USA Petroleum"
Edit: Add full code to create the file
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import scipy as sp
import re
# reading CSV file
data = pd.read_csv("report.csv",delimiter=';')
names = data['Name'].tolist();
balances = data['Current balance'].tolist();
accounts = data['Account'].tolist();
transfers = data['Transfers'].tolist();
descriptions = data['Description'].tolist();
categories = data['Category'].tolist();
dates = data['Date'].tolist();
memos = data['Memo'].tolist();
amount = data['Amount'].tolist();
currency = data['Currency'].tolist();
check = data['Check #'].tolist();
tags = data['Tags'].tolist();
counter = 0
f = open("report_modified.csv", "w ")
f.write('"Name";"Current balance";"Account";"Transfers";"Description";"Payee";"Category";"Date";"Memo";"Amount";"Currency";"Check #";"Tags"\n');
for val in data['Payee']:
val = str (val)
newVal = re.sub(r'"', '' ,val)
newVal = '"' newVal '"'
print(newVal)
f.write(str(names[counter]) ';')
f.write(str(balances[counter]) ';')
f.write(str(accounts[counter]) ';')
f.write(str(transfers[counter]) ';')
f.write(str(descriptions[counter]) ';')
f.write(str(newVal) ';')
f.write(str(categories[counter]) ';')
f.write(str(dates[counter]) ';')
f.write(str(memos[counter]) ';')
f.write(str(amount[counter]) ';')
f.write(str(currency[counter]) ';')
f.write(str(check[counter]) ';')
f.write(str(tags[counter]) '\n')
f.close()