I have the following test2.csv
file:
id,name,city,country
1,David, Johnson,London,UK,
2,Mary, Gueta,NY,USA,
3,Matthias, Nile,Munich,Germany,
I want to read it into a pandas dataframe. Using this code
df = pd.read_csv('test2.csv')
I get the following df:
But I want to store the first name and last name together in the column name
. id
column should store the numbers. city
and country
should also store the appropriate values.
Thank you in advance!
CodePudding user response:
import csv
file = open(`temp.csv')
csvreader = csv.reader(file)
header = []
header = next(csvreader)
new_rows = []
for row in csvreader:
new_rows.append([row[0], row[1] ',' row[2], row[3], row[4]])
file.close()
df = pd.DataFrame(new_rows, columns =header )
df
CodePudding user response:
There is a useful pattern to distinguish between values, space appears after comma in names, so you can substitute delimiter "," with "|" that is more efficient to avoid this type of confusion
import csv
import pandas as pd
import re
#substitute comma not followed by space for vertical bar in the original file and save it as a new file
with open('test2.csv', 'r') as infile, open('test2_vb.csv', 'w') as outfile:
for line in infile:
line = re.sub(',(?! )', '|', line)
outfile.write(line)
file = open('test2_vb.csv', 'r')
reader = csv.reader(file , delimiter='|')
#load the data into a dataframe
df = pd.DataFrame(reader)
print(df)
this way using a regular expression the file can be corrected (though at the end of a line should not be separator)
CodePudding user response:
You can change 1,David, Johnson,London,UK
to 1,"David, Johnson",London,UK
then load it using pd.DataFrame