Assume the following data. I want to read this CSV file into a DataFrame, using a space as the field separator.
Name Age City
jack 34 Sydeny
Riti 31 Delhi
Aadi 16 New York
Suse 32 Lucknow
Mark 33 Las vegas
Suri 35 Patna
I tried this, but it doesn't work because some city names contain a space (New York, Las vegas):
df = pd.read_csv('users_5.csv', sep=r'\s+', engine='python')
Expected output:
Contents of Dataframe :
Name Age City
0 jack 34 Sydeny
1 Riti 31 Delhi
2 Aadi 16 New York
3 Suse 32 Lucknow
4 Mark 33 Las vegas
5 Suri 35 Patna
CodePudding user response:
In your case, I'm afraid you will have to regenerate a proper CSV, either by quoting all text fields and keeping the space as the separator, or by changing the separator character.
e.g.
"Name" "Age" "City"
"jack" 34 "Sydeny"
"Riti" 31 "Delhi"
"Aadi" 16 "New York"
"Suse" 32 "Lucknow"
"Mark" 33 "Las vegas"
"Suri" 35 "Patna"
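If the file can be regenerated in the quoted form above, a plain read_csv call with a single-space separator should be enough. The following is a minimal sketch, assuming the quoted data is held in an in-memory string (quoted_csv and df_quoted are illustrative names, not part of the original question); with a real file you would pass its path instead of the StringIO object.

```python
import io
import pandas as pd

# Illustrative in-memory copy of the quoted file shown above.
quoted_csv = '''"Name" "Age" "City"
"jack" 34 "Sydeny"
"Riti" 31 "Delhi"
"Aadi" 16 "New York"
"Suse" 32 "Lucknow"
"Mark" 33 "Las vegas"
"Suri" 35 "Patna"
'''

# A single space separates fields; spaces inside double quotes are kept
# as part of the field because '"' is the quote character.
df_quoted = pd.read_csv(io.StringIO(quoted_csv), sep=' ', quotechar='"')
print(df_quoted)
```

Because the separator is a single literal space rather than a regular expression, the default parser handles the quotes, so engine='python' is not needed here.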
The last option, if you can't rebuild the CSV, is to process each line individually to build your dataframe. For example, if you know for sure that the first column is a single word and the second is a number, you can take everything else as the last column.
I would do something like this:
import re
import pandas as pd

data = {
    'name': [],
    'age': [],
    'city': []
}
with open('yourfile.csv', 'r') as f:
    next(f)  # skip the header
    for line in f:
        # first token is the name, second the age, the rest is the city
        name, age, *city = re.split(r'\s+', line.strip())
        data['name'].append(name)
        data['age'].append(int(age))  # store the age as a number
        data['city'].append(' '.join(city))
df = pd.DataFrame.from_dict(data)
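
As a variation on the line-by-line approach, the same idea can be sketched with str.split and its maxsplit argument, which avoids the regular expression and the join. This is only a sketch under the same assumption (single-word name, numeric age, everything else is the city); read_space_table is a hypothetical helper name, and the sample lines are a shortened copy of the data from the question.

```python
import io
import pandas as pd

def read_space_table(lines):
    """Parse space-separated lines where only the last column may contain spaces."""
    rows = iter(lines)
    header = next(rows).split()  # column names come from the first line
    records = []
    for line in rows:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on whitespace at most twice: name, age, remainder as city.
        name, age, city = line.split(maxsplit=2)
        records.append((name, int(age), city))
    return pd.DataFrame(records, columns=header)

# Shortened sample held in memory; an open file object works the same way.
sample = io.StringIO(
    "Name Age City\n"
    "jack 34 Sydeny\n"
    "Aadi 16 New York\n"
    "Mark 33 Las vegas\n"
)
df_sample = read_space_table(sample)
print(df_sample)
```

With a real file, the call would look like read_space_table(open('yourfile.csv')), ideally inside a with block. A line with fewer than three fields raises a ValueError, which makes malformed rows visible instead of silently misaligning columns.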