Reading a text file with pandas when some fields contain spaces

Assume the following data. I want to read this CSV file into a dataframe, using whitespace as the field separator.

Name Age City
jack 34   Sydeny
Riti 31  Delhi
Aadi 16 New York
Suse 32   Lucknow
Mark  33 Las vegas
Suri  35 Patna

I tried this, but it doesn't work because some city names contain a space (New York, Las vegas):

df = pd.read_csv('users_5.csv', sep=r'\s+', engine='python')

Output expected:

Contents of Dataframe : 
   Name  Age       City
0  jack   34     Sydeny
1  Riti   31      Delhi
2  Aadi   16   New York
3  Suse   32    Lucknow
4  Mark   33  Las vegas
5  Suri   35      Patna

CodePudding user response：

In your case, I'm afraid you will have to regenerate a proper CSV, either by quoting all text fields and keeping the space as the separator, or by changing the separator character.

For example, with quoted text fields:

"Name" "Age" "City"
"jack" 34   "Sydeny"
"Riti" 31  "Delhi"
"Aadi" 16 "New York"
"Suse" 32   "Lucknow"
"Mark"  33 "Las vegas"
"Suri"  35 "Patna"

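If the file can be regenerated with quoted text fields as above, reading it could look like the sketch below. It is only an illustration: the in-memory string stands in for the real file, and the variable names are made up for this example. It assumes fields are separated by one or more plain spaces, with no trailing spaces at the end of a line.

```python
import io
import pandas as pd

# Hypothetical in-memory copy of the quoted file shown above
quoted_text = '''"Name" "Age" "City"
"jack" 34   "Sydeny"
"Riti" 31  "Delhi"
"Aadi" 16 "New York"
"Suse" 32   "Lucknow"
"Mark"  33 "Las vegas"
"Suri"  35 "Patna"
'''

# sep=' ' splits on a single space; skipinitialspace=True skips the extra
# spaces that follow each separator, and quoted fields keep their inner spaces
df_quoted = pd.read_csv(io.StringIO(quoted_text), sep=' ', skipinitialspace=True)
print(df_quoted)
```

With a real file, the path would be passed in place of the io.StringIO object.
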
The last option, if you can't rebuild the CSV, is to process each line individually to build your dataframe. For example, if you know for sure that the first column is a single word and the second is a number, you can take everything else as the last column.

I would do something like this:

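import re
import pandas as pd

# One list per column, named to match the file's header
data = {
    'Name': [],
    'Age': [],
    'City': []
}

with open('yourfile.csv', 'r') as f:
    next(f)  # skip the header
    for line in f:
        if not line.strip():
            continue  # ignore blank lines
        # First token is the name, second the age; all remaining tokens form the city
        name, age, *city = re.split(r'\s+', line.strip())
        data['Name'].append(name)
        data['Age'].append(int(age))
        data['City'].append(' '.join(city))

df = pd.DataFrame.from_dict(data)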
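
As a variation on the same idea, here is a small sketch that uses str.split with maxsplit=2 instead of a regex, so everything after the second field stays together as the city. The helper name parse_lines and the in-memory sample are assumptions made for this example, not part of the original question; it still relies on the first two columns never containing spaces.

```python
import io
import pandas as pd

# Hypothetical in-memory copy of the file from the question
sample = '''Name Age City
jack 34   Sydeny
Riti 31  Delhi
Aadi 16 New York
Suse 32   Lucknow
Mark  33 Las vegas
Suri  35 Patna
'''

def parse_lines(handle):
    """Build a dataframe from lines of the form: name, age, then free-text city."""
    header = next(handle).split()  # column names from the first line
    rows = []
    for line in handle:
        if not line.strip():
            continue  # ignore blank lines
        # Split on runs of whitespace at most twice; the remainder is the city
        name, age, city = line.split(maxsplit=2)
        rows.append((name, int(age), city.strip()))
    return pd.DataFrame(rows, columns=header)

df_sample = parse_lines(io.StringIO(sample))
print(df_sample)
```

For a file on disk, the same function can be called with the object returned by open('yourfile.csv').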