Home > Software design >  Splitting Location Pandas column into city and state and inserting Not specified if it is not
Splitting Location Pandas column into city and state and inserting Not specified if it is not

Time:09-21

I am working on this subset of data in the column name df['Location']:

Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Egypt
Cairo, Egypt
Cairo, Egypt
Egypt
Egypt
Egypt
Cairo, Cairo, Egypt
Benha, Al Qalyubiyah, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Alexandria, Alexandria, Egypt
New Cairo, Cairo, Egypt
Alexandria, Alexandria, Egypt
Cairo, Egypt
Cairo, Cairo, Egypt
New Cairo, Cairo, Egypt
New Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Alexandria, Alexandria, Egypt
New Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Giza, Al Jizah, Egypt
6th of October, Cairo, Egypt
Giza, Al Jizah, Egypt
Cairo, Egypt
Cairo, Cairo, Egypt
Qesm El Nozha, Cairo, Egypt
Giza, Al Jizah, Egypt
Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Egypt
New Cairo, Cairo, Egypt
6th of October, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Alexandria, Alexandria, Egypt
Cairo, Egypt
Cairo, Cairo, Egypt
New Cairo, Cairo, Egypt
Cairo, Egypt
Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Markaz Banha, Al Qalyubiyah, Egypt
Qesm El Maadi, Cairo, Egypt
Egypt
Giza, Al Jizah, Egypt
Cairo, Cairo, Egypt
Cairo, Egypt
Cairo, Egypt
Qesm El Maadi, Cairo, Egypt
10th of Ramadan, Sharkia, Egypt
Giza, Al Jizah, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Egypt
Cairo, Egypt
Cairo, Egypt
Egypt
Egypt
Egypt
Cairo, Cairo, Egypt
Benha, Al Qalyubiyah, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Alexandria, Alexandria, Egypt
New Cairo, Cairo, Egypt
Alexandria, Alexandria, Egypt
Cairo, Egypt
Cairo, Cairo, Egypt
New Cairo, Cairo, Egypt
New Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt

I would like to split this data into City and Governate columns, where City is the far left entry and I want to say that if it has less than 3 entries or that the City entry is = 'Cairo' (which is a governate not a city, I would like to say that unspecified city in 'Governate Name' which is the second entry! also, if only had 1 entry I would like to say that the governate is not specified

P.S the output is formatted as (city,governate,country)

CodePudding user response:

Apply the function that splits the strings:

def split_fun(x): 
    lst = x.split(', ') 
    gov = '' 
    city = '' 
    country = lst[-1] 
    if len(lst) > 1: gov = lst[-2] 
    if len(lst) == 3: city = lst[0] 
    return pd.Series([city, gov, country]) 

df[['City', 'Gov', 'Country']] = df['Location'].apply(split_fun)

                   Location       City    Gov Country
0       Cairo, Cairo, Egypt      Cairo  Cairo   Egypt
1       Cairo, Cairo, Egypt      Cairo  Cairo   Egypt
2              Cairo, Egypt             Cairo   Egypt
3       Cairo, Cairo, Egypt      Cairo  Cairo   Egypt
4              Cairo, Egypt             Cairo   Egypt
..                      ...        ...    ...     ...
94      Cairo, Cairo, Egypt      Cairo  Cairo   Egypt
95  New Cairo, Cairo, Egypt  New Cairo  Cairo   Egypt
96  New Cairo, Cairo, Egypt  New Cairo  Cairo   Egypt
97      Cairo, Cairo, Egypt      Cairo  Cairo   Egypt
98      Cairo, Cairo, Egypt      Cairo  Cairo   Egypt
  • Related