I am working on this subset of data in the column name df['Location']:
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Egypt
Cairo, Egypt
Cairo, Egypt
Egypt
Egypt
Egypt
Cairo, Cairo, Egypt
Benha, Al Qalyubiyah, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Alexandria, Alexandria, Egypt
New Cairo, Cairo, Egypt
Alexandria, Alexandria, Egypt
Cairo, Egypt
Cairo, Cairo, Egypt
New Cairo, Cairo, Egypt
New Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Alexandria, Alexandria, Egypt
New Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Giza, Al Jizah, Egypt
6th of October, Cairo, Egypt
Giza, Al Jizah, Egypt
Cairo, Egypt
Cairo, Cairo, Egypt
Qesm El Nozha, Cairo, Egypt
Giza, Al Jizah, Egypt
Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Egypt
New Cairo, Cairo, Egypt
6th of October, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Alexandria, Alexandria, Egypt
Cairo, Egypt
Cairo, Cairo, Egypt
New Cairo, Cairo, Egypt
Cairo, Egypt
Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Markaz Banha, Al Qalyubiyah, Egypt
Qesm El Maadi, Cairo, Egypt
Egypt
Giza, Al Jizah, Egypt
Cairo, Cairo, Egypt
Cairo, Egypt
Cairo, Egypt
Qesm El Maadi, Cairo, Egypt
10th of Ramadan, Sharkia, Egypt
Giza, Al Jizah, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Egypt
Cairo, Egypt
Cairo, Egypt
Egypt
Egypt
Egypt
Cairo, Cairo, Egypt
Benha, Al Qalyubiyah, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Alexandria, Alexandria, Egypt
New Cairo, Cairo, Egypt
Alexandria, Alexandria, Egypt
Cairo, Egypt
Cairo, Cairo, Egypt
New Cairo, Cairo, Egypt
New Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
Cairo, Cairo, Egypt
I would like to split this data into City and Governate columns, where City is the far left entry and I want to say that if it has less than 3 entries or that the City entry is = 'Cairo'
(which is a governate not a city, I would like to say that unspecified city in 'Governate Name' which is the second entry! also, if only had 1 entry I would like to say that the governate is not specified
P.S the output is formatted as (city,governate,country)
CodePudding user response:
Apply the function that splits the strings:
def split_fun(x):
lst = x.split(', ')
gov = ''
city = ''
country = lst[-1]
if len(lst) > 1: gov = lst[-2]
if len(lst) == 3: city = lst[0]
return pd.Series([city, gov, country])
df[['City', 'Gov', 'Country']] = df['Location'].apply(split_fun)
Location City Gov Country
0 Cairo, Cairo, Egypt Cairo Cairo Egypt
1 Cairo, Cairo, Egypt Cairo Cairo Egypt
2 Cairo, Egypt Cairo Egypt
3 Cairo, Cairo, Egypt Cairo Cairo Egypt
4 Cairo, Egypt Cairo Egypt
.. ... ... ... ...
94 Cairo, Cairo, Egypt Cairo Cairo Egypt
95 New Cairo, Cairo, Egypt New Cairo Cairo Egypt
96 New Cairo, Cairo, Egypt New Cairo Cairo Egypt
97 Cairo, Cairo, Egypt Cairo Cairo Egypt
98 Cairo, Cairo, Egypt Cairo Cairo Egypt