I have a dataframe look like this:
type city
1 dki jakarta
2 jawa barat
3 jawa tengah
4 jawa timur
5 sulawesi
I want to create a new column called city_group
based on the city.
- dki jakarta, jawa barat: jabo, jabar
- jawa tengah, jawa tengah: jateng, jatim
- sulawesi: others
The desire dataframe would be like this:
type city city_group
1 dki jakarta jabo, jabar
2 jawa barat jabo, jabar
3 jawa tengah jateng, jatim
4 jawa timur jateng, jatim
5 sulawesi others
So far, what I have done is with this script below but I did not get how to put multiple string in the condition.
df.loc[df['city'].str.contains("dki jakarta),'city_group'] = 'jabo, jabar'
How can I get the desired dataframe with pandas? Thank you in advance
CodePudding user response:
Create dictionary of lists for mapping by keys and in dict comprehenion flatten it, so possible use Series.map
with Series.fillna
- all values not matched in dictionary create others
values in column city_group
:
d = {'jabo, jabar':['dki jakarta','jawa barat'],
'jateng, jatim':['jawa tengah','jawa timur']}
d1 = {x: k for k, v in d.items() for x in v}
df['city_group'] = df['city'].map(d1).fillna('others')
print (df)
type city city_group
0 1 dki jakarta jabo, jabar
1 2 jawa barat jabo, jabar
2 3 jawa tengah jateng, jatim
3 4 jawa timur jateng, jatim
4 5 sulawesi others