Home > Enterprise >  how to create a new column based on string from different columns
how to create a new column based on string from different columns

Time:01-03

I have a dataframe look like this:

type    city
1      dki jakarta
2      jawa barat
3      jawa tengah
4      jawa timur
5      sulawesi

I want to create a new column called city_group based on the city.

  • dki jakarta, jawa barat: jabo, jabar
  • jawa tengah, jawa tengah: jateng, jatim
  • sulawesi: others

The desire dataframe would be like this:

type    city            city_group
1      dki jakarta      jabo, jabar
2      jawa barat       jabo, jabar
3      jawa tengah      jateng, jatim
4      jawa timur       jateng, jatim
5      sulawesi         others

So far, what I have done is with this script below but I did not get how to put multiple string in the condition.

df.loc[df['city'].str.contains("dki jakarta),'city_group'] = 'jabo, jabar'

How can I get the desired dataframe with pandas? Thank you in advance

CodePudding user response:

Create dictionary of lists for mapping by keys and in dict comprehenion flatten it, so possible use Series.map with Series.fillna - all values not matched in dictionary create others values in column city_group:

d = {'jabo, jabar':['dki jakarta','jawa barat'],
     'jateng, jatim':['jawa tengah','jawa timur']}

d1 = {x: k for k, v in d.items() for x in v}
df['city_group'] = df['city'].map(d1).fillna('others')
print (df)
   type         city     city_group
0     1  dki jakarta    jabo, jabar
1     2   jawa barat    jabo, jabar
2     3  jawa tengah  jateng, jatim
3     4   jawa timur  jateng, jatim
4     5     sulawesi         others
  • Related