Python Pandas: Is there a way to obtain a subset dataframe based on strings in a list

Time:12-11

I want to create a subset DataFrame based on the string values in a list.

A toy example:

import pandas as pd

data = {'month': ['January','February','March','April','May','June','July','August','September','October','November','December'],
        'days_in_month': [31,28,31,30,31,30,31,31,30,31,30,31]
        }

df = pd.DataFrame(data, columns=['month', 'days_in_month'])

summer_months = ['Dec', 'Jan', 'Feb']

contain_values = df[df['month'].str.contains(summer_months)]
print(contain_values)

This fails on the line contain_values = df[df['month'].str.contains(summer_months)] with:

TypeError: unhashable type: 'list'

I know that contain_values = df[df['month'].str.contains('Dec')] works, but I would like to get back a new DataFrame containing all of the summer months, or even all of the non-summer months using the ~ operator.

Thanks

Answer:

>>> contain_values = df[df['month'].str.contains('|'.join(summer_months))]

>>> contain_values
       month  days_in_month
0    January             31
1   February             28
11  December             31
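The question also asks for the non-summer rows via ~. A minimal sketch, assuming the same toy DataFrame, that stores the boolean mask in a variable so it can be reused and negated. Wrapping each item in re.escape is an addition not in the answer above; it only matters if the list could contain regex metacharacters such as "." or "+".

```python
import re

import pandas as pd

data = {
    "month": ["January", "February", "March", "April", "May", "June",
              "July", "August", "September", "October", "November", "December"],
    "days_in_month": [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
}
df = pd.DataFrame(data)
summer_months = ["Dec", "Jan", "Feb"]

# Build one alternation pattern such as "Dec|Jan|Feb"; re.escape guards
# against list items that contain regex metacharacters.
pattern = "|".join(map(re.escape, summer_months))
mask = df["month"].str.contains(pattern)

summer = df[mask]        # rows whose month contains any abbreviation
non_summer = df[~mask]   # ~ negates the boolean mask: all remaining rows

print(summer)
print(non_summer)
```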

Answer:

You can also use what the .str accessor offers:

df[df["month"].str[:3].isin(summer_months)]

Output:

       month  days_in_month
0    January             31
1   February             28
11  December             31
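The two answers are not strictly equivalent: str.contains matches the pattern anywhere in the string, while .str[:3].isin(...) compares only the first three characters exactly. The labels below are hypothetical, chosen only to show where the results differ.

```python
import pandas as pd

# Hypothetical labels (not from the question) chosen to show the difference
labels = pd.Series(["December", "Mid-Dec review", "January"])
summer_months = ["Dec", "Jan", "Feb"]

# Exact comparison of the first three characters
by_prefix = labels[labels.str[:3].isin(summer_months)]

# Regex alternation "Dec|Jan|Feb" matched anywhere in the string
by_substring = labels[labels.str.contains("|".join(summer_months))]

print(by_prefix.tolist())     # ['December', 'January']
print(by_substring.tolist())  # ['December', 'Mid-Dec review', 'January']
```

For a column that holds only full month names, both approaches give the same rows; the prefix check is the stricter of the two.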

You can make it more robust (in case the month names in the DataFrame are not consistently capitalized) by normalizing them first:

df[df["month"].str.capitalize().str[:3].isin(summer_months)]
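A small sketch with hypothetical mixed-case data comparing the capitalize-then-isin filter with str.contains(..., case=False), which makes the regex match case-insensitive. Both keep the rows' original spelling, because only the boolean mask is computed from the normalized text.

```python
import pandas as pd

# Hypothetical mixed-case month names (not from the question)
df = pd.DataFrame({"month": ["JANUARY", "february", "March", "december"]})
summer_months = ["Dec", "Jan", "Feb"]

# Normalize to "January" style, then compare the first three letters exactly
by_prefix = df[df["month"].str.capitalize().str[:3].isin(summer_months)]

# Alternative: case-insensitive regex match on "Dec|Jan|Feb"
by_pattern = df[df["month"].str.contains("|".join(summer_months), case=False)]

print(by_prefix)
print(by_pattern)
```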