I want to create a subset of a DataFrame based on the string values in a list.
A toy example:
import pandas as pd
data = {'month': ['January','February','March','April','May','June','July','August','September','October','November','December'],
'days_in_month': [31,28,31,30,31,30,31,31,30,31,30,31]
}
df = pd.DataFrame(data, columns=['month', 'days_in_month'])
summer_months = ['Dec', 'Jan', 'Feb']
contain_values = df[df['month'].str.contains(summer_months)]
print(contain_values)
This fails on the line contain_values = df[df['month'].str.contains(summer_months)] with:
TypeError: unhashable type: 'list'
I know that contain_values = df[df['month'].str.contains('Dec')]
works, but I would like to get a new DataFrame containing all the summer months,
or even all the non-summer months using the ~ operator.
Thanks
Answer:
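str.contains expects a single pattern, so join the list into one regex alternation ('Dec|Jan|Feb'):
>>> contain_values = df[df['month'].str.contains('|'.join(summer_months))]
>>> contain_values
       month  days_in_month
0    January             31
1   February             28
11  December             31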
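A minimal self-contained sketch of this approach that also returns the non-summer months asked about in the question, by inverting the boolean mask with ~. The re.escape call is an addition, not part of the answer above: it is only a precaution so that list entries containing regex metacharacters are matched literally.

```python
import re

import pandas as pd

data = {
    'month': ['January', 'February', 'March', 'April', 'May', 'June', 'July',
              'August', 'September', 'October', 'November', 'December'],
    'days_in_month': [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
}
df = pd.DataFrame(data)
summer_months = ['Dec', 'Jan', 'Feb']

# Build one alternation pattern, e.g. 'Dec|Jan|Feb'; re.escape keeps each entry literal
pattern = '|'.join(re.escape(m) for m in summer_months)
mask = df['month'].str.contains(pattern)

summer = df[mask]       # rows whose month name contains any of the abbreviations
non_summer = df[~mask]  # ~ inverts the boolean mask to select the remaining rows

print(summer)
print(non_summer)
```

Note that str.contains matches the abbreviation anywhere in the name, not just at the start.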
Answer:
You can also use what the .str accessor offers and compare the first three letters of each month name with isin:
df[df["month"].str[:3].isin(summer_months)]
Output:
       month  days_in_month
0    January             31
1   February             28
11  December             31
You can make this more robust (in case the month names in the DataFrame are not consistently capitalized) like this:
df[df["month"].str.capitalize().str[:3].isin(summer_months)]
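A sketch of the capitalization-robust variant combined with ~, using hypothetical mixed-case month names made up for illustration. str.capitalize() turns each name into 'January' style (first letter upper, the rest lower) before the first three characters are sliced, so isin can compare them exactly against the abbreviations.

```python
import pandas as pd

# Hypothetical messy input: month names with inconsistent capitalization
df = pd.DataFrame({
    'month': ['january', 'FEBRUARY', 'March', 'april', 'December'],
    'days_in_month': [31, 28, 31, 30, 31],
})
summer_months = ['Dec', 'Jan', 'Feb']

# Normalize to 'January' style, keep the first three letters, then test membership
abbrev = df['month'].str.capitalize().str[:3]
mask = abbrev.isin(summer_months)

print(df[mask])   # summer months
print(df[~mask])  # everything else
```

Unlike str.contains, isin on the three-letter prefix only matches exact abbreviations, so a stray substring elsewhere in a name cannot produce a false match.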