I am trying to create a new column in a data frame based on dates and date ranges. I am a beginner, and have tried several of the answers here but I cannot get them to work. Here is my original code:
df_accident["bank_holidays_2010"] = df_accident["date"].map(
{'Easter': [d.strftime('%d/%m/%Y') for d in pd.date_range('02/04/2010', periods=4)],
'Mayday': [d.strftime('%d/%m/%Y') for d in pd.date_range('03/05/2010', periods=1)],
'Spring Bank Holiday': [d.strftime('%d/%m/%Y') for d in pd.date_range('31/05/2010', periods=1)],
'Summer Bank Holiday': [d.strftime('%d/%m/%Y') for d in pd.date_range('30/08/2010', periods=1)],
'Christmas and New Year': [d.strftime('%d/%m/%Y') for d in pd.date_range('25/12/2010', periods=9)]
}
)
and repeated for other years. I get an error:
"None of [Index(['Easter', 'Mayday', 'Spring Bank Holiday', 'Summer Bank Holiday', 'Christmas and New Year'], dtype='object')] are in the [columns]"
I have tried:
hols = {'Easter': [d.strftime('%d/%m/%Y') for d in pd.date_range('02/04/2010', periods=4)],
'Mayday': [d.strftime('%d/%m/%Y') for d in pd.date_range('03/05/2010', periods=1)],
'Spring Bank Holiday': [d.strftime('%d/%m/%Y') for d in pd.date_range('31/05/2010', periods=1)],
'Summer Bank Holiday': [d.strftime('%d/%m/%Y') for d in pd.date_range('30/08/2010', periods=1)],
'Christmas and New Year': [d.strftime('%d/%m/%Y') for d in pd.date_range('25/12/2010', periods=9)]
}
and:
bank_holidays_2010 = {'Easter': ('02/04/2010', '03/04/2010', '05/04/2010', '06/04/2010'),
'Mayday': ('03/05/2010'),
'Spring Bank Holiday': ('31/05/2010'),
'Summer Bank Holiday': ('30/08/2010'),
'Christmas and New Year': ('25/12/2010', '26/12/2010', '27/12/2010', '28/12/2010', '29/12/2010', '30/12/2010', '31/12/2010', '01/01/2011', '02/01/2011')
}
Both return the same error.
What I would like to achieve is values for the dates and date ranges:
df_accident['bank_holidays_2010'].value_counts()
Easter 466921
Mayday 301039
Spring Bank Holiday 132195
Christmas and New Year 92931
CodePudding user response:
You need to invert your dictionary d so that each date string becomes a key and the holiday name becomes its value. Then map can look up each value of the date column directly.
d = {
'Easter': [d.strftime('%d/%m/%Y') for d in pd.date_range(pd.to_datetime('02/04/2010', dayfirst=True), periods=4)],
'Mayday': [d.strftime('%d/%m/%Y') for d in pd.date_range(pd.to_datetime('03/05/2010', dayfirst=True), periods=1)],
'Spring Bank Holiday': [d.strftime('%d/%m/%Y') for d in pd.date_range(pd.to_datetime('31/05/2010', dayfirst=True), periods=1)],
'Summer Bank Holiday': [d.strftime('%d/%m/%Y') for d in pd.date_range(pd.to_datetime('30/08/2010', dayfirst=True), periods=1)],
'Christmas and New Year': [d.strftime('%d/%m/%Y') for d in pd.date_range(pd.to_datetime('25/12/2010', dayfirst=True), periods=9)]
}
d = {v:k for k, lst in d.items() for v in lst}
print(d)
{'02/04/2010': 'Easter', '03/04/2010': 'Easter', '04/04/2010': 'Easter', '05/04/2010': 'Easter', '03/05/2010': 'Mayday', '31/05/2010': 'Spring Bank Holiday', '30/08/2010': 'Summer Bank Holiday', '25/12/2010': 'Christmas and New Year', '26/12/2010': 'Christmas and New Year', '27/12/2010': 'Christmas and New Year', '28/12/2010': 'Christmas and New Year', '29/12/2010': 'Christmas and New Year', '30/12/2010': 'Christmas and New Year', '31/12/2010': 'Christmas and New Year', '01/01/2011': 'Christmas and New Year', '02/01/2011': 'Christmas and New Year'}
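With the inverted dict in place, the last step is to pass it to map on the date column. The following is only a minimal sketch: the sample rows are made up, the lookup is abbreviated, and it assumes the date column holds dd/mm/yyyy strings like the dict keys.

```python
import pandas as pd

# Hypothetical sample data; the real df_accident is not shown in the question.
df_accident = pd.DataFrame(
    {"date": ["02/04/2010", "03/05/2010", "15/06/2010", "01/01/2011"]}
)

# Inverted lookup (date string -> holiday name), abbreviated for the example.
d = {
    "02/04/2010": "Easter",
    "03/05/2010": "Mayday",
    "01/01/2011": "Christmas and New Year",
}

# Series.map looks up each value of the column in the dict;
# dates that have no key in the dict become NaN.
df_accident["bank_holidays_2010"] = df_accident["date"].map(d)
print(df_accident)
```

Rows that do not fall on a listed holiday are left as NaN, which value_counts() ignores by default.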
CodePudding user response:
Your first problem is that you're using dates in dd/mm/yyyy format, which will not necessarily give you the correct results from pd.date_range. For example,
pd.date_range('02/04/2010', periods=4)
gives
['2010-02-04', '2010-02-05', '2010-02-06', '2010-02-07']
which is not what you want: the string was parsed month-first, as 4 February rather than 2 April.
It is safer to write dates in ISO format (yyyy-mm-dd); you can always format them on output if required.
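As a small illustration of the ISO-format advice, here is a sketch that builds the Easter range from an ISO date and converts it to dd/mm/yyyy strings only at the end. The variable names are just for this example.

```python
import pandas as pd

# ISO 8601 input (yyyy-mm-dd) is unambiguous, so no dayfirst handling is needed.
easter = pd.date_range("2010-04-02", periods=4)

# Convert to dd/mm/yyyy strings only at the point where strings are needed.
easter_strings = easter.strftime("%d/%m/%Y").tolist()
print(easter_strings)
# ['02/04/2010', '03/04/2010', '04/04/2010', '05/04/2010']
```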
Your second issue is that the map is the wrong way around: the dict passed to map is supposed to be of the form { from : to }, which here means { date : holiday }. You can generate a new dict which meets that requirement like this:
date_hols = { date : holiday for holiday, dates in hols.items() for date in dates }
Assuming hols was built from correctly parsed (day-first) dates, this gives:
{'02/04/2010': 'Easter', '03/04/2010': 'Easter', '04/04/2010': 'Easter', '05/04/2010': 'Easter', '03/05/2010': 'Mayday', '31/05/2010': 'Spring Bank Holiday', '30/08/2010': 'Summer Bank Holiday', '25/12/2010': 'Christmas and New Year', '26/12/2010': 'Christmas and New Year', '27/12/2010': 'Christmas and New Year', '28/12/2010': 'Christmas and New Year', '29/12/2010': 'Christmas and New Year', '30/12/2010': 'Christmas and New Year', '31/12/2010': 'Christmas and New Year', '01/01/2011': 'Christmas and New Year', '02/01/2011': 'Christmas and New Year'}
You can then apply this map, filling entries which don't map with an empty string:
df_accident['bank_holidays_2010'] = df_accident['date'].map(date_hols).fillna('')
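One thing to watch: the lookup only matches if the values in the date column have the same type and format as the dict keys. The question doesn't say what dtype the column has, so this is a guess, but if it has already been parsed to datetime, a sketch like the following formats it to strings before mapping. The sample frame and the shortened date_hols are made up for the example.

```python
import pandas as pd

# Hypothetical frame whose date column has already been parsed to datetime64.
df_accident = pd.DataFrame(
    {"date": pd.to_datetime(["2010-04-02", "2010-06-15", "2010-08-30"])}
)

# Abbreviated lookup with dd/mm/yyyy string keys, as in date_hols.
date_hols = {"02/04/2010": "Easter", "30/08/2010": "Summer Bank Holiday"}

# Format the datetimes as strings first so they match the dict keys, then map.
date_keys = df_accident["date"].dt.strftime("%d/%m/%Y")
df_accident["bank_holidays_2010"] = date_keys.map(date_hols).fillna("")
print(df_accident)
```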
Now you can groupby to get counts (I've assumed we're summing a column called num):
df_accident.groupby('bank_holidays_2010')['num'].sum().reset_index()
Output for random num data:
       bank_holidays_2010   num
0                          3489
1  Christmas and New Year    51
2                  Easter    42
3                  Mayday    12
4     Spring Bank Holiday     0
5     Summer Bank Holiday    10
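If there is no num column and each row is one accident, value_counts() on the new column gives the per-holiday row counts the question asks for. A minimal sketch, using a made-up column in place of the real mapped data:

```python
import pandas as pd

# Hypothetical column, standing in for the result of the map(...).fillna('') step.
df_accident = pd.DataFrame(
    {"bank_holidays_2010": ["Easter", "", "Easter", "Mayday", "", "Easter"]}
)

# Drop the empty-string placeholder rows, then count rows per holiday.
is_holiday = df_accident["bank_holidays_2010"] != ""
counts = df_accident.loc[is_holiday, "bank_holidays_2010"].value_counts()
print(counts)
```

Filtering out the empty strings first keeps the non-holiday rows from dominating the result.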