I have the following dataframe in Python: The date column is in TimeStamp format.
date | holiday_type | name | other |
---|---|---|---|
2022-01-01 00:00:00 | Holiday | Holiday 1 | UK |
2022-01-02 00:00:00 | Holiday | Holiday 2 | UK |
2022-03-08 00:00:00 | Holiday | Holiday 3 | UK |
2022-04-12 00:00:00 | Holiday | Holiday 4 | UK |
I want to add new rows for records from the day before those specified dates. The resulting dataframe will look like this:
date | holiday_type | name | other |
---|---|---|---|
2021-12-31 00:00:00 | Pre Holiday | (Pre) Holiday 1 | UK |
2022-01-01 00:00:00 | Holiday | Holiday 1 | UK |
2022-01-02 00:00:00 | Holiday | Holiday 2 | UK |
2022-03-07 00:00:00 | Pre Holiday | (Pre) Holiday 3 | UK |
2022-03-08 00:00:00 | Holiday | Holiday 3 | UK |
2022-04-11 00:00:00 | Pre Holiday | (Pre) Holiday 4 | UK |
2022-04-12 00:00:00 | Holiday | Holiday 4 | UK |
Exceptions are that if the previous day is already a holiday, the pre-holiday is not added.
I hope you can help me, thank you.
CodePudding user response:
Might be a more efficient way to do this, but here's how I did it.
I created a dataframe that offset your dates by a day. Then added the suffix '(Pre) '
and changed the holiday_type
to 'Pre Holiday'
. I then appended it to the original dataframe, sorted, and dropped duplicate dates, keeping the last entry.
import pandas as pd
cols = ['date','holiday_type','name','other']
data = [['2022-01-01 00:00:00', 'Holiday', 'Holiday 1', 'UK'],
['2022-01-02 00:00:00', 'Holiday', 'Holiday 2', 'UK'],
['2022-03-08 00:00:00', 'Holiday', 'Holiday 3', 'UK'],
['2022-04-12 00:00:00', 'Holiday', 'Holiday 4', 'UK']]
df = pd.DataFrame(data, columns=cols)
df['date'] = pd.to_datetime(df['date'])
df_yesterday = df[df['holiday_type'] == 'Holiday']
df_yesterday['date'] = df_yesterday['date'] pd.offsets.Day(-1)
df_yesterday['holiday_type'] = 'Pre Holiday'
df_yesterday['name'] = '(Pre) ' df_yesterday['name']
df = pd.concat([df, df_yesterday]).sort_values(['date', 'holiday_type'], ascending=[True, False]).reset_index(drop=True)
df = df.drop_duplicates(['date'], keep='last').reset_index(drop=True)
Output:
print(df)
date holiday_type name other
0 2021-12-31 Pre Holiday (Pre) Holiday 1 UK
1 2022-01-01 Holiday Holiday 1 UK
2 2022-01-02 Holiday Holiday 2 UK
3 2022-03-07 Pre Holiday (Pre) Holiday 3 UK
4 2022-03-08 Holiday Holiday 3 UK
5 2022-04-11 Pre Holiday (Pre) Holiday 4 UK
6 2022-04-12 Holiday Holiday 4 UK