Adding holiday eves on a dateframe in python-CodePudding

I have the following dataframe in Python: The date column is in TimeStamp format.

date	holiday_type	name	other
2022-01-01 00:00:00	Holiday	Holiday 1	UK
2022-01-02 00:00:00	Holiday	Holiday 2	UK
2022-03-08 00:00:00	Holiday	Holiday 3	UK
2022-04-12 00:00:00	Holiday	Holiday 4	UK

I want to add new rows for records from the day before those specified dates. The resulting dataframe will look like this:

date	holiday_type	name	other
2021-12-31 00:00:00	Pre Holiday	(Pre) Holiday 1	UK
2022-01-01 00:00:00	Holiday	Holiday 1	UK
2022-01-02 00:00:00	Holiday	Holiday 2	UK
2022-03-07 00:00:00	Pre Holiday	(Pre) Holiday 3	UK
2022-03-08 00:00:00	Holiday	Holiday 3	UK
2022-04-11 00:00:00	Pre Holiday	(Pre) Holiday 4	UK
2022-04-12 00:00:00	Holiday	Holiday 4	UK

Exceptions are that if the previous day is already a holiday, the pre-holiday is not added.

I hope you can help me, thank you.

CodePudding user response：

Might be a more efficient way to do this, but here's how I did it.

I created a dataframe that offset your dates by a day. Then added the suffix '(Pre) ' and changed the holiday_type to 'Pre Holiday'. I then appended it to the original dataframe, sorted, and dropped duplicate dates, keeping the last entry.

import pandas as pd

cols = ['date','holiday_type','name','other']
data = [['2022-01-01 00:00:00', 'Holiday',  'Holiday 1',    'UK'],
['2022-01-02 00:00:00', 'Holiday',  'Holiday 2',    'UK'],
['2022-03-08 00:00:00', 'Holiday',  'Holiday 3',    'UK'],
['2022-04-12 00:00:00', 'Holiday',  'Holiday 4',    'UK']]



df = pd.DataFrame(data, columns=cols)
df['date'] = pd.to_datetime(df['date'])

df_yesterday = df[df['holiday_type'] == 'Holiday']
df_yesterday['date']  = df_yesterday['date']   pd.offsets.Day(-1)
df_yesterday['holiday_type'] = 'Pre Holiday'
df_yesterday['name'] = '(Pre) '   df_yesterday['name']

df = pd.concat([df, df_yesterday]).sort_values(['date', 'holiday_type'], ascending=[True, False]).reset_index(drop=True)
df = df.drop_duplicates(['date'], keep='last').reset_index(drop=True)

Output:

print(df)
        date holiday_type             name other
0 2021-12-31  Pre Holiday  (Pre) Holiday 1    UK
1 2022-01-01      Holiday        Holiday 1    UK
2 2022-01-02      Holiday        Holiday 2    UK
3 2022-03-07  Pre Holiday  (Pre) Holiday 3    UK
4 2022-03-08      Holiday        Holiday 3    UK
5 2022-04-11  Pre Holiday  (Pre) Holiday 4    UK
6 2022-04-12      Holiday        Holiday 4    UK