Convert a DataFrame with Periods ("from" and "to" date columns) to a Series

Time:10-14

I have a DataFrame with school holidays. Each holiday has a "From" and a "To" date column. Is there a neat and short way to convert it to an "is_holiday" Series with one entry for every day?

I have:

| idx | From | To | Name |
| :-- | :--------- | :--------- | :------------- |
| 0 | 2017-12-25 | 2018-01-05 | Xmas holiday |
| 1 | 2018-02-12 | 2018-02-23 | Sport holiday |
| 2 | 2018-03-29 | 2018-04-02 | Easter holiday |
...

I want:

| Date | is_holiday |
| :--------- | ---------- |
| .. | |
| 2017-12-24 | False |
| 2017-12-25 | True |
| 2017-12-26 | True |
| .. | |
| 2018-01-05 | True |
| 2018-01-06 | False |
| .. | |
and so on..
...

Example DataFrame for your convenience:

import pandas as pd
df = pd.DataFrame({
    "From": ["2017-12-25", "2018-02-12", "2018-03-29"],
    "To": ["2018-01-05","2018-02-23","2018-04-02"],
})
df.From = pd.to_datetime(df.From)
df.To = pd.to_datetime(df.To)

CodePudding user response:

This covers all dates from the earliest From to the latest To, but you can adjust the interval as you wish:

df = pd.DataFrame({"From": ["2017-12-25", "2018-02-12", "2018-03-29"],"To": ["2018-01-05","2018-02-23","2018-04-02"],
})
df.From = pd.to_datetime(df.From)
df.To = pd.to_datetime(df.To)

# Collect every date that falls inside any holiday period.
holidays = []
for ix, row in df.iterrows():
    holidays += pd.date_range(row.From, row.To).tolist()

all_dates = pd.DataFrame({'dates':pd.date_range(df.From.min(),df.To.max())})
all_dates['is_holiday'] = False
all_dates.loc[all_dates.dates.isin(holidays),'is_holiday'] = True

EDIT, cleaner code:

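def holiday_dates(row):
    # All dates of one holiday period, both endpoints included.
    return pd.date_range(row.From, row.To).tolist()

# Summing the per-row lists concatenates them into one flat list.
holidays = df.apply(holiday_dates, axis=1).sum()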
all_dates = pd.DataFrame({'dates':pd.date_range(df.From.min(),df.To.max())})
all_dates['is_holiday'] = False
all_dates.loc[all_dates.dates.isin(holidays),'is_holiday'] = True
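Since the question asked for a Series rather than a DataFrame, here is a minimal sketch of the same `isin` idea that builds the Series directly. The names `holiday_dates`, `calendar` and `is_holiday` are my own choices for illustration, and the date range is still the earliest From to the latest To:

```python
import pandas as pd

df = pd.DataFrame({
    "From": pd.to_datetime(["2017-12-25", "2018-02-12", "2018-03-29"]),
    "To": pd.to_datetime(["2018-01-05", "2018-02-23", "2018-04-02"]),
})

# Flatten all (From, To) periods into one list of holiday dates.
holiday_dates = [
    day
    for start, end in zip(df["From"], df["To"])
    for day in pd.date_range(start, end)
]

# One entry per calendar day between the earliest From and the latest To.
calendar = pd.date_range(df["From"].min(), df["To"].max(), name="Date")
is_holiday = pd.Series(calendar.isin(holiday_dates), index=calendar, name="is_holiday")
```

This avoids `iterrows` and `apply`, but it still materialises every holiday date, so it is probably best suited to a modest number of periods.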

CodePudding user response:

This is the smallest solution I came up with in the end. It is based on @imburningbabe's first solution. Many thanks for the inspiration! I wouldn't have been able to do it without your answer.

df = pd.DataFrame({"From": ["2017-12-25", "2018-02-12", "2018-03-29"],"To": ["2018-01-05","2018-02-23","2018-04-02"],
})
df.From = pd.to_datetime(df.From); df.To = pd.to_datetime(df.To)


all_dates = pd.DataFrame(index=pd.date_range(df.From.min(),df.To.max()))
all_dates['is_holiday'] = False

for (from_, to) in df.itertuples(index=False):
    all_dates.loc[from_:to, 'is_holiday'] = True
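The same slicing approach also works on a plain Series, which is what the question originally asked for. Below is a rough sketch under one assumption: the window from 2017-12-01 to 2018-04-30 is a hypothetical range I picked so that days before the first holiday (such as 2017-12-24) show up as False. Any other range can be used in its place.

```python
import pandas as pd

df = pd.DataFrame({
    "From": pd.to_datetime(["2017-12-25", "2018-02-12", "2018-03-29"]),
    "To": pd.to_datetime(["2018-01-05", "2018-02-23", "2018-04-02"]),
})

# Hypothetical window, chosen wider than the holiday periods themselves.
days = pd.date_range("2017-12-01", "2018-04-30", name="Date")
is_holiday = pd.Series(False, index=days, name="is_holiday")

# Label-based slicing on a DatetimeIndex includes both endpoints.
for start, end in zip(df["From"], df["To"]):
    is_holiday.loc[start:end] = True
```

Because `.loc` slices by label here, the To date itself is marked as a holiday, which matches the inclusive periods in the question.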