Original table (or Dataframe)
ID | Start | End |
---|---|---|
1 | 2020-09-01 | 2021-12-31 |
2 | 2019-07-01 | 2021-07-31 |
... | ... | ... |
1000 | 2015-01-01 | 2016-03-31 |
Desired table
ID | Start | End | Start_y | End_y | List | Day for calculation | Days | Multiple |
---|---|---|---|---|---|---|---|---|
1 | 2020-09-01 | 2021-12-31 | 2020 | 2021 | 2020 | 2020-12-31 | 122 | 122/366 |
1 | 2020-09-01 | 2021-12-31 | 2020 | 2021 | 2021 | 2021-01-01 | 365 | 122/365 |
2 | 2019-07-01 | 2021-07-31 | 2019 | 2021 | 2019 | 2019-12-31 | 184 | 184/365 |
2 | 2019-07-01 | 2021-07-31 | 2019 | 2021 | 2020 | 2020-12-31 | 365 | 366/366 |
2 | 2019-07-01 | 2021-07-31 | 2019 | 2021 | 2021 | 2021-01-01 | 212 | 212/365 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
1000 | 2015-01-01 | 2016-03-31 | 2015 | 2016 | 2015 | 2015-12-31 | 365 | 365/365 |
1000 | 2015-01-01 | 2016-03-31 | 2015 | 2016 | 2016 | 2016-01-01 | 91 | 91/365 |
I tried to write as follows until "List" column. But got stuck in calculating days. Any advice on deriving the "Days" column in the desired table?
df['Start_y'] = df['Start'].dt.year
df['End_y'] = df['End'].dt.year
def getlist(sta,end):
return list(range(sta,end 1))
df['List'] = df.apply(lambda x: getlist(x['Start_y'], x['End_y']), axis = 1)
df = df.explode(['List'])
CodePudding user response:
You can try
start = pd.to_datetime(df['List'], format='%Y')
end = start pd.offsets.YearEnd()
df['Days'] = (pd.concat([end-start, end-df['Start'],
df['End']-start, df['End']-df['Start']],
axis=1).min(axis=1)
pd.Timedelta(days=1))
print(df)
ID Start End Start_y End_y List Days
0 1 2020-09-01 2021-12-31 2020 2021 2020 122 days
1 1 2020-09-01 2021-12-31 2020 2021 2021 365 days
2 2 2019-07-01 2021-07-31 2019 2021 2019 184 days
3 2 2019-07-01 2021-07-31 2019 2021 2020 366 days # 2020 is leap year
4 2 2019-07-01 2021-07-31 2019 2021 2021 212 days
5 1000 2015-01-01 2016-03-31 2015 2016 2015 365 days
6 1000 2015-01-01 2016-03-31 2015 2016 2016 91 days