Home > OS >  pandas create new dataframe from range of two column values
pandas create new dataframe from range of two column values

Time:09-22

I have dataframe like this where index contains the start_period/end_period.

enter image description here

And i want to create range from dataframe index and output like this:

|    range_date         |       due_date      |
| ------------------    | --------------------|
| 2021-07-30 00:00:00   | 2021-07-30 00:00:00 |
| 2021-08-01 00:00:00   | 2021-07-30 00:00:00 |
| 2021-08-02 00:00:00   | 2021-07-30 00:00:00 |
| 2021-08-29 00:00:00   | 2021-08-30 00:00:00 |
| 2021-08-30 00:00:00   | 2021-08-30 00:00:00 |
| 2021-09-01 00:00:00   | 2021-08-30 00:00:00 |
| 2021-09-28 00:00:00   | 2021-09-30 00:00:00 |
| 2021-09-29 00:00:00   | 2021-09-30 00:00:00 |
| 2021-09-30 00:00:00   | 2021-09-30 00:00:00 |

Clarification: If you check the first row where the index contains the range start_period/ end_period, so i want to convert that range into per day row and its value would be the column value:

range_date      due_date
2021-07-30      2021-07-30
2021-08-01      2021-07-30
2021-08-02      2021-07-30

same goes for other rows

CodePudding user response:

df = pd.DataFrame(index=['2021-07-30 00:00:00/2021-08-02 00:00:00',
                                  '2021-08-29 00:00:00/2021-09-01 00:00:00',
                                  '2021-09-28 00:00:00/2021-10-01 00:00:00'],
                  data={'due_date/modified_due_date':['2021-07-30 00:00:00', '2021-08-30 00:00:00', '2021-09-30 00:00:00']})

df['index'] = df.index
df[['start', 'end']] = df['index'].str.split('/', expand=True)
df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])

dates = []
for index, row in df.iterrows():
    dates.extend(pd.date_range(row['start'], row['end']))

df_dates = pd.DataFrame(data={'range_date': dates})
df_dates = df_dates.merge(df[['start', 'due_date/modified_due_date']], how='left', left_on=['range_date'], right_on=['start'])[['range_date', 'due_date/modified_due_date']]
df_dates['due_date/modified_due_date'] = df_dates['due_date/modified_due_date'].ffill()
df_dates.rename(columns={'due_date/modified_due_date': 'due_date'}, inplace=True)

OUTPUT:

range_date             due_date
0  2021-07-30  2021-07-30 00:00:00
1  2021-07-31  2021-07-30 00:00:00
2  2021-08-01  2021-07-30 00:00:00
3  2021-08-02  2021-07-30 00:00:00
4  2021-08-29  2021-08-30 00:00:00
5  2021-08-30  2021-08-30 00:00:00
6  2021-08-31  2021-08-30 00:00:00
7  2021-09-01  2021-08-30 00:00:00
8  2021-09-28  2021-09-30 00:00:00
9  2021-09-29  2021-09-30 00:00:00
10 2021-09-30  2021-09-30 00:00:00
11 2021-10-01  2021-09-30 00:00:00
  • Related