I'm looking for an elegant way to expand the dates in my df (adding both past and future dates) while keeping some of the values.
Given a df like this (note that dates can be duplicated):
import pandas as pd
data = pd.DataFrame({'player_id':[1,1,1,1],'field_id':[2,2,2,2],'date':['01-01-2021','05-01-2021','09-01-2021','12-12-2021'],'score':[1,0,2,4]})
| | player_id | field_id | date | score |
|---:|------------:|-----------:|:-----------|--------:|
| 0 | 1 | 2 | 01-01-2021 | 1 |
| 1 | 1 | 2 | 05-01-2021 | 0 |
| 2 | 1 | 2 | 09-01-2021 | 2 |
| 3 | 1 | 2 | 12-12-2021 | 4 |
I can get almost what I want with
data.set_index(data.date).sort_index().asfreq('D')
But I have the following problems:
- All columns in the df become NaN (data['score'].sum() returns 0 when it should be 7).
- I would like to add past and future dates outside the range covered by the dataframe (from 2015 until 2025, for example).
- The values of all columns except score and date should be copied into the new rows when expanding.
The expected output would be something like:
| | player_id | field_id | date | score |
|---:|------------:|-----------:|:-----------|--------:|
| 0 | 1 | 2 | 01-01-2015 | 0 | # first date of the range I pick
....
| 1 | 1 | 2 | 01-01-2021 | 1 |
| 2 | 1 | 2 | 05-01-2021 | 0 |
| 3 | 1 | 2 | 09-01-2021 | 2 |
| 4 | 1 | 2 | 12-12-2021 | 4 |
....
| 0 | 1 | 2 | 12-31-2025 | 0 | # last date of the range I pick
CodePudding user response:
Assuming you already have the data for the other columns, you can build the date range like this:
pd.date_range(start='01-01-2015', end='12-31-2025')
https://pandas.pydata.org/docs/reference/api/pandas.date_range.html
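To go from that date range to the expanded frame, one possible approach is to reindex on it and then fill the gaps. The following is only a sketch under a few assumptions that are not stated in the question: the date strings are month-day-year (suggested by 12-31-2025 in the expected output), the frame holds a single player_id/field_id combination, and rows sharing a date are collapsed by summing their score. The names full_range, daily and expanded are just illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    'player_id': [1, 1, 1, 1],
    'field_id': [2, 2, 2, 2],
    'date': ['01-01-2021', '05-01-2021', '09-01-2021', '12-12-2021'],
    'score': [1, 0, 2, 4],
})

# Assumption: the strings are month-day-year.
data['date'] = pd.to_datetime(data['date'], format='%m-%d-%Y')

# Every calendar day in the range you want to cover.
full_range = pd.date_range(start='2015-01-01', end='2025-12-31', freq='D', name='date')

# Assumption: duplicated dates are merged by summing score, so the index
# is unique before reindexing (reindex fails on duplicate labels).
daily = data.groupby('date').agg(
    {'player_id': 'first', 'field_id': 'first', 'score': 'sum'}
)

# Reindex onto the full range; days without data become NaN rows.
expanded = daily.reindex(full_range)

# Missing scores count as 0, so the total is preserved.
expanded['score'] = expanded['score'].fillna(0).astype(int)

# Copy the identifier columns into the new rows: forward fill covers the
# future dates, backward fill covers the dates before the first record.
for col in ['player_id', 'field_id']:
    expanded[col] = expanded[col].ffill().bfill().astype(int)

expanded = expanded.reset_index()

print(expanded['score'].sum())  # 7
```

The original attempt ends up with NaN everywhere most likely because the index is still made of strings rather than datetimes, so converting with pd.to_datetime first is the important step. If the frame holds several players or fields, the same reindex would need to be applied per group instead of on the whole frame.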