I have panel data. As you can see, there are missing dates. I want the date range to be continuous, with each missing day filled with the previous date's values.
Sometimes I only have one observation per token per month; in that case I want all subsequent days to carry that initial value.
token_address date price1 price2
0 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-01 0.000214 0.573024
1 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-02 0.000214 0.558163
2 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-04 0.000166 0.451243
3 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-05 0.000137 0.387212
4 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-06 0.000117 0.337081
5 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-08 0.000103 0.311277
6 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-10 0.000067 0.245770
7 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-15 0.000085 0.281036
8 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-17 0.000089 0.267481
9 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-19 0.000089 0.281457
10 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-22 0.000094 0.305423
11 0x0000000000085d4780b73119b644ae5ecd22b376 2021-08-01 0.000374 0.991719
12 0x0000000000085d4780b73119b644ae5ecd22b376 2021-08-02 0.000382 1.011735
14 0x0000000000085d4780b73119b644ae5ecd22b376 2021-08-04 0.000398 1.073131
15 0x0000000000085d4780b73119b644ae5ecd22b376 2021-08-05 0.000369 1.044095
So the desired df is as follows (please ignore the index, as I am manually manipulating this):
token_address date price1 price2
0 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-01 0.000214 0.573024
1 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-02 0.000214 0.558163
2 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-03 0.000214 0.558163
2 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-04 0.000166 0.451243
3 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-05 0.000137 0.387212
4 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-06 0.000117 0.337081
4 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-07 0.000117 0.337081
5 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-08 0.000103 0.311277
5 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-09 0.000103 0.311277
6 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-10 0.000067 0.245770
6 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-11 0.000067 0.245770
6 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-12 0.000067 0.245770
6 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-13 0.000067 0.245770
6 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-14 0.000067 0.245770
7 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-15 0.000085 0.281036
7 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-16 0.000085 0.281036
8 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-17 0.000089 0.267481
8 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-18 0.000089 0.267481
9 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-19 0.000089 0.281457
9 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-20 0.000089 0.281457
9 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-21 0.000089 0.281457
10 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-22 0.000094 0.305423
10 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-23 0.000094 0.305423
10 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-24 0.000094 0.305423
10 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-25 0.000094 0.305423
10 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-26 0.000094 0.305423
10 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-27 0.000094 0.305423
10 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-28 0.000094 0.305423
10 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-29 0.000094 0.305423
10 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-30 0.000094 0.305423
10 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-31 0.000094 0.305423
11 0x0000000000085d4780b73119b644ae5ecd22b376 2021-08-01 0.000374 0.991719
12 0x0000000000085d4780b73119b644ae5ecd22b376 2021-08-02 0.000382 1.011735
13 0x0000000000085d4780b73119b644ae5ecd22b376 2021-08-03 0.000382 1.000586
14 0x0000000000085d4780b73119b644ae5ecd22b376 2021-08-04 0.000398 1.073131
15 0x0000000000085d4780b73119b644ae5ecd22b376 2021-08-05 0.000369 1.044095
......
15 0x0000000000085d4780b73119b644ae5ecd22b376 2021-08-29 0.000369 1.044095
15 0x0000000000085d4780b73119b644ae5ecd22b376 2021-08-30 0.000369 1.044095
15 0x0000000000085d4780b73119b644ae5ecd22b376 2021-08-31 0.000369 1.044095
CodePudding user response:
One option using pandas only is to build a DataFrame containing every combination of token_address and the dates from 2021-08-01 to 2021-08-31, and merge it with the original dataframe:
tokens = df.token_address.unique()
dates = pd.date_range('2021-08-01', '2021-08-31')
names = ['token_address', 'date']
# Cartesian product: every token paired with every date in the month
arr = pd.MultiIndex.from_product([tokens, dates], names=names)
arr = arr.to_frame(index=False)
(pd
.merge_ordered(
    df,
    arr,
    how='outer',
    on=names,
    fill_method='ffill')
)
token_address date price1 price2
0 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-01 0.000214 0.573024
1 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-02 0.000214 0.558163
2 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-03 0.000214 0.558163
3 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-04 0.000166 0.451243
4 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-05 0.000137 0.387212
.. ... ... ... ...
57 0x0000000000085d4780b73119b644ae5ecd22b376 2021-08-27 0.000369 1.044095
58 0x0000000000085d4780b73119b644ae5ecd22b376 2021-08-28 0.000369 1.044095
59 0x0000000000085d4780b73119b644ae5ecd22b376 2021-08-29 0.000369 1.044095
60 0x0000000000085d4780b73119b644ae5ecd22b376 2021-08-30 0.000369 1.044095
61 0x0000000000085d4780b73119b644ae5ecd22b376 2021-08-31 0.000369 1.044095
[62 rows x 4 columns]
Kindly verify that the forward fill works properly; if not, you can use a regular merge and forward fill afterwards.
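For reference, that fallback can be sketched as a plain left merge against the token/date grid, followed by a forward fill restricted to each token so one token's prices never leak into another's rows (the small df here is made-up stand-in data, not the original):

```python
import pandas as pd

# Hypothetical two-token sample standing in for the original data
df = pd.DataFrame({
    'token_address': ['a', 'a', 'b'],
    'date': pd.to_datetime(['2021-08-01', '2021-08-04', '2021-08-01']),
    'price1': [0.1, 0.2, 0.3],
})

# Full token x date grid, as in the MultiIndex approach above
tokens = df['token_address'].unique()
dates = pd.date_range('2021-08-01', '2021-08-05')
grid = pd.MultiIndex.from_product(
    [tokens, dates], names=['token_address', 'date']).to_frame(index=False)

# Left merge keeps every grid row; missing days get NaN prices,
# which the per-group ffill then fills from the same token's prior day
out = grid.merge(df, how='left', on=['token_address', 'date'])
out['price1'] = out.groupby('token_address')['price1'].ffill()
```

The groupby before ffill is what makes this safe regardless of row order between tokens.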
Another option is with complete from pyjanitor. In this case, you pass a dictionary of the new dates to the function, to be applied to every group in token_address; the key of the dictionary must be an existing column in your dataframe.
# pip install pyjanitor
import janitor
new_dates = {'date': dates}  # reuses the date_range built above
df.complete(new_dates, by='token_address').ffill()
token_address date price1 price2
0 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-01 0.000214 0.573024
1 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-02 0.000214 0.558163
2 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-03 0.000214 0.558163
3 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-04 0.000166 0.451243
4 0x0000000000004946c0e9f43f4dee607b0ef1fa1c 2021-08-05 0.000137 0.387212
.. ... ... ... ...
57 0x0000000000085d4780b73119b644ae5ecd22b376 2021-08-27 0.000369 1.044095
58 0x0000000000085d4780b73119b644ae5ecd22b376 2021-08-28 0.000369 1.044095
59 0x0000000000085d4780b73119b644ae5ecd22b376 2021-08-29 0.000369 1.044095
60 0x0000000000085d4780b73119b644ae5ecd22b376 2021-08-30 0.000369 1.044095
61 0x0000000000085d4780b73119b644ae5ecd22b376 2021-08-31 0.000369 1.044095
[62 rows x 4 columns]
Again, verify the dates are in the proper order; if not, sort into the preferred order before forward filling price1 and price2.
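To illustrate that caveat with made-up data: sorting by token and date first guarantees the fill always comes from the same token's previous day, even when the rows arrive out of order:

```python
import pandas as pd

# Hypothetical rows, deliberately out of date order
df = pd.DataFrame({
    'token_address': ['a', 'a', 'a'],
    'date': pd.to_datetime(['2021-08-03', '2021-08-01', '2021-08-02']),
    'price1': [0.3, 0.1, None],
})

# Sort so ffill pulls values forward in time, then fill within each token
df = df.sort_values(['token_address', 'date'])
df[['price1']] = df.groupby('token_address')[['price1']].ffill()
```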
CodePudding user response:
Here is one way to do it:
# set the index to the date column,
# group by token, reindex to a continuous daily range, and ffill
df['date'] = pd.to_datetime(df['date'])
(df.set_index('date')
   .groupby('token_address')
   .apply(lambda x: x.reindex(pd.date_range(df['date'].min(),
                                            df['date'].max() + pd.offsets.MonthEnd(n=0),
                                            freq='D')))
   .ffill())
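The resample route mentioned in the comment can also be used directly instead of reindex; note that each token's range then only extends to that token's last observed date, not to month end (the df below is made-up stand-in data):

```python
import pandas as pd

# Hypothetical sample with gaps inside token 'a'
df = pd.DataFrame({
    'token_address': ['a', 'a', 'b', 'b'],
    'date': pd.to_datetime(['2021-08-01', '2021-08-04',
                            '2021-08-01', '2021-08-02']),
    'price1': [0.1, 0.2, 0.3, 0.4],
})

# Daily frequency per token, forward-filling inside each group;
# the duplicated token_address column is dropped, then restored from the index
out = (df.set_index('date')
         .groupby('token_address')
         .resample('D')
         .ffill()
         .drop(columns='token_address')
         .reset_index())
```

Token 'a' expands to 2021-08-01 through 2021-08-04, token 'b' only through 2021-08-02.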