Suppose I have the following pandas dataframe:
df = pd.DataFrame({'Date':['2015-01-31','2015-01-31', '2015-02-28', '2015-03-31', '2015-04-30', '2015-04-30'], 'ID':[1,2,2,2,1,2], 'value':[1,2,3,4,5,6]})
print(df)
Date ID value
2015-01-31 1 1
2015-01-31 2 2
2015-02-28 2 3
2015-03-31 2 4
2015-04-30 1 5
2015-04-30 2 6
I want to forward fill the data such that I have the values for each end of month till 2015-05-31 (i.e. for each date - ID combination). That is, I would like the dataframe to look as follows:
Date ID value
2015-01-31 1 1
2015-01-31 2 2
2015-02-28 2 3
2015-02-28 1 1
2015-03-31 2 4
2015-03-31 1 1
2015-04-30 1 5
2015-04-30 2 6
2015-05-31 1 5
2015-05-31 2 6
Is something like this possible? I saw several similar questions on Stackoverflow on forward filling dates, however this was without an index column (where the same date can occur many times).
CodePudding user response:
You can pivot
then fill value with reindex
ffill
out = df.pivot(*df.columns).reindex(pd.date_range('2015-01-31',periods = 5,freq='M')).ffill().stack().reset_index()
out.columns = df.columns
out
Out[1077]:
Date ID value
0 2015-01-31 1 1.0
1 2015-01-31 2 2.0
2 2015-02-28 1 1.0
3 2015-02-28 2 3.0
4 2015-03-31 1 1.0
5 2015-03-31 2 4.0
6 2015-04-30 1 5.0
7 2015-04-30 2 6.0
8 2015-05-31 1 5.0
9 2015-05-31 2 6.0
CodePudding user response:
Another solution:
idx = pd.MultiIndex.from_product(
[
pd.date_range(df["Date"].min(), "2015-05-31", freq="M"),
df["ID"].unique(),
],
names=["Date", "ID"],
)
df = df.set_index(["Date", "ID"]).reindex(idx).groupby(level=1).ffill()
print(df.reset_index())
Prints:
Date ID value
0 2015-01-31 1 1.0
1 2015-01-31 2 2.0
2 2015-02-28 1 1.0
3 2015-02-28 2 3.0
4 2015-03-31 1 1.0
5 2015-03-31 2 4.0
6 2015-04-30 1 5.0
7 2015-04-30 2 6.0
8 2015-05-31 1 5.0
9 2015-05-31 2 6.0