I have a DataFrame with the columns ['color', 'fruit', 'date', 'value'], where ['color', 'fruit'] defines the groups:
import numpy as np
import pandas as pd

data = pd.DataFrame({'color': ['Green', 'Green', 'Green', 'Green', 'Red', 'Red'],
                     'fruit': ['Banana', 'Banana', 'Apple', 'Apple', 'Banana', 'Apple'],
                     'date': ['2011-01-01', '2011-01-02', '2011-01-01', '2011-01-02', '2011-02-01', '2011-02-01'],
                     'value': [1, np.nan, np.nan, 2, 3, np.nan]})
Output:
   color   fruit        date  value
0  Green  Banana  2011-01-01    1.0
1  Green  Banana  2011-01-02    NaN
2  Green   Apple  2011-01-01    NaN
3  Green   Apple  2011-01-02    2.0
4    Red  Banana  2011-02-01    3.0
5    Red   Apple  2011-02-01    NaN
I need to forward-fill 'value' wherever a date has no data, but the fill must stay within each ['color', 'fruit'] group.
I am trying to fill down with
df = data.groupby(['color', 'fruit', 'date'])['value'].mean().replace(to_replace=0, method='ffill')
but this carries values over into the next [color, fruit] group.
Expected Output:
   color   fruit        date  value
0  Green  Banana  2011-01-01    1.0
1  Green  Banana  2011-01-02    1.0
2  Green   Apple  2011-01-01    NaN
3  Green   Apple  2011-01-02    2.0
4    Red  Banana  2011-02-01    3.0
5    Red   Apple  2011-02-01    NaN
CodePudding user response:
You can use GroupBy.cumcount with pandas.Series.ffill:
m = data.groupby(["color", "fruit"]).cumcount().astype(bool)
data["value"] = data["value"].ffill().where(m, data["value"])
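One caveat with the mask approach, as far as I can tell: it only protects the first row of each group. If a group starts with two or more consecutive NaNs, the later ones still pick up a value from the previous group. A small sketch on a hypothetical frame (the `demo` frame and its `key` column are made up for illustration, not taken from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: group "B" starts with two consecutive NaNs.
demo = pd.DataFrame({
    "key": ["A", "B", "B"],
    "value": [1.0, np.nan, np.nan],
})

# Mask approach: True for every row except the first row of its group.
m = demo.groupby("key").cumcount().astype(bool)
masked = demo["value"].ffill().where(m, demo["value"])

# Per-group forward fill for comparison.
grouped = demo.groupby("key")["value"].ffill()

print(pd.DataFrame({"masked": masked, "grouped": grouped}))
```

Here the last row of `masked` receives 1.0 from group "A", while `grouped` leaves both "B" rows as NaN. The mask version also assumes that rows of the same group are contiguous.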
Or, as mentioned by @Mustafa Aydin, simply use GroupBy.ffill:
data["value"] = data.groupby(["color", "fruit"])["value"].ffill()
Output:
print(data)
   color   fruit        date  value
0  Green  Banana  2011-01-01    1.0
1  Green  Banana  2011-01-02    1.0
2  Green   Apple  2011-01-01    NaN
3  Green   Apple  2011-01-02    2.0
4    Red  Banana  2011-02-01    3.0
5    Red   Apple  2011-02-01    NaN
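For completeness, here is a self-contained sketch of the GroupBy.ffill approach. It adds one step that is my own assumption rather than part of the question: parsing 'date' and sorting by it before filling, so that "previous value" means the chronologically previous row even if the frame is not already in date order. The result is aligned back on the index, so the original row order is kept.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "color": ["Green", "Green", "Green", "Green", "Red", "Red"],
    "fruit": ["Banana", "Banana", "Apple", "Apple", "Banana", "Apple"],
    "date": ["2011-01-01", "2011-01-02", "2011-01-01",
             "2011-01-02", "2011-02-01", "2011-02-01"],
    "value": [1, np.nan, np.nan, 2, 3, np.nan],
})

# Assumption: fill in chronological order, so parse and sort by date first.
data["date"] = pd.to_datetime(data["date"])
filled = (
    data.sort_values("date", kind="stable")
        .groupby(["color", "fruit"])["value"]
        .ffill()
)

# Assignment aligns on the index, so the original row order is preserved.
data["value"] = filled
print(data)
```

If the rows are already in date order within each group, as in the sample, the sort is a no-op and the one-liner above is enough.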