I have this dataframe:
import numpy as np
import pandas as pd
data = [{'a':2,'b': 2, 'c':3},{'b': 2, 'c':np.nan}, {'a': 10, 'b': 20, 'c': 30}, {'a': 10, 'b': np.nan, 'c': np.nan}]
df = pd.DataFrame(data, index =['John', 'John', 'Mike' ,'Mike'])
I am trying to fill each user's missing values with the values found in that user's other rows.
My goal dataframe would be:
data = [{'a':2,'b': 2, 'c':3},{'a':2, 'b': 2, 'c':3}, {'a': 10, 'b': 20, 'c': 30}, {'a': 10, 'b': 20, 'c': 30}]
df = pd.DataFrame(data, index =['John', 'John', 'Mike' ,'Mike'])
This needs to work on thousands of rows, but I believe this minimal example is enough to show what I need for a large dataframe.
I do not want to use pd.merge, because my original dataframes already have thousands of columns and merging would add thousands more.
CodePudding user response:
You can use groupby().transform('first') to extract the first valid value of each column for each user, then fillna:
df = df.fillna(df.groupby(level=0).transform('first'))
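For reference, here is a minimal end-to-end sketch of that one-liner applied to the example data from the question. The names `filled` and `expected` are only illustrative. It assumes the index holds the user names, so `level=0` groups by user. The goal frame is cast to float before comparing, because the columns that contained NaN stay float after filling.

```python
import numpy as np
import pandas as pd

# Example data from the question; a missing key becomes NaN.
data = [
    {'a': 2, 'b': 2, 'c': 3},
    {'b': 2, 'c': np.nan},
    {'a': 10, 'b': 20, 'c': 30},
    {'a': 10, 'b': np.nan, 'c': np.nan},
]
df = pd.DataFrame(data, index=['John', 'John', 'Mike', 'Mike'])

# For every row, look up the first non-NaN value of each column
# within the same user (same index label), then use it to fill gaps.
first_valid = df.groupby(level=0).transform('first')
filled = df.fillna(first_valid)

# Goal frame from the question, cast to float for comparison.
expected = pd.DataFrame(
    [
        {'a': 2, 'b': 2, 'c': 3},
        {'a': 2, 'b': 2, 'c': 3},
        {'a': 10, 'b': 20, 'c': 30},
        {'a': 10, 'b': 20, 'c': 30},
    ],
    index=['John', 'John', 'Mike', 'Mike'],
).astype(float)

print(filled)
print(filled.equals(expected))
```

Because transform returns a frame with the same shape and index as df, no extra columns are created, which avoids the pd.merge problem mentioned in the question.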
Note: You can
- replace 'first' with other functions, e.g. 'mean', if you like.
- apply the function directly instead of transform: groupby().first(), since you are grouping based on index.
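To illustrate how the choice of function changes the result, here is a small sketch on a hypothetical frame (the column `x` and its values are made up for this example and are not from the question). With 'first', a gap is filled with the user's first valid value; with 'mean', it is filled with the average of the user's valid values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: John has two different valid values and one gap.
df2 = pd.DataFrame(
    {'x': [1.0, 3.0, np.nan, 10.0, np.nan]},
    index=['John', 'John', 'John', 'Mike', 'Mike'],
)

grouped = df2.groupby(level=0)

# Fill gaps with the first valid value per user.
filled_first = df2.fillna(grouped.transform('first'))

# Fill gaps with the per-user mean instead (NaN is ignored by mean).
filled_mean = df2.fillna(grouped.transform('mean'))

print(filled_first['x'].tolist())  # [1.0, 3.0, 1.0, 10.0, 10.0]
print(filled_mean['x'].tolist())   # [1.0, 3.0, 2.0, 10.0, 10.0]
```

When each user has only one distinct valid value per column, as in the question's data, both choices give the same result.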