I have a DataFrame with values for several names across multiple dates. I have added a row of null values for each name on a new date, and I want to fill the nulls in some columns with the last non-null value for that name.
import pandas as pd

data = {'name': ['Alex', 'Ben', 'Marry','Alex', 'Ben', 'Marry'],
'job': ['teacher', 'doctor', 'engineer','teacher', 'doctor', 'engineer'],
'age': ['27', '32', '78','27', '32', '78'],
'weight': ['160', '209', '130','164', '206', '132'],
'date': ['6-12-2022', '6-12-2022', '6-12-2022','6-13-2022', '6-13-2022', '6-13-2022']
}
df = pd.DataFrame(data)
df
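The question doesn't show how the NaN rows for the new date were created. One way to sketch that step, assuming the new date is 6-14-2022 and using pd.concat (new_date and new_rows are illustrative names, not from the question):

```python
import pandas as pd

data = {
    "name": ["Alex", "Ben", "Marry", "Alex", "Ben", "Marry"],
    "job": ["teacher", "doctor", "engineer", "teacher", "doctor", "engineer"],
    "age": ["27", "32", "78", "27", "32", "78"],
    "weight": ["160", "209", "130", "164", "206", "132"],
    "date": ["6-12-2022"] * 3 + ["6-13-2022"] * 3,
}
df = pd.DataFrame(data)

# Hypothetical new date for which only the names are known.
new_date = "6-14-2022"

# One placeholder row per name; job, age and weight are absent here, so they become NaN.
new_rows = pd.DataFrame({"name": df["name"].unique(), "date": new_date})

# Append the placeholder rows below the existing data and renumber the index 0..n-1.
df = pd.concat([df, new_rows], ignore_index=True)
print(df)
```

This yields the nine-row frame shown in the table that follows.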
After adding null values:
|   | name  | job      | age | weight | date      |
|---|-------|----------|-----|--------|-----------|
| 0 | Alex  | teacher  | 27  | 160    | 6-12-2022 |
| 1 | Ben   | doctor   | 32  | 209    | 6-12-2022 |
| 2 | Marry | engineer | 78  | 130    | 6-12-2022 |
| 3 | Alex  | teacher  | 27  | 164    | 6-13-2022 |
| 4 | Ben   | doctor   | 32  | 206    | 6-13-2022 |
| 5 | Marry | engineer | 78  | 132    | 6-13-2022 |
| 6 | Alex  | NaN      | NaN | NaN    | 6-14-2022 |
| 7 | Ben   | NaN      | NaN | NaN    | 6-14-2022 |
| 8 | Marry | NaN      | NaN | NaN    | 6-14-2022 |
Now I need to fill the null values in job and age with the last entered value for that name.
Appreciate your help
Thanks
Answer:
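If I understand you correctly, you can group by name with .groupby() and then forward-fill each group with .ffill():
# Forward-fill within each name; this relies on the rows already being in date order
df[["job", "age", "weight"]] = df.groupby("name")[["job", "age", "weight"]].ffill()
print(df)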
Prints:
    name       job   age  weight       date
0   Alex   teacher  27.0   160.0  6-12-2022
1    Ben    doctor  32.0   209.0  6-12-2022
2  Marry  engineer  78.0   130.0  6-12-2022
3   Alex   teacher  27.0   164.0  6-13-2022
4    Ben    doctor  32.0   206.0  6-13-2022
5  Marry  engineer  78.0   132.0  6-13-2022
6   Alex   teacher  27.0   164.0  6-14-2022
7    Ben    doctor  32.0   206.0  6-14-2022
8  Marry  engineer  78.0   132.0  6-14-2022
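.ffill() carries values forward in row order, so it only gives "the last value for that name" if each name's rows are in date order. Below is a self-contained sketch under that assumption. It parses the month-day-year strings with pd.to_datetime, does a stable sort by date, fills within each name, and contrasts this with a plain, ungrouped ffill. The naive and cols variables and the numeric age/weight values are assumptions for illustration; the question stores age and weight as strings.

```python
import pandas as pd

# Sample frame from the question with the NaN rows for 6-14-2022 already added.
# Ages and weights are numbers here (assumption; the question uses strings).
df = pd.DataFrame({
    "name": ["Alex", "Ben", "Marry"] * 3,
    "job": ["teacher", "doctor", "engineer"] * 2 + [None] * 3,
    "age": [27, 32, 78, 27, 32, 78] + [None] * 3,
    "weight": [160, 209, 130, 164, 206, 132] + [None] * 3,
    "date": ["6-12-2022"] * 3 + ["6-13-2022"] * 3 + ["6-14-2022"] * 3,
})
cols = ["job", "age", "weight"]

# Plain ffill ignores names: row 6 (Alex) would copy row 5 (Marry).
naive = df[cols].ffill()

# Parse the month-day-year strings so sorting is chronological, not textual.
df["date"] = pd.to_datetime(df["date"], format="%m-%d-%Y")

# Stable sort by date keeps the original order among rows with equal dates.
df = df.sort_values("date", kind="stable")

# Forward-fill within each name only; the result aligns back on the index.
df[cols] = df.groupby("name")[cols].ffill()

# Restore the original row order for display.
df = df.sort_index()
print(df)
```

To fill only job and age, as the question asks, set cols to ["job", "age"] and weight stays NaN on the new date.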