I have a list containing user ids.
userids_with_missingdata = ['1234','1236','1238']
I also have the first dataframe (df1) containing many user_ids
user_id age weight height
1234 20 60kg 170cm
1235 21 70kg 160cm
1236 56 80kg 172cm
1237 48 59kg 174cm
1238 63 100kg 180cm
I also have the second dataframe (df2) containing the same user_ids as the list (and others)
user_id age weight height
1234 NaN 60kg 170cm
5487 21 70kg 160cm
1236 NaN 80kg 172cm
3476 48 59kg 174cm
1238 NaN 100kg 180cm
I would like to create a for loop followed by an if statement to
- Loop over the user_ids in the first data frame and extract the rows that contain user ids in the list
- Check if the values that are missing for the user_ids in the second dataframe can be found in the first dataframe, and if they can, fill these values into the second dataframe
The end result would make df2 end up looking like:
user_id age weight height
1234 20 60kg 170cm
5487 21 70kg 160cm
1236 56 80kg 172cm
3476 48 59kg 174cm
1238 63 100kg 180cm
Any help on this would be very appreciated!
CodePudding user response:
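You can try pandas.Series.update. It aligns the two Series on their index, so index both by user_id first and take the values from df1:
ages = df2.set_index('user_id')['age']
ages.update(df1.set_index('user_id')['age'])
df2['age'] = ages.astype(int).to_numpy()
Note that update modifies the Series in place and returns None, so calling it on a temporary such as df2.set_index('user_id')['age'] would discard the result. That is why ages is kept in a variable and written back to df2.
print(df2)
user_id age weight height
0 1234 20 60kg 170cm
1 5487 21 70kg 160cm
2 1236 56 80kg 172cm
3 3476 48 59kg 174cm
4 1238 63 100kg 180cm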
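If you only want to touch the rows whose user_id is in your list and whose age is actually missing, a boolean mask combined with Series.map is another option. This is a minimal sketch, not the only way to do it: it rebuilds the example frames from the question, assumes user_id is stored as integers (the list holds strings, so it is converted), and the names ids, age_lookup and mask are made up for illustration.

```python
import numpy as np
import pandas as pd

userids_with_missingdata = ['1234', '1236', '1238']

# Example frames from the question (integer user_id is an assumption)
df1 = pd.DataFrame({
    'user_id': [1234, 1235, 1236, 1237, 1238],
    'age': [20, 21, 56, 48, 63],
    'weight': ['60kg', '70kg', '80kg', '59kg', '100kg'],
    'height': ['170cm', '160cm', '172cm', '174cm', '180cm'],
})
df2 = pd.DataFrame({
    'user_id': [1234, 5487, 1236, 3476, 1238],
    'age': [np.nan, 21, np.nan, 48, np.nan],
    'weight': ['60kg', '70kg', '80kg', '59kg', '100kg'],
    'height': ['170cm', '160cm', '172cm', '174cm', '180cm'],
})

# The list holds strings, the frames hold integers, so convert once
ids = [int(u) for u in userids_with_missingdata]

# Lookup table: user_id -> age, taken from df1
age_lookup = df1.set_index('user_id')['age']

# Rows of df2 that are in the list and still have no age
mask = df2['user_id'].isin(ids) & df2['age'].isna()

# Fill only those rows; every other value in df2 is left alone
df2.loc[mask, 'age'] = df2.loc[mask, 'user_id'].map(age_lookup)

# No NaN is left in this example, so the column can go back to int
df2['age'] = df2['age'].astype(int)
```

No explicit for loop or if statement is needed here: isin and isna do the checks for all rows at once.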
CodePudding user response:
df2.update(df1)
df2.age = df2.age.astype(int)
df2
user_id age weight height
0 1234 20 60kg 170cm
1 1235 21 70kg 160cm
2 1236 56 80kg 172cm
3 1237 48 59kg 174cm
4 1238 63 100kg 180cm
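Note that update aligns on the index, so the call above matches rows by position and also overwrites user_id. When the rows of the two dataframes do not line up (as with user_ids 5487 and 3476 here), set the index of both dataframes to user_id first. update works in place and returns None, so keep the indexed frame in a variable instead of calling update on a temporary:
df2 = df2.set_index('user_id')
df2.update(df1.set_index('user_id'))
df2 = df2.reset_index()
df2.age = df2.age.astype(int)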
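For completeness, the index-aligned update can be sketched end to end as below. The frames are rebuilt from the question with integer user_id values (an assumption), and df2_indexed is just an illustrative name. The sketch passes overwrite=False, a documented parameter of DataFrame.update, so that only missing values in df2 are filled and existing ones are never replaced.

```python
import numpy as np
import pandas as pd

# Example frames from the question (integer user_id is an assumption)
df1 = pd.DataFrame({
    'user_id': [1234, 1235, 1236, 1237, 1238],
    'age': [20, 21, 56, 48, 63],
    'weight': ['60kg', '70kg', '80kg', '59kg', '100kg'],
    'height': ['170cm', '160cm', '172cm', '174cm', '180cm'],
})
df2 = pd.DataFrame({
    'user_id': [1234, 5487, 1236, 3476, 1238],
    'age': [np.nan, 21, np.nan, 48, np.nan],
    'weight': ['60kg', '70kg', '80kg', '59kg', '100kg'],
    'height': ['170cm', '160cm', '172cm', '174cm', '180cm'],
})

# Align both frames on user_id instead of on row position
df2_indexed = df2.set_index('user_id')

# update() works in place and returns None;
# overwrite=False fills only the NaN cells of df2_indexed
df2_indexed.update(df1.set_index('user_id'), overwrite=False)

# Bring user_id back as a regular column, in df2's original row order
df2 = df2_indexed.reset_index()

# All ages are present now, so the float column can be cast to int
df2['age'] = df2['age'].astype(int)
```

Users that exist only in df2 (5487 and 3476) keep their rows and values, and users that exist only in df1 (1235 and 1237) are not added, because update uses a left join on the index.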