For loop to extract rows and ffill into another dataframe

I have a list containing user ids.

userids_with_missingdata = ['1234','1236','1238']

I also have the first dataframe (df1) containing many user_ids

user_id   age   weight  height
1234       20     60kg  170cm
1235       21     70kg  160cm
1236       56     80kg  172cm
1237       48     59kg  174cm
1238       63     100kg 180cm

I also have the second dataframe (df2) containing the same user_ids as the list (and others)

user_id   age   weight  height
  1234     NaN     60kg  170cm
  5487     21      70kg  160cm
  1236     NaN     80kg  172cm
  3476     48      59kg  174cm
  1238     NaN     100kg 180cm

I would like to write a for loop followed by an if statement that does the following:

1. Loops over the user_ids in the first dataframe and extracts the rows whose user id is in the list.
2. Checks whether the values missing for those user_ids in the second dataframe can be found in the first dataframe and, if so, fills them into the second dataframe.

The end result would make df2 end up looking like:

user_id   age   weight  height
  1234     20     60kg  170cm
  5487     21     70kg  160cm
  1236     56     80kg  172cm
  3476     48     59kg  174cm
  1238     63     100kg 180cm

Any help on this would be very appreciated!

CodePudding user response：

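You can try pandas.Series.update, with user_id as the index of both series so that rows are matched by id:

age = df2.set_index('user_id')['age']
age.update(df1.set_index('user_id')['age'])
df2['age'] = age.to_numpy().astype(int)

Note that update works in place and returns None. Because set_index returns a new object, the updated series has to be assigned back to df2. The cast to int assumes no NaN is left in the column.

print(df2)

   user_id  age weight height
0     1234   20   60kg  170cm
1     5487   21   70kg  160cm
2     1236   56   80kg  172cm
3     3476   48   59kg  174cm
4     1238   63  100kg  180cm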

CodePudding user response：

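DataFrame.update aligns on the index, so set the index of both dataframes to user_id first. With the default index, rows are matched by position and df2 would simply be overwritten with the rows of df1. Since set_index returns a new dataframe, keep the result and update that:

df2 = df2.set_index('user_id')
df2.update(df1.set_index('user_id'))
df2 = df2.reset_index()

df2.age = df2.age.astype(int)
df2

   user_id  age weight height
0     1234   20   60kg  170cm
1     5487   21   70kg  160cm
2     1236   56   80kg  172cm
3     3476   48   59kg  174cm
4     1238   63  100kg  180cm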
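
If you only want to fill the gaps and never overwrite values that df2 already has, one possible alternative is fillna combined with map. The sketch below is self-contained and rebuilds the sample data from the question. It assumes that user_id is stored as strings (like the ids in the list) and that each user_id appears only once in df1; the name lookup is just an illustrative helper.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (user_id assumed to be strings).
userids_with_missingdata = ['1234', '1236', '1238']

df1 = pd.DataFrame({
    'user_id': ['1234', '1235', '1236', '1237', '1238'],
    'age': [20, 21, 56, 48, 63],
    'weight': ['60kg', '70kg', '80kg', '59kg', '100kg'],
    'height': ['170cm', '160cm', '172cm', '174cm', '180cm'],
})
df2 = pd.DataFrame({
    'user_id': ['1234', '5487', '1236', '3476', '1238'],
    'age': [np.nan, 21, np.nan, 48, np.nan],
    'weight': ['60kg', '70kg', '80kg', '59kg', '100kg'],
    'height': ['170cm', '160cm', '172cm', '174cm', '180cm'],
})

# Lookup series user_id -> age, limited to the ids known to have gaps.
in_list = df1['user_id'].isin(userids_with_missingdata)
lookup = df1[in_list].set_index('user_id')['age']

# map() gives NaN for ids that are not in the lookup, and fillna() only
# touches cells that are NaN in df2, so existing ages are left alone.
df2['age'] = df2['age'].fillna(df2['user_id'].map(lookup))
```

No explicit for loop or if statement is needed here: isin does the filtering and fillna does the "only if missing" check. The age column stays float after this step; cast it with astype(int) if no NaN is left.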