Fill NaNs in rows with duplicated field based on another field value-CodePudding

I have the following table:\

Name  | ID  |  Azimut\
foo   | 1   |  180\
foo   | 6   |  NaN\
bar   | 1   |  NaN\
bar   | 6   |  200

I want to search for duplicates that have IDs with the value of 1 or 6 (the table can have other numbers) and if the "Azimut" value is NaN, complete it with the value of its corresponding duplicate, returning the following:

Name  | ID  |  Azimut\
foo   | 1   |  180\
foo   | 6   |  180\
bar   | 1   |  200\
bar   | 6   |  200

What I've tried so far:

nan_rows = df.loc[(df['ID'] == 6) & (df['Azimut'].isna())]
values = df.loc[(df['Name'].isin(nan_rows['Name'])) & (df['ID'] == 1), 'Azimut']
df.loc[(df['ID'] == 6) & (df['Azimut'].isna(), 'Azimut'] = values.values

I would have to do similar code for the reverse case (ID = 1 is NaN and complete with ID = 6), and it wouldn't work if there is no pair of IDs 1 and 6 (It doesn't verify the duplicates). I hope the question is well redacted and I'm listening for any way I could improve it.

CodePudding user response：

IIUC you want to fill null values per name with other non-null value in that group. Then try:

df[df.ID.isin([1, 6])].groupby('Name').Azimut.\
                       transform(lambda x: x.fillna(x.mode()[0]))

I specifically used mode since a duplicate can have various Azimut values per name.

CodePudding user response：

You can use the interpolate method to fill missing values using the values in other rows or columns. You can also use it to fill missing values inside a group:

With this data

import pandas as pd

data={'Name'  :['foo','foo','bar','bar'],
      'ID'    :[1,6,1,6],
      'Azimut':[180,None,None,200]
}
df=pd.DataFrame(data)
df
>>>
   Name  ID  Azimut
0  foo   1   180.0
1  foo   6     NaN
2  bar   1     NaN
3  bar   6   200.0

This call

df.groupby('Name').apply(lambda group: group.interpolate(limit_direction='both'))

Produces :

  Name  ID  Azimut
0  foo   1   180.0
1  foo   6   180.0
2  bar   1   200.0
3  bar   6   200.0

CodePudding user response：

This sorts by Name and then Azimut value. It then groupsby Name. It then copies the first Azimut value to the other rows in the groupby.

df['Azimut'] = df.sort_values(['Name','Azimut']).groupby('Name')['Azimut'].transform('first').sort_index(ascending=True)