Home > Software design >  Fill NaNs in rows with duplicated field based on another field value
Fill NaNs in rows with duplicated field based on another field value

Time:09-23

I have the following table:\

Name  | ID  |  Azimut\
foo   | 1   |  180\
foo   | 6   |  NaN\
bar   | 1   |  NaN\
bar   | 6   |  200 

I want to search for duplicates that have IDs with the value of 1 or 6 (the table can have other numbers) and if the "Azimut" value is NaN, complete it with the value of its corresponding duplicate, returning the following:

Name  | ID  |  Azimut\
foo   | 1   |  180\
foo   | 6   |  180\
bar   | 1   |  200\
bar   | 6   |  200 

What I've tried so far:

nan_rows = df.loc[(df['ID'] == 6) & (df['Azimut'].isna())]
values = df.loc[(df['Name'].isin(nan_rows['Name'])) & (df['ID'] == 1), 'Azimut']
df.loc[(df['ID'] == 6) & (df['Azimut'].isna(), 'Azimut'] = values.values

I would have to do similar code for the reverse case (ID = 1 is NaN and complete with ID = 6), and it wouldn't work if there is no pair of IDs 1 and 6 (It doesn't verify the duplicates). I hope the question is well redacted and I'm listening for any way I could improve it.

CodePudding user response:

IIUC you want to fill null values per name with other non-null value in that group. Then try:

df[df.ID.isin([1, 6])].groupby('Name').Azimut.\
                       transform(lambda x: x.fillna(x.mode()[0]))

I specifically used mode since a duplicate can have various Azimut values per name.

CodePudding user response:

You can use the interpolate method to fill missing values using the values in other rows or columns. You can also use it to fill missing values inside a group:

With this data

import pandas as pd

data={'Name'  :['foo','foo','bar','bar'],
      'ID'    :[1,6,1,6],
      'Azimut':[180,None,None,200]
}
df=pd.DataFrame(data)
df
>>>
   Name  ID  Azimut
0  foo   1   180.0
1  foo   6     NaN
2  bar   1     NaN
3  bar   6   200.0

This call

df.groupby('Name').apply(lambda group: group.interpolate(limit_direction='both'))

Produces :

  Name  ID  Azimut
0  foo   1   180.0
1  foo   6   180.0
2  bar   1   200.0
3  bar   6   200.0

CodePudding user response:

This sorts by Name and then Azimut value. It then groupsby Name. It then copies the first Azimut value to the other rows in the groupby.

df['Azimut'] = df.sort_values(['Name','Azimut']).groupby('Name')['Azimut'].transform('first').sort_index(ascending=True)
  • Related