I have the following table:\
Name | ID | Azimut\
foo | 1 | 180\
foo | 6 | NaN\
bar | 1 | NaN\
bar | 6 | 200
I want to search for duplicates that have IDs with the value of 1 or 6 (the table can have other numbers) and if the "Azimut" value is NaN, complete it with the value of its corresponding duplicate, returning the following:
Name | ID | Azimut\
foo | 1 | 180\
foo | 6 | 180\
bar | 1 | 200\
bar | 6 | 200
What I've tried so far:
nan_rows = df.loc[(df['ID'] == 6) & (df['Azimut'].isna())]
values = df.loc[(df['Name'].isin(nan_rows['Name'])) & (df['ID'] == 1), 'Azimut']
df.loc[(df['ID'] == 6) & (df['Azimut'].isna(), 'Azimut'] = values.values
I would have to do similar code for the reverse case (ID = 1 is NaN and complete with ID = 6), and it wouldn't work if there is no pair of IDs 1 and 6 (It doesn't verify the duplicates). I hope the question is well redacted and I'm listening for any way I could improve it.
CodePudding user response:
IIUC you want to fill null values per name
with other non-null value in that group. Then try:
df[df.ID.isin([1, 6])].groupby('Name').Azimut.\
transform(lambda x: x.fillna(x.mode()[0]))
I specifically used mode
since a duplicate can have various Azimut
values per name
.
CodePudding user response:
You can use the interpolate method to fill missing values using the values in other rows or columns. You can also use it to fill missing values inside a group:
With this data
import pandas as pd
data={'Name' :['foo','foo','bar','bar'],
'ID' :[1,6,1,6],
'Azimut':[180,None,None,200]
}
df=pd.DataFrame(data)
df
>>>
Name ID Azimut
0 foo 1 180.0
1 foo 6 NaN
2 bar 1 NaN
3 bar 6 200.0
This call
df.groupby('Name').apply(lambda group: group.interpolate(limit_direction='both'))
Produces :
Name ID Azimut
0 foo 1 180.0
1 foo 6 180.0
2 bar 1 200.0
3 bar 6 200.0
CodePudding user response:
This sorts by Name and then Azimut value. It then groupsby Name. It then copies the first Azimut value to the other rows in the groupby.
df['Azimut'] = df.sort_values(['Name','Azimut']).groupby('Name')['Azimut'].transform('first').sort_index(ascending=True)