Home > Mobile >  Fill NAs : Min of values in group
Fill NAs : Min of values in group

Time:01-19

Here is my DataFrame.

df = pd.DataFrame ( {'CNN': ['iphone 11 63 GB TMO','iphone 11 128 GB ATT','iphone 11 other carrier','iphone 12 256 GB TMO','iphone 12 64 GB TMO','iphone 12 other carrier'], 
                            'Family Name':['iphone 11', 'iphone 11', 'iphone 11', 'iphone 12', 'iphone 12', 'iphone 12'],
                            'Storage': [63, 128,np.nan, 256,64, np.nan]})
Output:

                       CNN Family Name  Storage
0      iphone 11 63 GB TMO   iphone 11     63.0
1     iphone 11 128 GB ATT   iphone 11    128.0
2  iphone 11 other carrier   iphone 11      NaN
3     iphone 12 256 GB TMO   iphone 12    256.0
4      iphone 12 64 GB TMO   iphone 12     64.0
5  iphone 12 other carrier   iphone 12      NaN

What I am trying to achieve is find NAs. Criteria is minimum of storage from group(Family Name). I have tried to group by and fillna(min()) but it doesnt seems to be working.

#Tried
df["Storage"] = df.groupby("Family Name").apply(lambda x: x.fillna(x.min()))

Here is final output expected.

Expected Output:

                       CNN Family Name  Storage
0      iphone 11 63 GB TMO   iphone 11     63.0
1     iphone 11 128 GB ATT   iphone 11    128.0
2  iphone 11 other carrier   iphone 11     63.0
3     iphone 12 256 GB TMO   iphone 12    256.0
4      iphone 12 64 GB TMO   iphone 12     64.0
5  iphone 12 other carrier   iphone 12     64.0

CodePudding user response:

Use groupby.transform with fillna:

df['Storage'] = df['Storage'].fillna(df.groupby('Family Name')['Storage'].transform('min'))

Or, potentially more efficient, groupby.min and map with boolean indexing depending on the size of the DataFrame, number of NaNs (few of them), and number of groups:

s = df.groupby('Family Name')['Storage'].min()
m = df['Storage'].isna()
df.loc[m, 'Storage'] = df.loc[m, 'Family Name'].map(s)

Output:

                       CNN Family Name  Storage
0      iphone 11 63 GB TMO   iphone 11     63.0
1     iphone 11 128 GB ATT   iphone 11    128.0
2  iphone 11 other carrier   iphone 11     63.0
3     iphone 12 256 GB TMO   iphone 12    256.0
4      iphone 12 64 GB TMO   iphone 12     64.0
5  iphone 12 other carrier   iphone 12     64.0

CodePudding user response:

Use GroupBy.transform for Series with same size like original column with Series.fillna:

df["Storage"] = df["Storage"].fillna(df.groupby("Family Name")["Storage"].transform('min'))

Your solution should be changed:

df["Storage"] = df.groupby("Family Name")["Storage"].transform(lambda x: x.fillna(x.min()))

CodePudding user response:

Another possible solution:

(df.groupby('Family Name', group_keys=True, as_index=False)
 .apply(lambda g: g.sort_values('Storage', ascending=False).ffill().sort_index())
 .reset_index(drop=True))

Output:

                       CNN Family Name  Storage
0      iphone 11 63 GB TMO   iphone 11     63.0
1     iphone 11 128 GB ATT   iphone 11    128.0
2  iphone 11 other carrier   iphone 11     63.0
3     iphone 12 256 GB TMO   iphone 12    256.0
4      iphone 12 64 GB TMO   iphone 12     64.0
5  iphone 12 other carrier   iphone 12     64.0
  • Related