Home > database >  How to replace nan / fillna per group in a pandas dataframe?
How to replace nan / fillna per group in a pandas dataframe?

Time:11-09

I have the following data:

               type           group
0           Drought  Climatological
1               nan  Climatological
2         Explosion   Technological
3   Ground movement     Geophysical
4               nan     Geophysical
5          Ash fall     Geophysical
6          Rockfall     Geophysical
7          Ash fall     Geophysical
8               nan   Technological
9         Explosion   Technological
10              nan  Meteorological
data_pd = pd.DataFrame({'type':['Drought','nan','Explosion','Ground movement','nan','Ash fall','Rockfall','Ash fall','nan','Explosion','nan'],  
                        'group':['Climatological','Climatological','Technological','Geophysical','Geophysical',  
                        'Geophysical','Geophysical','Geophysical','Technological','Technological','Meteorological']})

How can I replace the 'nan' depending on the group?

Below is my current approach:

I want to replace nan strings that match with specific strings from the next row in another column by some alternative string.

Here's a sample of data from my dataset where it seizes to work this was an output from pd.to_dict() I wanted to keep it as it is to replicate my issue.:

for ty, go in zip(data_pd['type'].values, data_pd['group'].values):
    if ty == 'nan' and go == 'Climatological':
        #ty = ['Drought']
        print(ty) #prints nothing as it did not work

CodePudding user response:

Do NOT iterate for this kind of task, this is inefficient!

You can use masks and pandas.where to apply your filter:

data_pd['type'] = data_pd['type'].mask(data_pd['type'].eq('nan') & data_pd['group'].eq('Climatological'), 'Drought')

output:

               type           group
0           Drought  Climatological
1           Drought  Climatological
2         Explosion   Technological
3   Ground movement     Geophysical
4               nan     Geophysical
5          Ash fall     Geophysical
6          Rockfall     Geophysical
7          Ash fall     Geophysical
8               nan   Technological
9         Explosion   Technological
10              nan  Meteorological

much cleaner solution

If your objective is to fillna per group, you could use a dictionary and groupy:

subs = {'Climatological': 'Drought', 'Technological': 'foo'}

(data_pd.replace('nan', pd.NA)
        .groupby('group')
        .apply(lambda g: g.fillna(subs.get(g.name, 'nan')))
)

output:

               type           group
0           Drought  Climatological
1           Drought  Climatological
2         Explosion   Technological
3   Ground movement     Geophysical
4               nan     Geophysical
5          Ash fall     Geophysical
6          Rockfall     Geophysical
7          Ash fall     Geophysical
8               foo   Technological
9         Explosion   Technological
10              nan  Meteorological

CodePudding user response:

for index, row in data_pd.iterrows():
    if row["type"] == 'nan' and row["group"] == 'Climatological':
        data_pd["type"][index] = "Drought"

I tried to match your code as closely as possible to be more concise and user-friendly.

  • Related