I have the following data:
type group
0 Drought Climatological
1 nan Climatological
2 Explosion Technological
3 Ground movement Geophysical
4 nan Geophysical
5 Ash fall Geophysical
6 Rockfall Geophysical
7 Ash fall Geophysical
8 nan Technological
9 Explosion Technological
10 nan Meteorological
data_pd = pd.DataFrame({'type':['Drought','nan','Explosion','Ground movement','nan','Ash fall','Rockfall','Ash fall','nan','Explosion','nan'],
'group':['Climatological','Climatological','Technological','Geophysical','Geophysical',
'Geophysical','Geophysical','Geophysical','Technological','Technological','Meteorological']})
How can I replace the 'nan'
depending on the group?
Below is my current approach:
I want to replace nan
strings that match with specific strings from the next row in another column by some alternative string.
Here's a sample of data from my dataset where it seizes to work this was an output from pd.to_dict()
I wanted to keep it as it is to replicate my issue.:
for ty, go in zip(data_pd['type'].values, data_pd['group'].values):
if ty == 'nan' and go == 'Climatological':
#ty = ['Drought']
print(ty) #prints nothing as it did not work
CodePudding user response:
Do NOT iterate for this kind of task, this is inefficient!
You can use masks and pandas.where
to apply your filter:
data_pd['type'] = data_pd['type'].mask(data_pd['type'].eq('nan') & data_pd['group'].eq('Climatological'), 'Drought')
output:
type group
0 Drought Climatological
1 Drought Climatological
2 Explosion Technological
3 Ground movement Geophysical
4 nan Geophysical
5 Ash fall Geophysical
6 Rockfall Geophysical
7 Ash fall Geophysical
8 nan Technological
9 Explosion Technological
10 nan Meteorological
much cleaner solution
If your objective is to fillna per group, you could use a dictionary and groupy
:
subs = {'Climatological': 'Drought', 'Technological': 'foo'}
(data_pd.replace('nan', pd.NA)
.groupby('group')
.apply(lambda g: g.fillna(subs.get(g.name, 'nan')))
)
output:
type group
0 Drought Climatological
1 Drought Climatological
2 Explosion Technological
3 Ground movement Geophysical
4 nan Geophysical
5 Ash fall Geophysical
6 Rockfall Geophysical
7 Ash fall Geophysical
8 foo Technological
9 Explosion Technological
10 nan Meteorological
CodePudding user response:
for index, row in data_pd.iterrows():
if row["type"] == 'nan' and row["group"] == 'Climatological':
data_pd["type"][index] = "Drought"
I tried to match your code as closely as possible to be more concise and user-friendly.