Convert zero values to NaN for specified rows within Dataframe

Time:07-05

I have a joined dataframe containing daily precipitation between 1979 and 2020 from two sources:

          precip_1   precip
time                     
1973-01-04  0.0       NaN
1973-01-07  0.0       NaN
1973-01-08  0.0       NaN
1973-01-12  0.0       NaN
1973-01-17  0.0       NaN
        ...       ...
2020-12-27  NaN  0.434374
2020-12-28  NaN  0.923621
2020-12-29  NaN  0.012130
2020-12-30  NaN  0.064116
2020-12-31  NaN  2.041316

[16978 rows x 2 columns]

What I'm trying to do is set any zero values in months 6-8 (June to August) to NaN, while leaving zero values outside those months unchanged. I know I can replace every zero in the dataframe like this:

df1 = df.replace(0, np.nan)

What I can't figure out is how to restrict the replacement to certain months. Any ideas?

Many Thanks

CodePudding user response:

You could try converting your time column into datetime objects so the months are easy to compare. After that, you can use the apply function with a lambda to check each row's month and value and replace the zeros you described. Hope this helps!

import datetime
import numpy as np
import pandas as pd

data = {'time':['2020-01-18','2020-02-29','2020-6-28','2020-7-27'],'precip_1':[0.434374,0.923621,0,0]}
df = pd.DataFrame(data)
# Parse the date strings so each row exposes a .month attribute
df['time'] = df['time'].apply(lambda x: datetime.datetime.strptime(x, '%Y-%m-%d'))
# Use a real NaN (not the string 'nan') and include August in the 6-8 range
df['precip_1'] = df.apply(lambda row: np.nan if 6 <= row['time'].month <= 8 and row['precip_1'] == 0 else row['precip_1'], axis=1)
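The same idea can also be written without a row-by-row apply. This is only a sketch on a small made-up sample (the dates and values below are hypothetical, not the asker's data): it builds a boolean condition from the month and the zero check, then lets Series.mask blank out the matching entries, since mask fills with NaN by default.

```python
import pandas as pd

# Hypothetical sample: one zero in January, one in June, one in August.
df = pd.DataFrame(
    {
        "time": ["2020-01-18", "2020-02-29", "2020-06-28", "2020-08-27"],
        "precip_1": [0.0, 0.923621, 0.0, 0.0],
    }
)
df["time"] = pd.to_datetime(df["time"])

# between() is inclusive on both ends, so this covers months 6, 7 and 8.
is_summer = df["time"].dt.month.between(6, 8)

# mask() replaces entries where the condition is True with NaN.
df["precip_1"] = df["precip_1"].mask(is_summer & (df["precip_1"] == 0))
```

The January zero is left alone because the month condition is False for that row.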

CodePudding user response:

Replace works very well, but it doesn't play well with logic from other columns, so this actually looks like a better case for the .loc accessor.

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ['2020-06-05', 1],  # > 0 value, don't change
        ['2020-06-05', 0],  # 0 value, change to nan
        ['2020-12-05', 1],
        ['2020-12-05', 0],  # wrong month, don't change
    ],
    columns=['date', 'precip']
)
df.date = pd.to_datetime(df.date)

df.loc[(df['date'].dt.month.between(6, 8)) & (df['precip'] == 0), 'precip'] = np.nan

Output:

        date  precip
0 2020-06-05     1.0
1 2020-06-05     NaN
2 2020-12-05     1.0
3 2020-12-05     0.0
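In the question the dates appear to be the index (named time) rather than a column, and there are two precipitation columns. Assuming the index is already a DatetimeIndex, the same .loc pattern might be adapted roughly as follows. The sample rows here are made up for illustration; only the column and index names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like the question's frame: dates in the index,
# one column per precipitation source.
df = pd.DataFrame(
    {
        "precip_1": [0.0, 0.0, 1.2, np.nan],
        "precip": [np.nan, 0.0, 0.0, 0.5],
    },
    index=pd.to_datetime(["1973-01-04", "1973-07-07", "2020-06-15", "2020-08-31"]),
)
df.index.name = "time"

# Boolean array: True where the index date falls in June, July or August.
summer = df.index.month.isin([6, 7, 8])

# For each column, set exact zeros in those months to NaN.
for col in ["precip_1", "precip"]:
    df.loc[(df[col] == 0) & summer, col] = np.nan
```

If the index holds strings instead, converting it first with df.index = pd.to_datetime(df.index) should make the .month lookup available.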