I have a joined dataframe containing daily precipitation between 1979 and 2020 from two sources:
            precip_1    precip
time
1973-01-04       0.0       NaN
1973-01-07       0.0       NaN
1973-01-08       0.0       NaN
1973-01-12       0.0       NaN
1973-01-17       0.0       NaN
...              ...       ...
2020-12-27       NaN  0.434374
2020-12-28       NaN  0.923621
2020-12-29       NaN  0.012130
2020-12-30       NaN  0.064116
2020-12-31       NaN  2.041316

[16978 rows x 2 columns]
What I'm trying to do is set any zero values in months 6-8 (June to August) to NaN, but leave zero values outside of those months unchanged. I understand that I can use the following call to replace every zero in the dataframe:
df1 = df.replace(0, np.nan)
But what I can't figure out is how to restrict this to certain months within the dataframe. Any ideas?
Many Thanks
CodePudding user response:
You could try converting your time column to datetime objects so the months are easy to compare. After that, you can use apply with a lambda to check each row's month and replace the zeros as you described. Hope this helps!
import datetime
import numpy as np
import pandas as pd

data = {'time': ['2020-01-18', '2020-02-29', '2020-6-28', '2020-7-27'], 'precip_1': [0.434374, 0.923621, 0, 0]}
df = pd.DataFrame(data)
# Parse the date strings so that each row's month can be read directly
df['time'] = df['time'].apply(lambda x: datetime.datetime.strptime(x, '%Y-%m-%d'))
# Replace zeros with a real NaN (np.nan, not the string 'nan') in months 6-8 inclusive
df['precip_1'] = df.apply(lambda row: np.nan if 6 <= row.time.month <= 8 and row.precip_1 == 0 else row.precip_1, axis=1)
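A row-wise apply calls the lambda once per row, which can get slow on a frame with around 17,000 rows. As a rough sketch of the same idea without the per-row lambda, the condition can be built as boolean Series and passed to Series.mask, which puts NaN wherever the condition is True. The sample values and the names summer and is_zero are only illustrative, not part of the question's data.

```python
import pandas as pd

# Illustrative sample data: one zero outside the summer months, two inside.
df = pd.DataFrame({
    'time': ['2020-01-18', '2020-02-29', '2020-06-28', '2020-07-27'],
    'precip_1': [0.434374, 0.0, 0.0, 0.0],
})
df['time'] = pd.to_datetime(df['time'])

# Boolean Series: True where the row falls in June, July or August.
summer = df['time'].dt.month.isin([6, 7, 8])
# Boolean Series: True where the precipitation value is exactly zero.
is_zero = df['precip_1'] == 0

# mask() replaces values with NaN wherever the condition is True.
df['precip_1'] = df['precip_1'].mask(summer & is_zero)
```

The February zero is kept because only rows where both conditions hold are masked.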
CodePudding user response:
replace works very well on its own, but it can't take conditions based on other columns, so this looks like a better case for the .loc accessor.
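import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ['2020-06-05', 1],  # > 0 value, don't change
        ['2020-06-05', 0],  # 0 value, change to nan
        ['2020-12-05', 1],  # > 0 value, don't change
        ['2020-12-05', 0],  # wrong month, don't change
    ],
    columns=['date', 'precip']
)
df['date'] = pd.to_datetime(df['date'])
# Select rows in months 6-8 (inclusive) whose precip is zero, then overwrite just that column
df.loc[(df['date'].dt.month.between(6, 8)) & (df['precip'] == 0), 'precip'] = np.nan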
Output:
        date  precip
0 2020-06-05     1.0
1 2020-06-05     NaN
2 2020-12-05     1.0
3 2020-12-05     0.0
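In the question, time appears to be the index rather than a column, and there are two precipitation columns. A minimal sketch for that layout, assuming the index is (or can be converted to) a DatetimeIndex, is shown here. The four rows are made-up stand-ins for the real data, and summer and zero_in_summer are illustrative names.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the joined frame: dates as the index, two precip columns.
idx = pd.to_datetime(['1973-01-04', '1973-07-10', '2020-08-15', '2020-12-31'])
df = pd.DataFrame(
    {
        'precip_1': [0.0, 0.0, np.nan, np.nan],
        'precip': [np.nan, np.nan, 0.0, 2.041316],
    },
    index=idx,
)
df.index.name = 'time'

# One boolean per row: True for June, July and August.
summer = df.index.month.isin([6, 7, 8])

# Combine the per-cell "is zero" test with the per-row month test.
# summer[:, None] has shape (n_rows, 1), so NumPy broadcasts it across both columns.
zero_in_summer = df.eq(0).to_numpy() & summer[:, None]

# mask() writes NaN wherever the condition is True and leaves the rest alone.
df1 = df.mask(zero_in_summer)
```

Here df1 keeps the January zero in precip_1 while the July and August zeros become NaN. If the index holds date strings instead, converting it first with pd.to_datetime(df.index) should make .month available.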