I have this df:
DATE CODE PP YEAR_MONTH
9862 1991-01-01 100007 NaN 1991-01
9863 1991-01-02 100007 NaN 1991-01
9864 1991-01-03 100007 NaN 1991-01
9865 1991-01-04 100007 NaN 1991-01
9866 1991-01-05 100007 NaN 1991-01
... ... ... ...
10541963 2020-12-27 155217 5.7 2020-12
10541964 2020-12-28 155217 0.0 2020-12
10541965 2020-12-29 155217 0.0 2020-12
10541966 2020-12-30 155217 0.0 2020-12
10541967 2020-12-31 155217 5.7 2020-12
[5347504 rows x 4 columns]
I want to know if for each specific df['CODE'] and df['YEAR_MONTH'] my df['PP'] has only values of nan (that generates the value of true). If in df['PP'] there is at least 1 value that is not nan in a specific df['CODE'] and df['YEAR_MONTH'], it will generate the value of false.
Expected result:
CODE NAN_MONTH YEAR_MONTH
100007 True 1991-01
100007 False 1991-02
100007 False 1991-03
100007 False 1991-04
100007 False 1991-05
... ... ... ...
155217 True 2020-09
155217 False 2020-10
155217 False 2020-11
155217 False 2020-12
155217 False 2020-12
I tried this code:
yearmonths=[]
nanmonths=[]
for yearmonth, data in df.groupby(['CODE','YEAR_MONTH']):
checknan=data['PP'].isna().all()
yearmonths=yearmonths.append(yearmonth)
nanmonths=nanmonths.append(checknan)
But i get this error:
AttributeError: 'NoneType' object has no attribute 'append'
Would you mind to help me?
Thanks in advance.
CodePudding user response:
Create helper column NAN_MONTH
and then call GroupBy.all
:
out = (df.assign(NAN_MONTH = df['PP'].isna())
.groupby(['CODE','YEAR_MONTH'])['NAN_MONTH']
.all()
.reset_index())
print (out)
CODE YEAR_MONTH NAN_MONTH
0 100007 1991-01 True
1 155217 2020-12 False
Or use Series
, then is necessary also groiping by Series, not by columns names:
out = (df['PP'].isna()
.groupby([df['CODE'],df['YEAR_MONTH']])
.all()
.reset_index())