Home > other >  How to check if all grouped by month data is equal to nan?
How to check if all grouped by month data is equal to nan?

Time:08-18

I have this df:

               DATE  CODE    PP   YEAR_MONTH
9862     1991-01-01  100007  NaN  1991-01
9863     1991-01-02  100007  NaN  1991-01
9864     1991-01-03  100007  NaN  1991-01
9865     1991-01-04  100007  NaN  1991-01
9866     1991-01-05  100007  NaN  1991-01
            ...     ...  ...      ...
10541963 2020-12-27  155217  5.7  2020-12
10541964 2020-12-28  155217  0.0  2020-12
10541965 2020-12-29  155217  0.0  2020-12
10541966 2020-12-30  155217  0.0  2020-12
10541967 2020-12-31  155217  5.7  2020-12

[5347504 rows x 4 columns]

I want to know if for each specific df['CODE'] and df['YEAR_MONTH'] my df['PP'] has only values of nan (that generates the value of true). If in df['PP'] there is at least 1 value that is not nan in a specific df['CODE'] and df['YEAR_MONTH'], it will generate the value of false.

Expected result:

CODE    NAN_MONTH YEAR_MONTH
100007  True      1991-01
100007  False     1991-02
100007  False     1991-03
100007  False     1991-04
100007  False     1991-05
...     ...  ...      ...
155217  True      2020-09
155217  False     2020-10
155217  False     2020-11
155217  False     2020-12
155217  False     2020-12

I tried this code:

yearmonths=[]
nanmonths=[]
for yearmonth, data in df.groupby(['CODE','YEAR_MONTH']):
    
    checknan=data['PP'].isna().all()

    yearmonths=yearmonths.append(yearmonth)
    nanmonths=nanmonths.append(checknan)

But i get this error:

AttributeError: 'NoneType' object has no attribute 'append'

Would you mind to help me?

Thanks in advance.

CodePudding user response:

Create helper column NAN_MONTH and then call GroupBy.all:

out = (df.assign(NAN_MONTH = df['PP'].isna())
         .groupby(['CODE','YEAR_MONTH'])['NAN_MONTH']
         .all()
         .reset_index())

print (out)
     CODE YEAR_MONTH  NAN_MONTH
0  100007    1991-01       True
1  155217    2020-12      False

Or use Series, then is necessary also groiping by Series, not by columns names:

out = (df['PP'].isna()
         .groupby([df['CODE'],df['YEAR_MONTH']])
         .all()
         .reset_index())
  • Related