Home > Enterprise >  Count the number of column values (number of unique values of column) that have at least one non nul
Count the number of column values (number of unique values of column) that have at least one non nul

Time:08-30

This is what my dataframe looks like:

Year State Var1 Var2
2018 1 1 3
2018 1 2 Nan
2018 1 NaN 1
2018 2 NaN 1
2018 2 NaN 2
2018 3 3 NaN
2019 1 1 NaN
2019 1 3 NaN
2019 1 2 NaN
2019 1 NaN NaN
2019 2 NaN NaN
2019 2 3 NaN
2020 1 1 NaN
2020 2 NaN 1
2020 2 NaN 3
2020 3 3 NaN
2020 4 NaN NaN
2020 4 1 NaN

Desired Output

                            Year     2018     2019     2020
 Var1   Num of States w/ non-null     2        2        3
 Var2   Num of States w/ non-null     2        0        1

I want to count the number of unique values of the variable State that have at least one non null response for each variable.

CodePudding user response:

You may check with groupby with unique then unstack

out = df.groupby(['Year', 'State']).nunique().unstack().T.fillna(0)
out
Out[71]: 
Year        2018  2019  2020
     State                  
Var1 1       2.0   3.0   1.0
     2       0.0   1.0   0.0
     3       1.0   0.0   1.0
     4       0.0   0.0   1.0
Var2 1       3.0   0.0   0.0
     2       2.0   0.0   2.0
     3       0.0   0.0   0.0
     4       0.0   0.0   0.0

CodePudding user response:

IIUC you are looking for:

out = pd.concat([
    df.dropna(subset='Var1').pivot_table(columns='Year',
                                         values='State',
                                         aggfunc='nunique'),
    df.dropna(subset='Var2').pivot_table(columns='Year',
                                         values='State',
                                         aggfunc='nunique')
]).fillna(0).astype(int)
out.index = ['Var1 Num of States w/non-null', 'Var2 Num of states w/non-null']

print(out):

Year                           2018  2019  2020
Var1 Num of States w/non-null     2     2     3
Var2 Num of states w/non-null     2     0     1
  • Related