This is what my dataframe looks like:
Year | State | Var1 | Var2 |
---|---|---|---|
2018 | 1 | 1 | 3 |
2018 | 1 | 2 | Nan |
2018 | 1 | NaN | 1 |
2018 | 2 | NaN | 1 |
2018 | 2 | NaN | 2 |
2018 | 3 | 3 | NaN |
2019 | 1 | 1 | NaN |
2019 | 1 | 3 | NaN |
2019 | 1 | 2 | NaN |
2019 | 1 | NaN | NaN |
2019 | 2 | NaN | NaN |
2019 | 2 | 3 | NaN |
2020 | 1 | 1 | NaN |
2020 | 2 | NaN | 1 |
2020 | 2 | NaN | 3 |
2020 | 3 | 3 | NaN |
2020 | 4 | NaN | NaN |
2020 | 4 | 1 | NaN |
Desired Output
Year 2018 2019 2020
Var1 Num of States w/ non-null 2 2 3
Var2 Num of States w/ non-null 2 0 1
I want to count the number of unique values of the variable State
that have at least one non null response for each variable.
CodePudding user response:
You may check with groupby
with unique
then unstack
out = df.groupby(['Year', 'State']).nunique().unstack().T.fillna(0)
out
Out[71]:
Year 2018 2019 2020
State
Var1 1 2.0 3.0 1.0
2 0.0 1.0 0.0
3 1.0 0.0 1.0
4 0.0 0.0 1.0
Var2 1 3.0 0.0 0.0
2 2.0 0.0 2.0
3 0.0 0.0 0.0
4 0.0 0.0 0.0
CodePudding user response:
IIUC you are looking for:
out = pd.concat([
df.dropna(subset='Var1').pivot_table(columns='Year',
values='State',
aggfunc='nunique'),
df.dropna(subset='Var2').pivot_table(columns='Year',
values='State',
aggfunc='nunique')
]).fillna(0).astype(int)
out.index = ['Var1 Num of States w/non-null', 'Var2 Num of states w/non-null']
print(out):
Year 2018 2019 2020
Var1 Num of States w/non-null 2 2 3
Var2 Num of states w/non-null 2 0 1