My DataFrame has four columns: one dependent variable and three independent variables.
Here's a sample:
My desired output is a contingency table, as follows:
I can only get a contingency table for one independent variable at a time, using the following code (my DataFrame is called 'table'):
pd.crosstab(index=table['Dvar'],columns=table['Var1'])
I can't work out how to add the other variables to this. Is the only way to do it to make a separate contingency table for each variable (Var1 to Var3) and then merge/join them?
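For comparison, the approach described in the question (one crosstab per independent variable, then joined) can be sketched as below. This is a minimal sketch that assumes all four columns are binary 0/1 indicators and reuses the small three-row sample from the first answer, since the question's own sample is not shown. `pd.concat` with a dict places the tables side by side under an outer column level named after each variable.

```python
import pandas as pd

# Assumed sample data (binary 0/1 columns), same as in the first answer.
table = pd.DataFrame([[1, 0, 0, 0],
                      [1, 1, 1, 0],
                      [0, 1, 1, 1]],
                     columns=['Var1', 'Var2', 'Var3', 'Dvar'])

# One crosstab per independent variable: rows are Dvar levels,
# columns are the levels (0/1) of that variable.
per_var = {var: pd.crosstab(index=table['Dvar'], columns=table[var])
           for var in ['Var1', 'Var2', 'Var3']}

# Join them side by side; the dict keys become the outer column level.
combined = pd.concat(per_var, axis=1)
print(combined)
```

On this sample, the `1` columns (for example `combined[('Var1', 1)]`, which is `[2, 0]`) are simply the number of ones per Dvar group, which is why the answers reach the same numbers with `groupby(...).sum()`.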
Answer:
This is not a good use case for crosstab. Because each row already holds 0/1 indicators, you effectively have your contingency table already, just not aggregated. Use groupby with sum instead:
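import pandas as pd

df = pd.DataFrame([[1, 0, 0, 0],
                   [1, 1, 1, 0],
                   [0, 1, 1, 1]], columns=['Var1', 'Var2', 'Var3', 'Dvar'])

# Sum the 0/1 indicators per Dvar group; as_index=False keeps Dvar as a column
out = df.groupby('Dvar', as_index=False).sum()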
Output:

   Dvar  Var1  Var2  Var3
0     0     2     1     1
1     1     0     1     1
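If you specifically want a single `pd.crosstab` call, one alternative (not part of the original answer) is to melt the independent variables into long format first and let crosstab sum the 0/1 values per cell. This sketch assumes the same binary sample data; `long`, `variable` and `value` are illustrative names chosen here.

```python
import pandas as pd

df = pd.DataFrame([[1, 0, 0, 0],
                   [1, 1, 1, 0],
                   [0, 1, 1, 1]], columns=['Var1', 'Var2', 'Var3', 'Dvar'])

# Reshape to long format: one row per (observation, independent variable).
long = df.melt(id_vars='Dvar', var_name='variable', value_name='value')

# Sum the 0/1 values for every (Dvar, variable) cell in one crosstab call.
out = pd.crosstab(long['Dvar'], long['variable'],
                  values=long['value'], aggfunc='sum')
print(out)
```

The result has the same numbers as the groupby version, with Dvar as the row index and one column per independent variable.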
Answer:
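First of all, a contingency table shows the joint frequency distribution of categorical variables, which is what you use to examine the association between them.
If you want to see how the independent variables relate to the dependent variable, cross-tabulate their combinations against Dvar: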
pd.crosstab([table['Var1'], table['Var2'], table['Var3']],
            table['Dvar'], margins=False)
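To make the shape of that result concrete, here is the same call run on the three-row sample from the first answer (assumed data; the question's own sample is not shown). Each row is one observed (Var1, Var2, Var3) combination and each column counts how often that combination occurs with a given Dvar value.

```python
import pandas as pd

# Assumed sample data, reused from the first answer.
table = pd.DataFrame([[1, 0, 0, 0],
                      [1, 1, 1, 0],
                      [0, 1, 1, 1]], columns=['Var1', 'Var2', 'Var3', 'Dvar'])

# Rows: MultiIndex of (Var1, Var2, Var3) combinations; columns: Dvar levels.
ct = pd.crosstab([table['Var1'], table['Var2'], table['Var3']],
                 table['Dvar'], margins=False)
print(ct)
```

Note that this counts combinations rather than summing each variable separately, so it answers a different question from the table the asker wanted.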
However, to get the desired output you describe, use pandas.DataFrame.groupby followed by sum:
table.groupby('Dvar').sum()
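Unlike the first answer, this version omits `as_index=False`, so Dvar becomes the row index of the result. A small sketch on the same assumed sample shows the difference and how `reset_index()` turns Dvar back into an ordinary column:

```python
import pandas as pd

# Assumed sample data, reused from the first answer.
table = pd.DataFrame([[1, 0, 0, 0],
                      [1, 1, 1, 0],
                      [0, 1, 1, 1]], columns=['Var1', 'Var2', 'Var3', 'Dvar'])

# Dvar becomes the index; each column holds the count of ones per group.
counts = table.groupby('Dvar').sum()
print(counts)

# Moving Dvar back into a column gives the same layout as as_index=False.
flat = counts.reset_index()
print(flat)
```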