Home > Mobile >  using pandas two perform Cross tab between one column and two others that share the same enumerated
using pandas two perform Cross tab between one column and two others that share the same enumerated

Time:02-13

I have table with

|Majors|First Minor|Second Minor|
|Philosophy|English|Null|
|Math|Philosophy|English|
|English|Theology|Political Science
|Philosophy|Theology|English|

The possible values of the "First Minor" and "Second Minor" columns are the same (English, Philosophy, Math, etc)

I want to see the correlation between "Major" and the "Minors" belonging to those majors. I don't care if the minor is a first or second minor. So based on the above sample I want an output that looks something like:

|Major|Philosophy|Math|English
First Minor||||     
Philosophy|0|1|0|
|Math|0|0|0|
|English|2|1|0|
Political Science|0|0|1|
Theology|1|0|1|
Null|1|0|0

The following code generates the desired result for just the "First Minor" or the "Second Minor":

ct = pd.crosstab(newdf.MajorInterMinor, newdf["First Minor"])

But I want to get the results for both columns ("First Minor" and "Second Minor") and I can't figure out how to get the combined result.

Any help/advice is much appreciated!

CodePudding user response:

melt the Minors, then apply crosstab:

df2 = df.melt(id_vars='Majors', value_name='Minors')

pd.crosstab(df2['Majors'], df2['Minors'])

Output:

Minors      English  Null  Philosophy  Political Science  Theology
Majors                                                            
English           0     0           0                  1         1
Math              1     0           1                  0         0
Philosophy        2     1           0                  0         1

CodePudding user response:

pd.melt to collapse FirstMinor, SecondMinor into one column and make new df

 df2 = pd.melt(df,id_vars=['Majors'], value_vars=['FirstMinor','SecondMinor'],var_name='myVarname', value_name='FirstMinor_SecondMinor')

Slice new df to get_dummies on Majors. Join this back to the df. Groupby and sum

pd.DataFrame(df2.loc[:,'FirstMinor_SecondMinor']).join(pd.get_dummies(df2.loc[:,'Majors'])).groupby('FirstMinor_SecondMinor').agg(sum)

   

                          English  Math  Philosophy
FirstMinor_SecondMinor                           
English                       0     1           2
Null                          0     0           1
Philosophy                    0     1           0
PoliticalScience              1     0           0
Theology                      1     0           1
  • Related