Given this dataframe,
import pandas as pd
d = {'a': ['john', 'mary','john','john','mary','john'], 'b': [1,2,3,1,1,2],
'c': [0.7, 0.3,0.9,0.4,1.0,0.2],'d': [1,0,0,1,0,1]}
df = pd.DataFrame(data=d)
The following line prints how many times df['a']=john and df['a']=mary correspond to df['b']=1,2,3:
print(df.groupby('a')['b'].value_counts())
What I want to do now is print how many times df['a']=john and df['a']=mary correspond to df['d']=1 or df['d']=0 for each of df['b']=1,2,3.
For instance, when df['a']=john and df['b']=1, df['d'] is always equal to 1, and when df['a']=john and df['b']=3, df['d']=0, etc.
The following line prints out all zeroes and I am not sure why:
print((df['d'])[(df.groupby('a')['b'].value_counts())])
CodePudding user response:
You can modify your code to pass multiple columns to groupby:
print(df.groupby(['a', 'b'])['d'].value_counts())
# a     b  d
# john  1  1    2
#       2  1    1
#       3  0    1
# mary  1  0    1
#       2  0    1
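As for why the original attempt prints all zeroes: my reading is that df['d'][...] treats the values of the value_counts result (the counts 2, 1, 1, 1, 1) as row labels, so it simply looks up rows 2 and 1 of df['d'], and both happen to be 0. A minimal sketch of that lookup, using .loc to make the label-based indexing explicit (the names counts, labels and looked_up are mine, not from the question):

```python
import pandas as pd

d = {'a': ['john', 'mary', 'john', 'john', 'mary', 'john'], 'b': [1, 2, 3, 1, 1, 2],
     'c': [0.7, 0.3, 0.9, 0.4, 1.0, 0.2], 'd': [1, 0, 0, 1, 0, 1]}
df = pd.DataFrame(data=d)

# The counts per (a, b) pair; only the values matter for the lookup below.
counts = df.groupby('a')['b'].value_counts()
labels = counts.to_numpy()

# Using the counts as row labels selects rows 2, 1, 1, 1, 1 of df['d'].
looked_up = df['d'].loc[labels]
print(labels.tolist())
print(looked_up.tolist())
```

So the zeroes are just df['d'] at rows 1 and 2; they say nothing about the groups.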
CodePudding user response:
Just call value_counts on the DataFrame with all three columns:
out = df.value_counts(['a','b','d'])
a     b  d
john  1  1    2
      2  1    1
      3  0    1
mary  1  0    1
      2  0    1
dtype: int64
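If you would rather see the d=0 and d=1 counts side by side for each (a, b) pair, one option is to unstack the d level. This is only a sketch; the name table and the choice of fill_value=0 for combinations that never occur are my assumptions, not part of the question:

```python
import pandas as pd

d = {'a': ['john', 'mary', 'john', 'john', 'mary', 'john'], 'b': [1, 2, 3, 1, 1, 2],
     'c': [0.7, 0.3, 0.9, 0.4, 1.0, 0.2], 'd': [1, 0, 0, 1, 0, 1]}
df = pd.DataFrame(data=d)

# Count each (a, b, d) combination, then move d into the columns.
counts = df.value_counts(['a', 'b', 'd'])
table = counts.unstack('d', fill_value=0)
print(table)
```

Each row is then one (a, b) pair, with one column per value of d, so "john, b=1 is always d=1" shows up as a 0 in the d=0 column.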