Look at this code:
result=pd.DataFrame(df.groupby(['col1','col2'])['col3'].count())
It basically does what I want to with one minor issue: I want the result to have the full cartesian product of all occurring values of col1 and col2 as index. Of course, my command takes only those combinations of col1 and col2 into account which actually are present in df
. I'd like to fill up the result with 0 entries at the non occuring elements of the cartesian product. How do I do that?
Example: Let's say df
contians the values
('A',1,'Bob')
('A',1,'James')
('A',2,'Bond')
('B',3,'Alice')
('C',1,'Klaus')
('C',1,'Peter')
Then the result is
('A',1)
: 2('A',2)
: 1('B',3)
: 1('C',1)
: 2
I want it to be filled up with
('A',3)
: 0('B',1)
: 0('B',2)
: 0('C',2)
: 0('C',3)
: 0
CodePudding user response:
is this what you're looking for?
result = df.groupby(["col1", "col2"])["col3"].count().unstack(fill_value=0).stack()
result :
col1 col2
A 1 2
2 1
3 0
B 1 0
2 0
3 1
C 1 2
2 0
3 0