pandas groupby: generate 0 when an event does not occur-CodePudding

Given a dataframe, I am grouping 'titles' according to their 'vals'.

titles=['MR.', 'MRS.', 'MR.', 'MR.', 'MRS.', 'MR.', 'MISS.', 'MR.', 'MRS.', 'MR.', 'MR.', 'MR.', 'MRS.', 'MR.',
    'MRS.', 'MRS.', 'MR.', 'MR.', 'MISS.', 'MRS.', 'MR.', 'MASTER.', 'MRS.', 'MR.', 'MRS.', 'MR.', 'MISS.', 
    'MR.', 'MR.', 'MR.', 'MR.', 'MR.', 'MRS.', 'MRS.', 'MR.', 'MR.', 'MISS.', 'MISS.', 'MR.', 'MR.', 'MR.', 
    'MR.', 'MR.', 'MRS.', 'MRS.', 'MR.', 'MR.', 'MR.', 'MRS.', 'MRS.', 'MR.', 'MR.', 'MISS.', 'MISS.', 'MR.',
    'MASTER.', 'MR.', 'MR.', 'MR.', 'MISS.', 'MR.', 'MR.', 'MR.', 'MISS.', 'MASTER.', 'MRS.', 'MISS.', 'MR.', 
    'MR.', 'MRS.', 'MR.', 'MISS.', 'MR.', 'MISS.', 'MR.', 'MR.', 'MRS.', 'MR.', 'MASTER.', 'DON.']


vals=[1,0,1,1,0,0,0,1,0,0,1,1,0,1,0,0,0,1,1,1,1,0,0,0,1,0,1,0,0,0,1,0,1,0,1,1,0,0,0,1,1,1,0,0,0,1,0,1,0,1,
     0,1,0,0,0,1,1,1,1,0,0,0,1,0,1,0,0,0,1,0,1,0,1,1,0,1,1,0,0,0]

import pandas as pd

df = pd.DataFrame({'titles': titles, 'vals': vals})


print(df.groupby(df['vals'])['titles'].value_counts())


print(df.groupby(df['vals'])['titles'].value_counts().values.tolist())

When I print the last line, which converts the counts to a list, I get:

[19, 13, 9, 2, 1, 26, 5, 3, 2]

but I would like one list per value of 'vals':

[19, 13, 9, 2, 1]
[26, 5, 3, 2]

The second list has 4 items instead of 5 because df['titles'] == 'DON.' never coincides with 1 in the 'vals' column. How do I make sure that such combinations show up with a count of 0 instead of being missing altogether?
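The cause can be seen on a small frame. This is a minimal sketch, assuming a hypothetical toy DataFrame (not the question's data) in which 'DON.' only ever appears with vals == 0: groupby(...).value_counts() builds its index only from (vals, titles) pairs that actually occur, so the absent combination has no row at all rather than a row holding 0.

```python
import pandas as pd

# Hypothetical toy data: 'DON.' occurs only in the vals == 0 group,
# mirroring the situation described in the question.
df = pd.DataFrame({
    'titles': ['MR.', 'MR.', 'MR.', 'MRS.', 'MRS.', 'DON.', 'MR.', 'MR.', 'MRS.'],
    'vals':   [0, 0, 0, 0, 0, 0, 1, 1, 1],
})

counts = df.groupby('vals')['titles'].value_counts()

# Only observed (vals, titles) pairs get an index entry.
print((0, 'DON.') in counts.index)  # True
print((1, 'DON.') in counts.index)  # False: no row, not a row with 0
print(len(counts))                  # 5 observed pairs, not 2 groups * 3 titles = 6
```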

Answer:

You can use:

df.groupby('vals')['titles'].value_counts().unstack().fillna(0).values

Output:

[[ 1.  2.  9. 19. 13.]
 [ 0.  2.  3. 26.  5.]]
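The trailing dots in that output come from the dtype: unstack() first inserts NaN for the missing (1, 'DON.') cell, which upcasts the whole table to float, and fillna(0) leaves it float. A small sketch of the difference, assuming a hypothetical toy frame, compares this with passing fill_value=0 to unstack(), which fills the gap during the reshape and keeps integer counts:

```python
import pandas as pd

# Hypothetical toy data: 'DON.' never appears with vals == 1.
df = pd.DataFrame({
    'titles': ['MR.', 'MR.', 'MR.', 'MRS.', 'MRS.', 'DON.', 'MR.', 'MR.', 'MRS.'],
    'vals':   [0, 0, 0, 0, 0, 0, 1, 1, 1],
})
counts = df.groupby('vals')['titles'].value_counts()

# NaN is inserted during unstack, so the table is already float before fillna runs.
filled_after = counts.unstack().fillna(0)
# fill_value=0 fills while reshaping, so no NaN ever appears.
filled_during = counts.unstack(fill_value=0)

print(filled_after.values.dtype)      # float64
print(filled_during.values.dtype)     # int64
print(filled_during.values.tolist())  # [[1, 3, 2], [0, 2, 1]]  (columns DON., MR., MRS.)
```

Appending .astype(int) after fillna(0) is another way to get integers back.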

Answer:

You can use a list comprehension:

print([i.value_counts().tolist() for _, i in df.groupby("vals")['titles']])

# [[19, 13, 9, 2, 1], [26, 5, 3, 2]]

For your second question (zeros for missing combinations), you can use pd.crosstab:

print(pd.crosstab(df["vals"], df["titles"]))

titles  DON.  MASTER.  MISS.  MR.  MRS.
vals                                   
0          1        2      9   19    13
1          0        2      3   26     5
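To get from that table to the list-of-lists shape the question asks for, the underlying array can be converted directly. A minimal sketch, assuming a hypothetical toy frame; the columns come out in sorted title order, so each position refers to the same title in every row:

```python
import pandas as pd

# Hypothetical toy data: 'DON.' never appears with vals == 1.
df = pd.DataFrame({
    'titles': ['MR.', 'MR.', 'MR.', 'MRS.', 'MRS.', 'DON.', 'MR.', 'MR.', 'MRS.'],
    'vals':   [0, 0, 0, 0, 0, 0, 1, 1, 1],
})

# One row per value of 'vals', one column per title; absent pairs are counted as 0.
table = pd.crosstab(df['vals'], df['titles'])

rows = table.values.tolist()
print(table.columns.tolist())  # ['DON.', 'MR.', 'MRS.']
print(rows)                    # [[1, 3, 2], [0, 2, 1]]
```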

Answer:

Try:

>>> df.groupby('vals')['titles'].agg(lambda x: list(x.value_counts())).tolist()
[[19, 13, 9, 2, 1], [26, 5, 3, 2]]

>>> df.groupby('vals')['titles'] \
...     .value_counts() \
...     .unstack(fill_value=0) \
...     .to_dict(orient='split')['data']
[[1, 2, 9, 19, 13], [0, 2, 3, 26, 5]]

Answer:

Starting from your command, you could use a small list comprehension:

# your previous command
counts = df.groupby(df['vals'])['titles'].value_counts()

# list by group
[s.values.tolist() for _,s in counts.groupby(level=0)]

Output:

[[19, 13, 9, 2, 1], [26, 5, 3, 2]]

To get the missing values filled with zero:

counts = df.groupby(df['vals'])['titles'].value_counts()

list(counts.unstack(fill_value=0).T.to_dict('list').values())

Output:

[[1, 2, 9, 19, 13], [0, 2, 3, 26, 5]]
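If you want the lists in the same frequency-sorted form as in the question, just with 0 for the absent titles, another option (a different technique from the ones above) is to reindex the long-format counts against every (vals, titles) combination built with pd.MultiIndex.from_product, then sort each group. A minimal sketch, assuming a hypothetical toy frame:

```python
import pandas as pd

# Hypothetical toy data: 'DON.' never appears with vals == 1.
df = pd.DataFrame({
    'titles': ['MR.', 'MR.', 'MR.', 'MRS.', 'MRS.', 'DON.', 'MR.', 'MR.', 'MRS.'],
    'vals':   [0, 0, 0, 0, 0, 0, 1, 1, 1],
})

counts = df.groupby('vals')['titles'].value_counts()

# Every (vals, titles) combination, including ones that never occur in df.
full_index = pd.MultiIndex.from_product(
    [sorted(df['vals'].unique()), sorted(df['titles'].unique())],
    names=counts.index.names,
)

# Missing combinations get an explicit 0 and the counts stay integers.
full_counts = counts.reindex(full_index, fill_value=0)

# One list per group, largest count first, as in the question.
result = [g.sort_values(ascending=False).tolist()
          for _, g in full_counts.groupby(level=0)]
print(result)  # [[3, 2, 1], [2, 1, 0]]
```

Because each list is sorted by count, position i does not necessarily refer to the same title in both lists; dropping the sort_values call keeps the alphabetical title order instead. On the question's data this should give [[19, 13, 9, 2, 1], [26, 5, 3, 2, 0]].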