Given a dataframe, I am grouping 'titles' according to their 'vals'.
titles=['MR.', 'MRS.', 'MR.', 'MR.', 'MRS.', 'MR.', 'MISS.', 'MR.', 'MRS.', 'MR.', 'MR.', 'MR.', 'MRS.', 'MR.',
'MRS.', 'MRS.', 'MR.', 'MR.', 'MISS.', 'MRS.', 'MR.', 'MASTER.', 'MRS.', 'MR.', 'MRS.', 'MR.', 'MISS.',
'MR.', 'MR.', 'MR.', 'MR.', 'MR.', 'MRS.', 'MRS.', 'MR.', 'MR.', 'MISS.', 'MISS.', 'MR.', 'MR.', 'MR.',
'MR.', 'MR.', 'MRS.', 'MRS.', 'MR.', 'MR.', 'MR.', 'MRS.', 'MRS.', 'MR.', 'MR.', 'MISS.', 'MISS.', 'MR.',
'MASTER.', 'MR.', 'MR.', 'MR.', 'MISS.', 'MR.', 'MR.', 'MR.', 'MISS.', 'MASTER.', 'MRS.', 'MISS.', 'MR.',
'MR.', 'MRS.', 'MR.', 'MISS.', 'MR.', 'MISS.', 'MR.', 'MR.', 'MRS.', 'MR.', 'MASTER.', 'DON.']
vals=[1,0,1,1,0,0,0,1,0,0,1,1,0,1,0,0,0,1,1,1,1,0,0,0,1,0,1,0,0,0,1,0,1,0,1,1,0,0,0,1,1,1,0,0,0,1,0,1,0,1,
0,1,0,0,0,1,1,1,1,0,0,0,1,0,1,0,0,0,1,0,1,0,1,1,0,1,1,0,0,0]
import pandas as pd

df = pd.DataFrame({'titles': titles, 'vals': vals})
# count each title within each 'vals' group
print(df.groupby(df['vals'])['titles'].value_counts())
print(df.groupby(df['vals'])['titles'].value_counts().values.tolist())
When I convert the result to a list and print it (the last line), I get a single flat list:
[19, 13, 9, 2, 1, 26, 5, 3, 2]
but I would like to obtain:
[19, 13, 9, 2, 1]
[26, 5, 3, 2]
The second list has 4 items instead of 5 because df['titles'] == 'DON.' never occurs together with 1 in the 'vals' column. How do I make sure that such missing combinations are reported as 0 instead of being dropped?
CodePudding user response:
You can use:
df.groupby('vals')['titles'].value_counts().unstack().fillna(0).values
Output:
[[ 1. 2. 9. 19. 13.]
[ 0. 2. 3. 26. 5.]]
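The values above are floats because unstack() first leaves NaN for the missing pair and fillna(0) then fills it in. A minimal sketch of casting back to integers, using a small made-up sample rather than the data from the question; the variable names counts and filled are only illustrative:

```python
import pandas as pd

# Small made-up sample, not the data from the question.
df = pd.DataFrame({
    "titles": ["MR.", "MRS.", "MR.", "DON.", "MR.", "MRS."],
    "vals":   [0, 0, 0, 0, 1, 1],
})

counts = df.groupby("vals")["titles"].value_counts().unstack()
# 'DON.' never occurs with vals == 1, so unstack() leaves NaN there
# and the whole table becomes float; fill it and cast back to int.
filled = counts.fillna(0).astype(int)

print(filled)
print(filled.values.tolist())
```

Note that the columns follow the title labels, not the size of the counts, so each position in the resulting lists refers to the same title in every row.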
CodePudding user response:
You can use a list comprehension:
print([i.value_counts().tolist() for _, i in df.groupby("vals")['titles']])
# [[19, 13, 9, 2, 1], [26, 5, 3, 2]]
For your second question, you can use pd.crosstab:
print (pd.crosstab(df["vals"], df["titles"]))
titles  DON.  MASTER.  MISS.  MR.  MRS.
vals
0          1        2      9   19    13
1          0        2      3   26     5
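If you need plain lists rather than a table, the crosstab result can be converted directly. A minimal sketch, assuming a small made-up sample instead of the data from the question; the name ct is only illustrative:

```python
import pandas as pd

# Small made-up sample, not the data from the question.
df = pd.DataFrame({
    "titles": ["MR.", "MRS.", "MR.", "DON.", "MR.", "MRS."],
    "vals":   [0, 0, 0, 0, 1, 1],
})

# crosstab counts every (vals, titles) pair and puts 0 where a pair never occurs
ct = pd.crosstab(df["vals"], df["titles"])

print(ct.columns.tolist())  # ['DON.', 'MR.', 'MRS.']
print(ct.values.tolist())   # [[1, 2, 1], [0, 1, 1]]
```

Each inner list is one 'vals' group, and the positions line up with ct.columns.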
CodePudding user response:
Try:
>>> df.groupby('vals')['titles'].agg(lambda x: list(x.value_counts())).tolist()
[[19, 13, 9, 2, 1], [26, 5, 3, 2]]
Or, to fill the missing combinations with 0:
>>> df.groupby('vals')['titles'] \
.value_counts() \
.unstack(fill_value=0) \
.to_dict(orient='split')['data']
[[1, 2, 9, 19, 13], [0, 2, 3, 26, 5]]
CodePudding user response:
Starting from your command, you could use a small list comprehension:
# your previous command
counts = df.groupby(df['vals'])['titles'].value_counts()
# list by group
[s.values.tolist() for _,s in counts.groupby(level=0)]
output:
[[19, 13, 9, 2, 1], [26, 5, 3, 2]]
To get the missing values filled with zero:
counts = df.groupby(df['vals'])['titles'].value_counts()
list(counts.unstack(fill_value=0).T.to_dict('list').values())
output:
[[1, 2, 9, 19, 13], [0, 2, 3, 26, 5]]
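The zero-filled lists above are ordered by title, whereas the lists in the question are ordered by count. If only the counts matter, one possible follow-up is to sort each row in descending order. This is a sketch on a small made-up sample, not the data from the question, and the names table and sorted_rows are only illustrative:

```python
import pandas as pd

# Small made-up sample, not the data from the question.
df = pd.DataFrame({
    "titles": ["MR.", "MRS.", "MR.", "DON.", "MR.", "MRS."],
    "vals":   [0, 0, 0, 0, 1, 1],
})

table = df.groupby("vals")["titles"].value_counts().unstack(fill_value=0)
# Sort each group's counts from largest to smallest.
# The link between a position and its title is lost after this step.
sorted_rows = [sorted(row, reverse=True) for row in table.values.tolist()]

print(sorted_rows)  # [[2, 1, 1], [1, 1, 0]]
```

Because sorting discards which title each count belongs to, keep the unsorted table if you need to compare the same title across groups.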