I have a dataframe:
df1 = pd.DataFrame({'id': ['1','2','2','3','3','4','4'],
'name': ['James','Jim','jimy','Daniel','Dane','Ash','Ash'],
'event': ['Basket','Soccer','Soccer','Basket','Soccer','Basket','Soccer']})
I want to count unique values of id but with the name, the result I except are:
id name count
1 James 1
2 Jim, jimy 2
3 Daniel, Dane 2
4 Ash 2
I try to group by id and name but it doesn't count as i expected
CodePudding user response:
You could try:
df1.groupby('id').agg(
name=('name', lambda x: ', '.join(x.unique())),
count=('name', 'count')
)
We are basically grouping by id
and then joining the unique names
to a comma separated list!
CodePudding user response:
Here is a solution:
groups = df1[["id", "name"]].groupby("id")
a = groups.agg(lambda x: ", ".join( set(x) ))
b = groups.size().rename("count")
c = pd.concat([a,b], axis=1)
I'm not an expert when it comes to pandas but I thought I might as well post my solution because I think that it's straightforward and readable.
In your example, the groupby is done on the id
column and not by id
and name
. The name
column you see in your expected DataFrame is the result of an aggregation done after a groupby.
Here, it is obvious that the groupby was done on the id
column.
My solution is maybe not the most straightforward but I still find it to be more readable:
- Create a groupby object
groups
by grouping byid
- Create a DataFrame
a
fromgroups
by aggregating it using commas (you also need to remove the duplicates usingset(...)
):lambda x: ", ".join( set(x) )
The DataFrame a
will thus have the following data:
name
id
1 James
2 Jim, jimy
3 Daniel, Dane
4 Ash
- Create another DataFrame
b
by computing the size of each groups ingroups
:groups.size()
(you should also rename your column)
id
1 1
2 2
3 2
4 2
Name: count, dtype: int64
- Concat
a
andb
horizontally and you get what you wanted
name count
id
1 James 1
2 Jim, jimy 2
3 Daniel, Dane 2
4 Ash 2