Home > Mobile >  Count visitor with same id but different name and show it
Count visitor with same id but different name and show it

Time:11-02

I have a dataframe:

df1 = pd.DataFrame({'id': ['1','2','2','3','3','4','4'],
                    'name': ['James','Jim','jimy','Daniel','Dane','Ash','Ash'],
                    'event': ['Basket','Soccer','Soccer','Basket','Soccer','Basket','Soccer']})

I want to count unique values of id but with the name, the result I except are:

id name         count   
1  James          1       
2  Jim, jimy      2
3  Daniel, Dane   2
4  Ash            2 

I try to group by id and name but it doesn't count as i expected

CodePudding user response:

You could try:

df1.groupby('id').agg(
    name=('name', lambda x: ', '.join(x.unique())),
    count=('name', 'count')
)

We are basically grouping by id and then joining the unique names to a comma separated list!

CodePudding user response:

Here is a solution:

groups = df1[["id", "name"]].groupby("id")
a = groups.agg(lambda x: ", ".join( set(x) ))
b = groups.size().rename("count")
c = pd.concat([a,b], axis=1)

I'm not an expert when it comes to pandas but I thought I might as well post my solution because I think that it's straightforward and readable.

In your example, the groupby is done on the id column and not by id and name. The name column you see in your expected DataFrame is the result of an aggregation done after a groupby. Here, it is obvious that the groupby was done on the id column.

My solution is maybe not the most straightforward but I still find it to be more readable:

  1. Create a groupby object groups by grouping by id
  2. Create a DataFrame a from groups by aggregating it using commas (you also need to remove the duplicates using set(...) ): lambda x: ", ".join( set(x) )

The DataFrame a will thus have the following data:

            name
id
1          James
2      Jim, jimy
3   Daniel, Dane
4            Ash
  1. Create another DataFrame b by computing the size of each groups in groups : groups.size() (you should also rename your column)
id
1    1
2    2
3    2
4    2
Name: count, dtype: int64
  1. Concat a and b horizontally and you get what you wanted
            name  count
id
1          James      1
2      Jim, jimy      2
3   Daniel, Dane      2
4            Ash      2
  • Related