Home > Software engineering >  How to count the number of appearances for all elements in a list of string by date in pandas datafr
How to count the number of appearances for all elements in a list of string by date in pandas datafr

Time:04-24

I have the following dataframe. The objective is to count the number of appearances for all elements in both "from" and "to" by date.

id  date         from   to      
A   01/01/2020  alice  bob        
A   01/01/2020  alice  carl        
A   01/02/2020  jim    peter
A   01/02/2020  jim    jen
B   01/02/2020  mary   john
B   01/03/2020  john   max
B   01/03/2020  mike   jane
B   01/03/2020  jane   jim
C   01/02/2020  xavier jim
C   01/02/2020  jim    alice
C   01/03/2020  alice  bob
C   01/03/2020  manny  lily
C   01/04/2020  rosa   bob
..     ..        ..     ..

Desired dataframe

name    date     count
alice 01/01/2020   2
alice 01/02/2020   1
alice 01/03/2020   1
bob   01/01/2020   1
bob   01/03/2020   1
bob   01/04/2020   1
carl  01/01/2020   1
peter 01/02/2020   1
jane  01/03/2020   2
jen   01/02/2020   1
jim   01/02/2020   4
jim   01/03/2020   1
john  01/02/2020   1
john  01/03/2020   1
lily  01/03/2020   1
mary  01/02/2020   1
manny 01/03/2020   1
max   01/03/2020   1
mike  01/03/2020   1
rosa  01/04/2020   1
xavier01/02/2020   1

Any help will be appreciated!

CodePudding user response:

You can set_index stack and use groupby value_counts. The rest is just cosmetic changes to get the output in the desired form:

out = (df.set_index('date')[['from','to']].stack()
       .groupby(level='date').value_counts()
       .reset_index(level=0, name='count').sort_index()
       .rename_axis(index=['name']).reset_index())

Output:

      name        date  count
0    alice  01/01/2020      2
1    alice  01/02/2020      1
2    alice  01/03/2020      1
3      bob  01/01/2020      1
4      bob  01/03/2020      1
5      bob  01/04/2020      1
6     carl  01/01/2020      1
7     jane  01/03/2020      2
8      jen  01/02/2020      1
9      jim  01/02/2020      4
10     jim  01/03/2020      1
11    john  01/02/2020      1
12    john  01/03/2020      1
13    lily  01/03/2020      1
14   manny  01/03/2020      1
15    mary  01/02/2020      1
16     max  01/03/2020      1
17    mike  01/03/2020      1
18   peter  01/02/2020      1
19    rosa  01/04/2020      1
20  xavier  01/02/2020      1

CodePudding user response:

Use melt as starting point:

out = (df.melt(['id', 'date'], value_name='name')
         .value_counts(['name', 'date'], sort=False)
         .rename('count').reset_index())
print(out)

# Output
      name        date  count
0    alice  01/01/2020      2
1    alice  01/02/2020      1
2    alice  01/03/2020      1
3      bob  01/01/2020      1
4      bob  01/03/2020      1
5      bob  01/04/2020      1
6     carl  01/01/2020      1
7     jane  01/03/2020      2
8      jen  01/02/2020      1
9      jim  01/02/2020      4
10     jim  01/03/2020      1
11    john  01/02/2020      1
12    john  01/03/2020      1
13    lily  01/03/2020      1
14   manny  01/03/2020      1
15    mary  01/02/2020      1
16     max  01/03/2020      1
17    mike  01/03/2020      1
18   peter  01/02/2020      1
19    rosa  01/04/2020      1
20  xavier  01/02/2020      1
  • Related