I have a DataFrame containing dates and ids, grouped by week:
df:
date        id
2022-02-07  1
            3
            5
            4
2022-02-14  2
            1
            3
2022-02-21  9
            10
            1
...
...
2022-05-16  ....
I want to count, for each week, how many of the ids repeat from the previous week. For example, the desired output for the DataFrame above would be:
date        count
2022-02-07  0
2022-02-14  2    # because ids 1 and 3 are present in the previous week
2022-02-21  1    # because id 1 is present in the previous week
...
I tried grouping by id and counting, for each id, how many times it repeats per date, but it didn't work out as planned.
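For reference, a minimal sketch that reproduces the sample data above (the exact values are an assumption based on the table shown):

import pandas as pd

# hypothetical reconstruction of the sample frame shown above
df = pd.DataFrame({
    'date': ['2022-02-07'] * 4 + ['2022-02-14'] * 3 + ['2022-02-21'] * 3,
    'id':   [1, 3, 5, 4, 2, 1, 3, 9, 10, 1],
})
df['date'] = pd.to_datetime(df['date'])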
CodePudding user response:
Solution
s = pd.crosstab(df.date, df.id)
(s.eq(s.shift()) & s.ne(0)).sum(1)
Explained
Create a frequency table with crosstab
>>> pd.crosstab(df.date, df.id)
id          1  2  3  4  5  9  10
date
2022-02-07  1  0  1  1  1  0   0
2022-02-14  1  1  1  0  0  0   0
2022-02-21  1  0  0  0  0  1   1
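As an aside, the same frequency table could also be sketched with groupby/unstack if you prefer that idiom (not part of the original answer):

# equivalent frequency table built without crosstab
s = df.groupby(['date', 'id']).size().unstack(fill_value=0)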
Compare each row with the previous row to find which ids repeat. While comparing, also require the current value to be non-zero, otherwise an id that is absent in both weeks (0 == 0) would count as a match:
>>> s.eq(s.shift()) & s.ne(0)
id              1      2      3      4      5      9     10
date
2022-02-07  False  False  False  False  False  False  False
2022-02-14   True  False   True  False  False  False  False
2022-02-21   True  False  False  False  False  False  False
Sum the True values along the column axis to get the count of repeating ids per week:
>>> (s.eq(s.shift()) & s.ne(0)).sum(1)
date
2022-02-07    0
2022-02-14    2
2022-02-21    1
dtype: int64
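Note that s.eq(s.shift()) only treats an id as repeating when its count is identical in both weeks. If an id can appear a different number of times from one week to the next and mere presence is what matters, a presence-based sketch (my assumption about the intent, not part of the original answer) would be:

# count ids that appear at least once in both the current and the previous week
repeats = (s.gt(0) & s.shift(fill_value=0).gt(0)).sum(1)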
CodePudding user response:
You can try grouping the original dataframe and aggregating the id column to a list, then counting the common elements by comparing each list with its shifted counterpart.
out = df.groupby('date')['id'].apply(list)
# pair each week's ids with the previous week's ids; the first week
# is compared against an empty list, so nothing matches there
lst = [[j for j in js if j in ks]
       for js, ks in zip(out, out.shift(fill_value=[[]]))]
df = pd.DataFrame({'date': out.index,
                   'count': list(map(len, lst))})
print(df)
date count
0 2022-02-07 0
1 2022-02-14 2
2 2022-02-21 1
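If the weekly lists get long, the j in ks membership test becomes quadratic; a sketch using sets instead, starting again from the original dataframe and assuming ids are unique within a week:

grouped = df.groupby('date')['id'].apply(set)
ids = list(grouped)
prev = [set()] + ids[:-1]    # previous week's ids; empty set for the first week
counts = [len(cur & p) for cur, p in zip(ids, prev)]
result = pd.DataFrame({'date': grouped.index, 'count': counts})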