I want to group rows with the same data and add a number in another column that tells me how many times this unique combination has occurred so far. I'm new to pandas, and I was wondering if there is an easy way to calculate this.
As an example, I have these rows:
PID | fileName | line | revisit |
---|---|---|---|
P1 | f1.py | 10 | 1 |
P1 | f1.py | 10 | 1 |
P1 | f1.py | 10 | 1 |
P1 | f1.py | 10 | 1 |
P2 | f2.py | 2 | 1 |
P2 | f2.py | 2 | 1 |
P1 | f1.py | 10 | 2 |
P1 | f1.py | 10 | 2 |
P1 | f1.py | 10 | 2 |
P2 | f2.py | 2 | 2 |
P2 | f2.py | 2 | 2 |
P1 | f1.py | 10 | 3 |
P1 | f1.py | 10 | 3 |
The important information for me is in the PID, fileName, and line columns. Consecutive rows that have the same values in these columns should get the same number in revisit, and the next time that set of values appears again, revisit should be incremented.
I don't have a code example because I'm unsure how to do this using pandas functions.
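In case it helps, the sample above can be reproduced with something like this (revisit is the column I want pandas to fill in, so I leave it out):

import pandas as pd

# sample input; revisit is the column to be computed
df = pd.DataFrame({
    'PID':      ['P1', 'P1', 'P1', 'P1', 'P2', 'P2', 'P1', 'P1', 'P1', 'P2', 'P2', 'P1', 'P1'],
    'fileName': ['f1.py'] * 4 + ['f2.py'] * 2 + ['f1.py'] * 3 + ['f2.py'] * 2 + ['f1.py'] * 2,
    'line':     [10, 10, 10, 10, 2, 2, 10, 10, 10, 2, 2, 10, 10],
})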
CodePudding user response:
try this:
import pandas as pd

def func(g: pd.DataFrame) -> pd.Series:
    # a new visit starts whenever the group's row positions are not consecutive
    revisit = g.index.to_series().diff().ne(1).cumsum()
    return revisit

grouped = df.groupby(['PID', 'fileName'], as_index=False)
df['revisit'] = grouped.apply(func).droplevel(0)
print(df)
>>>
PID fileName line revisit
0 P1 f1.py 10 1
1 P1 f1.py 10 1
2 P1 f1.py 10 1
3 P1 f1.py 10 1
4 P2 f2.py 2 1
5 P2 f2.py 2 1
6 P1 f1.py 10 2
7 P1 f1.py 10 2
8 P1 f1.py 10 2
9 P2 f2.py 2 2
10 P2 f2.py 2 2
11 P1 f1.py 10 3
12 P1 f1.py 10 3
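If you prefer to avoid the droplevel step, the same idea should also work with group_keys=False, which keeps the original index on the result (assuming a pandas version where groupby accepts group_keys):

# same grouping idea; the result aligns with the original index directly
df['revisit'] = df.groupby(['PID', 'fileName'], group_keys=False).apply(func)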
CodePudding user response:
Let us do transform with factorize:
df['revisit'] = df.PID.ne(df.PID.shift()).cumsum().groupby(df.PID).transform(lambda x: x.factorize()[0] + 1)
df
Out[189]:
PID fileName line revisit
0 P1 f1.py 10 1
1 P1 f1.py 10 1
2 P1 f1.py 10 1
3 P1 f1.py 10 1
4 P2 f2.py 2 1
5 P2 f2.py 2 1
6 P1 f1.py 10 2
7 P1 f1.py 10 2
8 P1 f1.py 10 2
9 P2 f2.py 2 2
10 P2 f2.py 2 2
11 P1 f1.py 10 3
12 P1 f1.py 10 3
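To see what the one-liner is doing: the cumsum assigns a block id to every consecutive run of the same PID, and factorize then renumbers those blocks within each PID starting from 0. A quick look at the intermediate step:

blocks = df.PID.ne(df.PID.shift()).cumsum()
print(blocks.tolist())
# [1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
# within P1 the distinct blocks are 1, 3, 5 -> factorized to 0, 1, 2 -> revisit 1, 2, 3
# within P2 the distinct blocks are 2, 4    -> factorized to 0, 1    -> revisit 1, 2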