Create A count Features right-CodePudding

I have a table like that:

PC	USER	Period
A	U1	2020/01
A	U2	2020/02
A	U3	2020/03
A	U4	2020/04
B	U1	2020/01
B	U2	2020/02
B	U2	2020/03

I need to create a "feature count" to check how many time "PC" found and "user" connected

now for a machine learning algorithm what is the best way to do that? or even a combination of them

for every row show the max connection:

PC	USER	Period	Max Con Per PC	Max Con Per USER
A	U1	2020/01	4	2
A	U2	2020/02	4	3
A	U3	2020/03	4	1
A	U4	2020/04	4	1
B	U1	2020/01	3	2
B	U2	2020/02	3	3
B	U2	2020/03	3	3

for every row show the COUNT OF the connection:

PC	USER	Period	Max Con Per PC	Max Con Per USER
A	U1	2020/01	1	1
A	U2	2020/02	2	1
A	U3	2020/03	3	1
A	U4	2020/04	4	1
B	U1	2020/01	1	2
B	U2	2020/02	2	2
B	U2	2020/03	3	3

CodePudding user response：

You can use groupby then transform and cumcount:

# Use lazy groups
gpc = df.groupby('PC')['PC']
gus = df.groupby('USER')['Period']

df['Max Con Per PC'] = gpc.transform('size')
df['Count Con Per PC'] = gpc.cumcount()   1
df['Max Con Per USER'] = gus.transform('size')
df['Count Con Per USER'] = gus.cumcount()   1

Output:

>>> df
  PC USER   Period  Max Con Per PC  Count Con Per PC  Max Con Per USER  Count Con Per USER
0  A   U1  2020/01               4                 1                 2                   1
1  A   U2  2020/02               4                 2                 3                   1
2  A   U3  2020/03               4                 3                 1                   1
3  A   U4  2020/04               4                 4                 1                   1
4  B   U1  2020/01               3                 1                 2                   2
5  B   U2  2020/02               3                 2                 3                   2
6  B   U2  2020/03               3                 3                 3                   3