pandas counts data after groupby

Time:09-15

Background

The table below records usage of a product. Start time is the time logged when the switch is touched; Use_ID is the usage record code, where each distinct code is a new use; Mode is the mode that was turned on. The product has two modes, so a single use falls into one of three cases: only mode 0 on, only mode 1 on, or both mode 0 and mode 1 on.

In the sample, Use_ID 00001 has three records: during that use, mode 0 was turned on, then mode 1, and then mode 1 was tapped again (presumably an accidental touch). Likewise, Use_ID 00002 has three records: on November 3, shortly after 8 o'clock, the user turned on mode 0 and then turned on mode 1 twice. Please note that normal use looks like 00003, 00004 or 00005: within one use, the user turns on a single mode, or each of the two modes once, with no repeats.

Now I want to count how many uses had only mode 0 on, only mode 1 on, and both modes on. A use in which mode 0 appears twice or mode 1 appears twice is considered misuse and should not be included in the statistics.

Start time Use_ID Mode
2021/09/08 13:14:12 00001 0
2021/09/08 13:15:32 00001 1
2021/09/08 13:15:36 00001 1
2021/11/03 08:45:12 00002 0
2021/10/06 11:11:12 00003 1
2021/11/03 08:54:02 00002 1
2021/04/03 21:23:13 00004 0
2021/03/23 11:31:41 00005 0
2021/03/23 11:32:52 00005 1
2021/11/03 08:55:11 00002 1
2021/07/13 22:55:31 00006 1
2021/07/13 22:56:02 00006 1
2021/08/15 06:02:03 00007 0
2021/01/06 17:19:01 00008 0
2021/01/06 17:19:08 00008 0
2021/06/02 13:11:37 00009 1
2021/12/11 10:00:06 00010 0
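
To make the example reproducible, the sample table can be loaded into a DataFrame roughly as follows. This is only a sketch: the comma-separated layout and the names raw and df are assumptions made for illustration, and Use_ID is read as a string so that the leading zeros are preserved.

```python
import io

import pandas as pd

# The sample table from above, rewritten as CSV (layout is an assumption).
raw = """Start time,Use_ID,Mode
2021/09/08 13:14:12,00001,0
2021/09/08 13:15:32,00001,1
2021/09/08 13:15:36,00001,1
2021/11/03 08:45:12,00002,0
2021/10/06 11:11:12,00003,1
2021/11/03 08:54:02,00002,1
2021/04/03 21:23:13,00004,0
2021/03/23 11:31:41,00005,0
2021/03/23 11:32:52,00005,1
2021/11/03 08:55:11,00002,1
2021/07/13 22:55:31,00006,1
2021/07/13 22:56:02,00006,1
2021/08/15 06:02:03,00007,0
2021/01/06 17:19:01,00008,0
2021/01/06 17:19:08,00008,0
2021/06/02 13:11:37,00009,1
2021/12/11 10:00:06,00010,0
"""

# Read Use_ID as text; otherwise "00001" would be parsed as the integer 1.
df = pd.read_csv(io.StringIO(raw), dtype={"Use_ID": str})

# Parse the timestamps with an explicit format matching the table.
df["Start time"] = pd.to_datetime(df["Start time"], format="%Y/%m/%d %H:%M:%S")

print(df.head())
```

If Use_ID is left to type inference instead, it is read as an integer column, which is why the answer's output further down shows 1 to 10 rather than 00001 to 00010.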

My Solution

results = df.groupby(['Use_ID'])
for key, value in results:
    print(value)

I want to group by Use_ID and then compute conditional statistics on the 'Mode' values in each group. If both 0 and 1 appear in 'Mode' under a Use_ID, that use counts as both; if 'Mode' contains only 0 or only 1, it counts as only mode 0 or only mode 1.

I don't know what to do next, please help!

Expected Output

Mode1 Mode0 Both
4 3 3

Answer:

As Mode only takes the values 0 and 1, you can use groupby.mean to classify each use:

df.groupby('Use_ID')['Mode'].mean()

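A mean of exactly 0 or 1 means that only that mode was turned on during the use; any other value means both modes appear. Note that repeated taps of a single mode (Use_ID 6 and 8 here) still average to exactly 1 or 0, so the mean alone does not flag them as misuse.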

Output:

Use_ID
1     0.666667
2     0.666667
3     1.000000
4     0.000000
5     0.500000
6     1.000000
7     0.000000
8     0.000000
9     1.000000
10    0.000000
Name: Mode, dtype: float64

For the counts:

out = (df
  .groupby('Use_ID')['Mode']
  .mean()
  .map({0: 'Mode0', 1: 'Mode1'})
  .fillna('both')
  .value_counts()
 )

Output:

Mode0    4
both     3
Mode1    3
Name: Mode, dtype: int64
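
The counts above keep every Use_ID, including the ones with a repeated mode. If those should be excluded, as the question asks, one possible variant is to drop any Use_ID in which the same Mode occurs more than once before classifying. This is a sketch of an alternative, not part of the answer above; the names is_repeat, misused_ids, clean and strict_counts are made up for illustration, and the DataFrame is rebuilt inline so the snippet runs on its own.

```python
import pandas as pd

# Use_ID and Mode columns of the sample table, in the original row order.
df = pd.DataFrame({
    "Use_ID": ["00001", "00001", "00001", "00002", "00003", "00002",
               "00004", "00005", "00005", "00002", "00006", "00006",
               "00007", "00008", "00008", "00009", "00010"],
    "Mode": [0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0],
})

# Flag rows whose (Use_ID, Mode) pair occurs more than once: a repeated tap.
is_repeat = df.duplicated(["Use_ID", "Mode"], keep=False)

# Every Use_ID with at least one repeated tap is treated as misuse.
misused_ids = df.loc[is_repeat, "Use_ID"].unique()
clean = df[~df["Use_ID"].isin(misused_ids)]

# Same classification as before, applied to the remaining uses only:
# mean 0 -> only mode 0, mean 1 -> only mode 1, anything else -> both.
strict_counts = (clean
  .groupby("Use_ID")["Mode"]
  .mean()
  .map({0: "Mode0", 1: "Mode1"})
  .fillna("Both")
  .value_counts()
)

print(sorted(misused_ids))
print(strict_counts)
```

On the sample table this drops 00001, 00002, 00006 and 00008 and leaves Mode0 3, Mode1 2, Both 1. That differs from the output above because the mean-based version counts the repeated-tap uses instead of discarding them.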