Python pandas: how to pick out certain values by internal numbering?

Time:09-14

I have a dataframe that looks like this:

    Answers  all_answers  Score
0       0.0            0     72
1       0.0            0     73
2       0.0            0     74
3       1.0            1      1
4      -1.0            1      2
5       1.0            1      3
6      -1.0            1      4
7       1.0            1      5
8       0.0            0      1
9       0.0            0      2
10     -1.0            1      1
11      0.0            0      1
12      0.0            0      2
13      1.0            1      1
14      0.0            0      1
15      0.0            0      2
16      1.0            1      1

The first column (Answers) signals that the sign changed during the calculation.

The second column (all_answers) is the first column with the minus sign removed.

The third column (Score) is a running count over the second column: how many ones or zeros have occurred in a row so far.
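
For reference, here is a minimal sketch of how columns like these could be derived from the first one. The sample values and the names `df_demo` and `run_id` are made up for illustration, and the counter restarts at 1 for every run, so it would not reproduce the 72, 73, 74 at the top of the table above, which presumably continues an earlier run.

```python
import pandas as pd

# Hypothetical sample of the signal column; the values are illustrative only.
answers = pd.Series([0.0, 1.0, -1.0, 1.0, 0.0, 0.0, -1.0], name="Answers")
df_demo = answers.to_frame()

# Second column: the signal with the minus sign removed (0 or 1).
df_demo["all_answers"] = df_demo["Answers"].abs().astype(int)

# Label each run of consecutive equal values in all_answers.
run_id = df_demo["all_answers"].ne(df_demo["all_answers"].shift()).cumsum()

# Third column: position of each row inside its run, starting at 1.
df_demo["Score"] = df_demo.groupby(run_id).cumcount() + 1

print(df_demo)
```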

I want to add a fourth column that keeps only the ones that occur in a row a given number of times, for example 5, while preserving the sign from the first column.

The result should look something like this:

    Answers  all_answers  Score  New
0       0.0            0     72    0
1       0.0            0     73    0
2       0.0            0     74    0
3       1.0            1      1    1
4      -1.0            1      2   -1
5       1.0            1      3    1
6      -1.0            1      4   -1
7       1.0            1      5    1
8       0.0            0      1    0
9       0.0            0      2    0
10     -1.0            1      1    0
11      0.0            0      1    0
12      0.0            0      2    0
13      1.0            1      1    0
14      0.0            0      1    0
15      0.0            0      2    0
16      1.0            1      1    0
17      0.0            0      1    0

Is it possible to do this with pandas?

CodePudding user response:

You can use:

# group by consecutive 0/1
g = df['all_answers'].ne(df['all_answers'].shift()).cumsum()

# get size of each group and compare to threshold
m = df.groupby(g)['all_answers'].transform('size').ge(5)

# mask small groups
df['New'] = df['Answers'].where(m, 0)

Output:

    Answers  all_answers  Score  New
0       0.0            0     72  0.0
1       0.0            0     73  0.0
2       0.0            0     74  0.0
3       1.0            1      1  1.0
4      -1.0            1      2 -1.0
5       1.0            1      3  1.0
6      -1.0            1      4 -1.0
7       1.0            1      5  1.0
8       0.0            0      1  0.0
9       0.0            0      2  0.0
10     -1.0            1      1  0.0
11      0.0            0      1  0.0
12      0.0            0      2  0.0
13      1.0            1      1  0.0
14      0.0            0      1  0.0
15      0.0            0      2  0.0
16      1.0            1      1  0.0
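
To try this end to end, here is a self-contained sketch that rebuilds the sample frame from the question and wraps the same three steps in a function. The function name `keep_long_runs` and its `min_len` parameter are not part of the original answer; they are only there to make the threshold adjustable. The final `astype(int)` is an optional extra that matches the integer column shown in the question.

```python
import pandas as pd

# Sample data copied from the question (rows 0 to 16).
df = pd.DataFrame({
    "Answers": [0.0, 0.0, 0.0, 1.0, -1.0, 1.0, -1.0, 1.0, 0.0,
                0.0, -1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0],
    "all_answers": [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1],
    "Score": [72, 73, 74, 1, 2, 3, 4, 5, 1, 2, 1, 1, 2, 1, 1, 2, 1],
})


def keep_long_runs(frame, min_len=5):
    """Return Answers, zeroed wherever its run in all_answers is shorter than min_len."""
    # A new run starts wherever all_answers differs from the previous row.
    run_id = frame["all_answers"].ne(frame["all_answers"].shift()).cumsum()
    # Broadcast the length of each run back onto its rows.
    run_len = frame.groupby(run_id)["all_answers"].transform("size")
    # Keep the signed value in long runs, replace everything else with 0.
    return frame["Answers"].where(run_len.ge(min_len), 0).astype(int)


df["New"] = keep_long_runs(df, min_len=5)
print(df)
```

Runs of zeros are grouped too, but that does no harm here: Answers is 0 wherever all_answers is 0, so a long run of zeros still yields 0 in the new column.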