Home > Blockchain >  Python Pandas DataFrame: conditional column based on other column values
Python Pandas DataFrame: conditional column based on other column values

Time:11-14

Description of the problem:
I'am trying to simulate a machine whose operation mode "B" occurs if "VALUE" is greater or equal to 5 in the last 3 previous time steps- which means "VALUE">= 5 for at least 3 minutes.The Operation mode "B" keeps to be "B" for the next time steps as long as "VALUE" is greater or equal to 5 and is turned to "A" after at least 3 time steps - which means The Operation mode "B" keeps valid for at the next 3 minutes. After 3 minutes the Operation mode "A" is turned on if "VALUE" is less than 5.

The goal:
I need an approach using python and pandas to identify the the operation mode described by "A" and "B" (column: "statusA/B") based on the values in column "VALUE" and the status "on" and "down" (column: "VALUE<5 --> down, VALUE>=5 --> on").

The conditions have to be considered are as follows:

  • case "A" and "B" depend on each other.
  • case "B" is occurred if at least 3 "on" previously occurred and the actual VALUE is greater or equal to 5.
  • once "B" occurs, the next 3 time steps have to be "B" even if the status is “down” and it keeps to be "B" as long as "on" exists.

What I did try:
I tried multiple approaches applying counter for the cases "down" and "on" and tried to track the status based on the counter values but unfortunately it did not work it properly.

time VALUE VALUE<5 --> down / VALUE>=5 --> on statusA/B
00:00 0 down A
00:01 0 down A
00:02 0 down A
00:03 8 on A
00:04 4 down A
00:05 2 down A
00:06 1 down A
00:07 2 down A
00:08 1 down A
00:08 5 on A
00:09 6 on A
00:10 0 down A
00:11 10 on A
00:12 10 on A
00:13 10 on A
00:14 11 down B
00:15 2 down B
00:16 1 down B
00:17 3 down A
00:18 11 on A
00:19 10 on A
00:20 10 on A
00:21 10 on B
00:22 10 on B
00:23 11 on B
00:24 14 on B
00:25 11 on B

CodePudding user response:

Modified Solution. I edited my solution thanks to a subtle point made by dear mozway:

import pandas as pd

df2['status'] = df2['VALUE'].mask(df2['VALUE'].shift().rolling(3, min_periods=3).min() >= 5, 'B')

m1 = df2['status'].shift().eq('B')
m2 = df2['status'].shift(2).eq('B')


df2['status'] = (df2['status']
                .mask(m1 | m2).fillna('B')
                .astype(str)
                .str.replace(r'\d ', 'A'))

m5 = df2['status'].shift().eq('B')
m6 = df2['status'].shift(2).eq('B')
m3 = df2['status'].eq('A')
m4 = df2.iloc[:, 2].eq('on')

df2['status'] = df2['status'].mask((m5 & m3 & m4) | (m6 & m3 & m4)).fillna('B')


    index  VALUE  ...                                 resulted statusA/B status
0       0      3  ...                                                  A      A
1       1      5  ...                                                  A      A
2       2      2  ...                                                  A      A
3       3      6  ...                                                  A      A
4       4      3  ...                                                  A      A
5       5      1  ...                                                  A      A
6       6      7  ...                                                  A      A
7       7      7  ...                                                  A      A
8       8      2  ...                                                  A      A
9       9      2  ...                                                  A      A
10     10      3  ...                                                  A      A
11     11      6  ...                                                  A      A
12     12      8  ...                                                  A      A
13     13      8  ...                                                  A      A
14     14      7  ...                                                  B      B
15     15      4  ...                                                  B      B
16     16      4  ...                                                  B      B
17     17      6  ...  A(expected is B because is "on" and at least 3...      B
18     18      6  ...  A(expected is B because is "on" and at least 3...      B
19     19      6  ...  A(expected is B because is "on" and at least 3...      B
20     20      7  ...                                                  B      B
21     21      2  ...                                                  B      B
22     22      9  ...                                                  B      B
23     23      8  ...  A(expected is B because "B" keeps a "B" for 3 ...      B
24     24      7  ...  A(expected is B because is "on" and at least 3...      B
25     25      2  ...                                                  B      B
26     26      4  ...  A(expected is B because "B" keeps a "B" for 3 ...      B
27     27      4  ...  A(expected is B because "B" keeps a "B" for 3 ...      B
28     28      1  ...  A(this true because it is down and the 3 time ...      A
29     29      4  ...                                                  A      A
[30 rows x 5 columns]

CodePudding user response:

I use the code with the following "VALUE" column.

import pandas as pd

df['status'] = df['VALUE'].mask(df['VALUE'].shift().rolling(3, min_periods=3).min() >= 5, 'B')

m1 = df['status'].shift().eq('B')
m2 = df['status'].shift(2).eq('B')

df['status'] = (df['status']
                .mask(m1 & m2).fillna('B')
                .astype(str)
                .str.replace(r'\d ', 'A'))

I got this result which does not match with the expected values. I added some comments for the expected values.

Detailed description of the problem: The operation mode "B" can be interpreted as the cooling time of the machine. The machine is warm if it was "on" for at least 3 minutes. If there is time step with a "down" the machine stays warm for next 3 minutes. if the "down" state lasts after the 3 minutes the machine becomes cold.

index VALUE VALUE<5 --> down / VALUE>=5 --> on resulted statusA/B
0 3 down A
1 5 on A
2 2 down A
3 6 on A
4 3 down A
5 1 down A
6 7 on A
7 7 on A
8 2 down A
9 2 down A
10 3 down A
11 6 on A
12 8 on A
13 8 on A
14 7 on B
15 4 down B
16 4 down B
17 6 on A (expected is B because is "on" and at least 3 steps with Bs are previously respected)
18 6 on A (expected is B because is "on" and at least 3 steps with Bs are previously respected)
19 6 on A (expected is B because is "on" and at least 3 steps with Bs are previously respected)
20 7 on B
21 2 down B
22 9 on B
23 8 on A (expected is B because "B" keeps a "B" for 3 time steps starting from index 21)
24 7 on A (expected is B because is "on" and at least 3 steps with Bs are previously respected)
25 2 down B
26 4 down A (expected is B because "B" keeps a "B" for 3 time steps starting from index 25)
27 4 down A (expected is B because "B" keeps a "B" for 3 time steps starting from index 25)
28 1 down A (this true because it is down and the 3 time steps with the "B"s are over
29 4 down A
  • Related