Home > Software engineering >  Add column with a specific sequence of numbers depending on value
Add column with a specific sequence of numbers depending on value

Time:07-15

I have this dataframe:

df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    'Condition': [False, False, True, False, False, False, False, False, False, False, True, False]})
df
     ID   Condition
0     1       False
1     1       False
2     1        True
3     1       False
4     1       False
5     1       False
6     1       False
7     1       False
8     1       False
9     1       False
10    1        True
11    1       False

I want to add a new column Sequence with a sequence of numbers. The condition is when the first True appears in the Condition column, the following rows must contain the sequence 1, 2, 3, 1, 2, 3... until another True appears again, at which point the sequence is restarted again. Furthermore, ideally, until the first True appears, the values in the new column should be 0. El resultado final sería:

     ID   Condition  Sequence
0     1       False         0
1     1       False         0
2     1        True         1
3     1       False         2
4     1       False         3
5     1       False         1
6     1       False         2
7     1       False         3
8     1       False         1
9     1       False         2
10    1        True         1
11    1       False         2

I have tried to do it with cumsum and cumcount but I can't find the exact code.

Any suggestion?

CodePudding user response:

Let us do cumsum to identify blocks of rows, then group the dataframe by blocks and use cumcount to create sequential counter, then with some simple maths we can get the output

b = df['Condition'].cumsum()
df['Seq'] = df.groupby(b).cumcount().mod(3).add(1).mask(b < 1, 0)

Explained

Identify blocks/groups of rows using cumsum

b = df['Condition'].cumsum()
print(b)

0     0
1     0
2     1 # -- group 1 start --
3     1
4     1
5     1
6     1
7     1
8     1
9     1 # -- group 1 ended --
10    2
11    2
Name: Condition, dtype: int32

Group the dataframe by the blocks and use cumcount to create a sequential counter per block

c = df.groupby(b).cumcount()
print(c)

0     0
1     1
2     0
3     1
4     2
5     3
6     4
7     5
8     6
9     7
10    0
11    1
dtype: int64

Modulo(%) divide the sequential counter by 3 to create a repeating sequence that repeats every three rows

c = c.mod(3).add(1)
print(c)

0     1
1     2
2     1
3     2
4     3
5     1
6     2
7     3
8     1
9     2
10    1
11    2
dtype: int64

Mask the values in sequence with 0 where the group(b) is < 1

c = c.mask(b < 1, 0)
print(c)

0     0
1     0
2     1
3     2
4     3
5     1
6     2
7     3
8     1
9     2
10    1
11    2

Result

    ID  Condition  Seq
0    1      False    0
1    1      False    0
2    1       True    1
3    1      False    2
4    1      False    3
5    1      False    1
6    1      False    2
7    1      False    3
8    1      False    1
9    1      False    2
10   1       True    1
11   1      False    2

CodePudding user response:

This was the simplest way I could think of doing it

import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    'Condition': [False, False, True, False, False, False, False, False, False, False, True, False]})

conditions = df.Condition.tolist()
sequence = []

buf = 1
seenTrue = False

for condition in conditions:
    #If it's seen a True in the list, this bool is set to True
    if condition or seenTrue:
        seenTrue = True
        #Checking buffer and resetting back to 0 
        if buf%4 == 0 or condition:
            buf = 1
        sequence.append(buf)
        buf  = 1

    #While True has not been seen, all 0s to be appended.
    if not seenTrue:
        sequence.append(0)

df["Sequence"] = sequence

Effectively looping through and then adding the new column in. The buffer is reset whenever it reaches 4 or when a new True is seen, giving you the looping 1,2,3 effect.

CodePudding user response:

The solution I've come up with is just simply looping through the Condition column, adding 0's to the list until you have seen the first True. When you have found a True, you set seen_true to True and set seq_count to 1. After the first True, you keep increasing seq_count, until it's larger then 3 or you see a new True. In both cases, you reset seq_count to 1. This gives you the column you were looking for.

import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    'Condition': [False, False, True, False, False, False, False, False, False, False, True, False]})

l = []
seq_count  = 0
first_true = False

for index, row in df.iterrows():
    con = row["Condition"]
    if con:
        seq_count  = 1
        first_true = True
    elif first_true:
        seq_count  = 1
        if seq_count > 3:
            seq_count = 1
    l.append(seq_count)

df["Sequence"] = l

Output:

    ID  Condition  Sequence
0    1      False         0
1    1      False         0
2    1       True         1
3    1      False         2
4    1      False         3
5    1      False         1
6    1      False         2
7    1      False         3
8    1      False         1
9    1      False         2
10   1       True         1
11   1      False         2
  • Related