Home > Mobile >  Count until condition is reached in Pandas
Count until condition is reached in Pandas

Time:04-06

I need some input from you. The idea is that I would like to see how long (in rows) it takes before you can see

  1. a new value in column SUB_B1, and
  2. a new value in SUB_B2

i.e, how many steps is there between

  1. SUB_A1 and SUB B1, and
  2. between SUB A2 and SUB B2

I have structured the data something like this: (I sort the index in descending order by the results column. After that I separate index B and A and place them in new columns)

df.sort_values(['A','result'], ascending=[True,False]).set_index(['A','B'])
result SUB_A1 SUB_A2 SUB_B1 SUB_B2
A B
10_125 10_173 0.903257 10 125 10 173
10_332 0.847333 10 125 10 332
10_243 0.842802 10 125 10 243
10_522 0.836335 10 125 10 522
58_941 0.810760 10 125 58 941
... ... ... ... ... ...
10_173 10_125 0.903257 10 173 10 125
58_941 0.847333 10 173 58 941
1_941 0.842802 10 173 1 941
96_512 0.836335 10 173 96 512
10_513 0.810760 10 173 10 513

This is what I have done so far: (edit: I think I need to iterate over values[] However, I havent manage to loop beyond the first rows yet...)


def func(group):
        if group.SUB_A1.values[0] == group.SUB_B1.values[0]:
            group.R1.values[0] = 1
        else:
            group.R1.values[0] = 0
        if group.SUB_A1.values[0] == group.SUB_B1.values[1] and group.R1.values[0] == 1:
            group.R1.values[1] = 2
        else:
            group.R1.values[1] = 0 

df['R1'] = 0
df.groupby('A').apply(func)

Expected outcome:

result SUB_B1 SUB_B2 R1 R2
A B
10_125 10_173 0.903257 10 173 1 0
10_332 0.847333 10 332 2 0
10_243 0.842802 10 243 3
10_522 0.836335 10 522 4 0
58_941 0.810760 58 941 0 0
... ... ... ... ... ...

CodePudding user response:

Are you looking for something like this:

Sample dataframe:

df = pd.DataFrame({
    "A": [1, -1, -2, 3, 3, 3, 3, -3, 6, 6],
    "B": [1, 2, 3, 3, 3, 3, 4, 6, 6, 6]
})
    A   B
0   1   1
1  -1   2
2  -2   3
3   3   3
4   3   3
5   3   3
6   3   4
7  -3   6
8   6   6
9   6   6

Now this

equal = df.A == df.B
df["R"] = equal.groupby(equal.diff().fillna(True).cumsum()).cumsum()

leads to

    A   B  R
0   1   1  1
1  -1   2  0
2  -2   3  0
3   3   3  1
4   3   3  2
5   3   3  3
6   3   4  0
7  -3   6  0
8   6   6  1
9   6   6  2

CodePudding user response:

Try using pandas.DataFrame.iterrows and pandas.DataFrame.shift.

You can iterate through the dataframe and compare current row with the previous one, then add some condition:

df['SUB_A2_last'] = df['SUB_A2'].shift()
count = 0
#Fill column with zeros
df['count_series'] = 0
for index, row in df.iterrows():
    subA = row['sub_A2']
    subA_last = row['sub_A2_last']
    if subA == subA_last:
        count  = 1
    else:
        count = 0
    df.loc[index, 'count_series'] = count

Then repeat for B column. It is possible to get a better aproach using pandas.DataFrame.apply and a custom function.

  • Related