I need some input from you. The idea is that I would like to see how long (in rows) it takes before you can see
- a new value in column
SUB_B1
, and - a new value in
SUB_B2
i.e, how many steps is there between
SUB_A1
andSUB B1
, and- between
SUB A2
andSUB B2
I have structured the data something like this: (I sort the index in descending order by the results column. After that I separate index B and A and place them in new columns)
df.sort_values(['A','result'], ascending=[True,False]).set_index(['A','B'])
result | SUB_A1 | SUB_A2 | SUB_B1 | SUB_B2 | ||
---|---|---|---|---|---|---|
A | B | |||||
10_125 | 10_173 | 0.903257 | 10 | 125 | 10 | 173 |
10_332 | 0.847333 | 10 | 125 | 10 | 332 | |
10_243 | 0.842802 | 10 | 125 | 10 | 243 | |
10_522 | 0.836335 | 10 | 125 | 10 | 522 | |
58_941 | 0.810760 | 10 | 125 | 58 | 941 | |
... | ... | ... | ... | ... | ... | |
10_173 | 10_125 | 0.903257 | 10 | 173 | 10 | 125 |
58_941 | 0.847333 | 10 | 173 | 58 | 941 | |
1_941 | 0.842802 | 10 | 173 | 1 | 941 | |
96_512 | 0.836335 | 10 | 173 | 96 | 512 | |
10_513 | 0.810760 | 10 | 173 | 10 | 513 |
This is what I have done so far: (edit: I think I need to iterate over values[]
However, I havent manage to loop beyond the first rows yet...)
def func(group):
if group.SUB_A1.values[0] == group.SUB_B1.values[0]:
group.R1.values[0] = 1
else:
group.R1.values[0] = 0
if group.SUB_A1.values[0] == group.SUB_B1.values[1] and group.R1.values[0] == 1:
group.R1.values[1] = 2
else:
group.R1.values[1] = 0
df['R1'] = 0
df.groupby('A').apply(func)
Expected outcome:
result | SUB_B1 | SUB_B2 | R1 | R2 | ||
---|---|---|---|---|---|---|
A | B | |||||
10_125 | 10_173 | 0.903257 | 10 | 173 | 1 | 0 |
10_332 | 0.847333 | 10 | 332 | 2 | 0 | |
10_243 | 0.842802 | 10 | 243 | 3 | 0 | |
10_522 | 0.836335 | 10 | 522 | 4 | 0 | |
58_941 | 0.810760 | 58 | 941 | 0 | 0 | |
... | ... | ... | ... | ... | ... |
CodePudding user response:
Are you looking for something like this:
Sample dataframe:
df = pd.DataFrame({
"A": [1, -1, -2, 3, 3, 3, 3, -3, 6, 6],
"B": [1, 2, 3, 3, 3, 3, 4, 6, 6, 6]
})
A B
0 1 1
1 -1 2
2 -2 3
3 3 3
4 3 3
5 3 3
6 3 4
7 -3 6
8 6 6
9 6 6
Now this
equal = df.A == df.B
df["R"] = equal.groupby(equal.diff().fillna(True).cumsum()).cumsum()
leads to
A B R
0 1 1 1
1 -1 2 0
2 -2 3 0
3 3 3 1
4 3 3 2
5 3 3 3
6 3 4 0
7 -3 6 0
8 6 6 1
9 6 6 2
CodePudding user response:
Try using pandas.DataFrame.iterrows and pandas.DataFrame.shift.
You can iterate through the dataframe and compare current row with the previous one, then add some condition:
df['SUB_A2_last'] = df['SUB_A2'].shift()
count = 0
#Fill column with zeros
df['count_series'] = 0
for index, row in df.iterrows():
subA = row['sub_A2']
subA_last = row['sub_A2_last']
if subA == subA_last:
count = 1
else:
count = 0
df.loc[index, 'count_series'] = count
Then repeat for B column. It is possible to get a better aproach using pandas.DataFrame.apply and a custom function.