I have this dataframe:
import pandas as pd
a = [0,0,5,0,0,0,0,0,0,7,0,0,0,0,0,0,0,0]
b = [0,0,0,0,250,350,500,0,0,0,0,0,0,125,70,95,0,0]
df = pd.DataFrame({'a': a, 'b': b})
df
a b
0 0 0
1 0 0
2 5 0
3 0 0
4 0 250
5 0 350
6 0 500
7 0 0
8 0 0
9 7 0
10 0 0
11 0 0
12 0 0
13 0 125
14 0 70
15 0 95
16 0 0
17 0 0
For each nonzero value in column a, I want to record that value followed by the first nonzero value in column b that appears after it. This is the result I am looking for:
5
250
7
125
My first attempt is the loop below. I tried to get the row index so that the inner for loop could start from that row, but the result is not what I expected.
for item in df.a:
    if item > 0:
        print(item)
        index = df.iterrows()
        print(index)
        for i in df.b:
            if i > 0:
                print(i)
                break
which yields:
5
<generator object DataFrame.iterrows at 0x000002C654B0EF20>
250
7
<generator object DataFrame.iterrows at 0x000002C654B01C80>
250
Advice on how to approach this is much appreciated!
CodePudding user response:
Don't loop. You can mask the zeros, then group column b by the blocks in column a (forward-filled, so each nonzero a value labels the rows that follow it) and aggregate with first:
s = df[df != 0]  # zeros become NaN
s['b'].groupby(s['a'].ffill()).first()  # first non-NaN b in each block of a
a
5.0 250.0
7.0 125.0
Name: b, dtype: float64
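For reference, the same idea as a self-contained sketch that also flattens the result into the a, b, a, b order the question asks for. It assumes the sample data from the question; the names first_b and pairs are only illustrative.

```python
import pandas as pd

a = [0, 0, 5, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0]
b = [0, 0, 0, 0, 250, 350, 500, 0, 0, 0, 0, 0, 0, 125, 70, 95, 0, 0]
df = pd.DataFrame({'a': a, 'b': b})

# Replace zeros with NaN so that ffill() and first() skip over them.
s = df.where(df != 0)

# Forward-fill a so every row carries the most recent nonzero a value,
# then take the first non-NaN b inside each of those blocks.
first_b = s['b'].groupby(s['a'].ffill()).first()

# Flatten into the a, b, a, b order shown in the question.
pairs = [int(v) for a_val, b_val in first_b.items() for v in (a_val, b_val)]
print(pairs)
```

Note that groupby keys on the value of a, so two blocks that share the same a value would be merged into one group, and groups are sorted by a unless sort=False is passed. Neither matters for this sample, but it may for other data.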
CodePudding user response:
Another possible solution:
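# turn zeros into NaN and drop the rows where both columns are empty
df1 = df.mask(df.eq(0)).dropna(how='all')
# move the next row's b up beside each a, then keep only the complete rows
df1.assign(b = df1['b'].shift(-1)).dropna()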
Output:
a b
2 5.0 250.0
9 7.0 125.0
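The shift-based version relies on the layout of the sample data: once the all-zero rows are dropped, each nonzero a row is immediately followed by the row holding its first b value, and a and b are never nonzero on the same row. A minimal sketch under those assumptions, wrapped in a function whose name (first_b_after_a) is hypothetical:

```python
import pandas as pd

def first_b_after_a(df):
    """Pair each nonzero a with the b value on the next non-empty row."""
    # Turn zeros into NaN and drop rows where both columns are empty.
    kept = df.mask(df.eq(0)).dropna(how='all')
    # Pull the following row's b up beside each a. Rows that have no a
    # value, or no b on the following row, still contain NaN and are dropped.
    return kept.assign(b=kept['b'].shift(-1)).dropna()

a = [0, 0, 5, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0]
b = [0, 0, 0, 0, 250, 350, 500, 0, 0, 0, 0, 0, 0, 125, 70, 95, 0, 0]
df = pd.DataFrame({'a': a, 'b': b})

result = first_b_after_a(df)
print(result)
```

If two nonzero a values can occur with no b between them, the first of the two is silently dropped, so the groupby approach may be the safer choice for messier data.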