Python - Start loop at row 'n' in a dataframe?

Time:10-10

I have this dataframe:

import pandas as pd

a = [0,0,5,0,0,0,0,0,0,7,0,0,0,0,0,0,0,0]
b = [0,0,0,0,250,350,500,0,0,0,0,0,0,125,70,95,0,0]

df = pd.DataFrame({'a': a, 'b': b})

df
    a   b
0   0   0
1   0   0
2   5   0
3   0   0
4   0   250
5   0   350
6   0   500
7   0   0
8   0   0
9   7   0
10  0   0
11  0   0
12  0   0
13  0   125
14  0   70
15  0   95
16  0   0
17  0   0

For each non-zero value in column a, I want to record that value together with the first non-zero value in column b that follows it. This is the result I'm looking for:

5
250
7
125

My first attempt is the loop below. I tried to extract the row index so I could pass it to the inner for loop and start that loop at index n, but it doesn't do what I expected.

for item in df.a:
    if item > 0:
        print(item)
        index = df.iterrows()
        print(index)
        
        for i in df.b:
            if i > 0:
                print(i)
                break

which yields:

5
<generator object DataFrame.iterrows at 0x000002C654B0EF20>
250
7
<generator object DataFrame.iterrows at 0x000002C654B01C80>
250

Advice on how to approach this is much appreciated!
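
The inner loop in the attempt above always restarts at the top of column b, which is why 250 is printed twice. One way to start it at row n is to track the position with enumerate and slice with iloc. This is a minimal sketch, assuming the same a and b lists as above; the results list is a name introduced here for illustration.

```python
import pandas as pd

a = [0,0,5,0,0,0,0,0,0,7,0,0,0,0,0,0,0,0]
b = [0,0,0,0,250,350,500,0,0,0,0,0,0,125,70,95,0,0]
df = pd.DataFrame({'a': a, 'b': b})

results = []  # hypothetical name, collects the values instead of printing them

# enumerate yields the row position n alongside each value of column a
for n, item in enumerate(df['a']):
    if item > 0:
        results.append(int(item))
        # iloc[n:] slices by position, so the inner loop starts at row n
        for value in df['b'].iloc[n:]:
            if value > 0:
                results.append(int(value))
                break

print(results)  # [5, 250, 7, 125]
```

This keeps the loop structure of the attempt, but it scans column b once per match, so it is likely to be slow on large frames.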

CodePudding user response:

Don't loop. You can mask the zeros, group column b by the blocks that each non-zero value in column a starts, and aggregate with first, which returns the first non-null value in each group.

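# Mask the zeros: entries equal to 0 become NaN
s = df[df != 0]
# Forward-fill a to label each block, then take the first non-NaN b per block
s['b'].groupby(s['a'].ffill()).first()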

a
5.0    250.0
7.0    125.0
Name: b, dtype: float64
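
Note that grouping on the forward-filled values merges blocks whenever the same non-zero value appears more than once in column a, and the result is sorted by that value rather than by row order. A possible variation, sketched under the assumption that each non-zero a should start its own block, labels blocks with a running count instead. The names masked, block and out are introduced here for illustration.

```python
import pandas as pd

a = [0,0,5,0,0,0,0,0,0,7,0,0,0,0,0,0,0,0]
b = [0,0,0,0,250,350,500,0,0,0,0,0,0,125,70,95,0,0]
df = pd.DataFrame({'a': a, 'b': b})

# Replace zeros with NaN so first() can skip them
masked = df.mask(df.eq(0))

# Running count of non-zero a values: 0 before the first one, then 1, 2, ...
block = df['a'].ne(0).cumsum().rename('block')

# First non-NaN a and b in each block; blocks missing either value are dropped
out = masked.groupby(block).first().dropna()

print(out)
```

A block whose a value is never followed by a non-zero b is dropped by the final dropna; remove that call if such rows should be kept.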

CodePudding user response:

Another possible solution:

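# Replace zeros with NaN and drop rows where both columns are NaN
df1 = df.mask(df.eq(0)).dropna(how='all')
# Shift b up one row so each a lines up with the b value that follows it
df1.assign(b = df1['b'].shift(-1)).dropna()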

Output:

     a      b
2  5.0  250.0
9  7.0  125.0
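
To get the flat sequence 5, 250, 7, 125 that the question asks for, the two-column result can be flattened row by row. This is a small sketch that assumes the same data as the question; pairs and flat are names introduced here for illustration.

```python
import pandas as pd

a = [0,0,5,0,0,0,0,0,0,7,0,0,0,0,0,0,0,0]
b = [0,0,0,0,250,350,500,0,0,0,0,0,0,125,70,95,0,0]
df = pd.DataFrame({'a': a, 'b': b})

# Same steps as the answer: mask zeros, drop empty rows, align b with a
df1 = df.mask(df.eq(0)).dropna(how='all')
pairs = df1.assign(b=df1['b'].shift(-1)).dropna()

# ravel() walks the 2-D array row by row, giving a, b, a, b, ...
flat = pairs.to_numpy().ravel().astype(int).tolist()

print(flat)  # [5, 250, 7, 125]
```

This approach pairs each a with the row immediately after it, so two non-zero a values with no b between them would cause the first one to be dropped.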