Is there an efficient way to compute column values in Pandas using values from previous rows based o

Consider looping through my DataFrame:

import pandas as pd

df = pd.DataFrame({
    'Price': [1000, 1000, 1000, 2000, 2000, 2000, 2000, 1400, 1400],
    'Count': [0, 0, 0, 0, 0, 0, 0, 0, 0]
})

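for idx in df.index:
    # Only rows priced above 1500 extend the running count.
    if df.loc[idx, 'Price'] > 1500:
        if idx > 0:
            # Previous row's count plus one; .loc avoids chained assignment.
            df.loc[idx, 'Count'] = df.loc[idx - 1, 'Count'] + 1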

Resulting in:

	Price	Count
0	1000	0
1	1000	0
2	1000	0
3	2000	1
4	2000	2
5	2000	3
6	2000	4
7	1400	0
8	1400	0

Is there a more efficient way to do this?

CodePudding user response:

Use mask to hide the rows where Price is at or below 1500, then use cumsum on the remaining rows to build the counter:

df['Count'] = df.mask(df['Price'] <= 1500)['Count'].add(1).cumsum().fillna(0).astype(int)
print(df)

# Output:
   Price  Count
0   1000      0
1   1000      0
2   1000      0
3   2000      1
4   2000      2
5   2000      3
6   2000      4
7   1400      0
8   1400      0
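
One thing that may be worth checking before relying on this one-liner: cumsum runs over the whole column, so if Price climbs above 1500 a second time, the counter appears to carry on from where it stopped instead of restarting the way the loop in the question does. Here is a minimal sketch of that comparison. The two-run data and the loop_count helper are made up for illustration and are not from the question, and the mask is applied to the Count column directly, which should be equivalent to the expression above.

```python
import pandas as pd

def loop_count(price, threshold=1500):
    """Reference: plain-Python version of the loop from the question."""
    counts = [0] * len(price)
    for i in range(1, len(price)):
        if price[i] > threshold:
            counts[i] = counts[i - 1] + 1
    return counts

# Hypothetical data with two separate runs above 1500 (not from the question).
prices = [1000, 2000, 2000, 1400, 2000, 2000]
df2 = pd.DataFrame({'Price': prices, 'Count': [0] * len(prices)})

# Series-level form of the mask/cumsum expression.
mask_cumsum = (
    df2['Count'].mask(df2['Price'] <= 1500)
    .add(1).cumsum().fillna(0).astype(int)
)

print(loop_count(prices))    # [0, 1, 2, 0, 1, 2] -> restarts on the second run
print(mask_cumsum.tolist())  # [0, 1, 2, 0, 3, 4] -> continues from the first run
```

So for the sample data in the question (a single run above 1500) the two agree, but with several runs they diverge.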

CodePudding user response:

Create pseudo-groups using Series.cumsum, then use groupby.cumcount to generate the within-group counts:

groups = df.Price.le(1500).cumsum()
df['Count'] = df.Price.gt(1500).groupby(groups).cumcount()

#    Price  Count
# 0   1000      0
# 1   1000      0
# 2   1000      0
# 3   2000      1
# 4   2000      2
# 5   2000      3
# 6   2000      4
# 7   1400      0
# 8   1400      0
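
If this is needed in more than one place, the same idea could be wrapped in a small function. This is only a sketch: the name run_counter, the threshold parameter, and the two-run data are my own assumptions, not part of the answer above. It also groups the price Series itself, since cumcount only numbers rows within each group and ignores the values being grouped.

```python
import pandas as pd

def run_counter(price: pd.Series, threshold: float = 1500) -> pd.Series:
    """Number of rows since the most recent row at or below `threshold`."""
    # Each row at or below the threshold opens a new pseudo-group.
    groups = price.le(threshold).cumsum()
    # Position of each row inside its pseudo-group; the opening row gets 0.
    return price.groupby(groups).cumcount()

# Data from the question (one run above 1500).
single_run = pd.Series([1000, 1000, 1000, 2000, 2000, 2000, 2000, 1400, 1400])
# Hypothetical data with two runs above 1500.
two_runs = pd.Series([1000, 2000, 2000, 1400, 2000, 2000])

print(run_counter(single_run).tolist())  # [0, 0, 0, 1, 2, 3, 4, 0, 0]
print(run_counter(two_runs).tolist())    # [0, 1, 2, 0, 1, 2]
```

Because every row at or below the threshold starts a fresh group, the counter restarts after each dip, which matches what the original loop does on these inputs.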