I would like to translate this R code into Python's pandas:
gg <- c('a', 'a', 'b', 'a', 'b', 'b', 'a', 'a', 'a', 'a', 'b')
val <- c(-2, 2, -1, 2, 3, 3, -7, 5, 6, 2, 8)
df = data.table(gg=gg, val=val)
df[, gg_count := seq(.N), by = gg]
df[, counter := gg_count - na.locf(ifelse(val <= 0, gg_count, NA), na.rm=F), by=gg]
print(df)
Output:
gg val gg_count counter???
a -2 1 0
a 2 2 1
b -1 1 0
a 2 3 2
b 3 2 1
b 3 3 2
a -7 4 0
a 5 5 1
a 6 6 2
a 2 7 3
b 8 4 3
Basically, I have columns "gg" and "val" and need to get to the column "counter". In R, I used also helper column "gg_count".
"counter" column, as you can see in the example, counts how many consecutive rows have "val" greater than zero. When negative number pops up, we reset the counter and start counting again. Everything needs to be done on a group level using column "gg".
CodePudding user response:
You can use:
g = df['val'].le(0).groupby(df['gg']).cumsum() # get groups between negative numbers
df['val'].gt(0).groupby([df['gg'], g]).cumsum()
Output:
0 0
1 1
2 0
3 2
4 1
5 2
6 0
7 1
8 2
9 3
10 3
Name: val, dtype: int64