I have a dataset that looks like [3,4,5,-5,4,5,6,3,2-6,6]. I want to create a second dataset that has 0 at every index belonging to the first run of positive numbers in the first dataset, and 1 at every remaining index.
So for a = [3,4,5,-5,4,5,6,3,2-6,6] it should be
b = [0,0,0, 1,1,1,1,1,1,1]
How can I produce b from a using pandas and Python?
CodePudding user response:
Since you tagged pandas, here is a solution using a Series:
import pandas as pd
s = pd.Series([3, 4, 5, -5, 4, 5, 6, 3, 2 - 6, 6])
# find the index of the first value that is NOT greater than zero
# (idxmin on a boolean Series returns the first False)
idx = (s > 0).idxmin()
# positions before that index become 0, the rest become 1
res = pd.Series(s.index >= idx, dtype=int)
print(res)
Output:
0 0
1 0
2 0
3 1
4 1
5 1
6 1
7 1
8 1
9 1
dtype: int64
If you prefer a one-liner:
res = pd.Series(s.index >= (s > 0).idxmin(), dtype=int)
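One caveat with the idxmin approach: if every value is positive, (s > 0) contains no False, so idxmin returns the first label and the result is all 1s rather than all 0s. It also compares index labels, so it assumes the default 0..n-1 index. A minimal sketch that guards against both, assuming an all-positive input should give all 0s (the helper name flag_after_first_run is made up for this example):

```python
import pandas as pd

def flag_after_first_run(s: pd.Series) -> pd.Series:
    """Return 0 for the leading run of positive values and 1 afterwards."""
    positive = s > 0
    if positive.all():
        # No non-positive value exists, so the whole series is the first run.
        return pd.Series(0, index=s.index, dtype=int)
    # Position (not label) of the first non-positive value.
    first_break = int(positive.to_numpy().argmin())
    flags = [0] * first_break + [1] * (len(s) - first_break)
    return pd.Series(flags, index=s.index, dtype=int)

s = pd.Series([3, 4, 5, -5, 4, 5, 6, 3, 2 - 6, 6])
res = flag_after_first_run(s)
```

Because it works on positions, this version should also behave the same when the Series has a non-default index.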
CodePudding user response:
You can use cummax on the boolean Series:
s = pd.Series([3, 4, 5, -5, 4, 5, 6, 3, 2 - 6, 6])
out = s.lt(0).cummax().astype(int)
Output:
0 0
1 0
2 0
3 1
4 1
5 1
6 1
7 1
8 1
9 1
dtype: int64
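To see why this works, it can help to split the one-liner into its steps: the mask is True at each negative value, and cummax keeps it True from the first True onwards. The sketch below also shows a variant with le(0); that variant is only an assumption about the intent, for the case where a zero should end the run of positive numbers too.

```python
import pandas as pd

s = pd.Series([3, 4, 5, -5, 4, 5, 6, 3, 2 - 6, 6])

mask = s.lt(0)            # True where the value is negative
latched = mask.cummax()   # stays True from the first True onwards
out = latched.astype(int)

# Variant (assumption): treat 0 as ending the run of positive values as well
out_zero_breaks = s.le(0).cummax().astype(int)
```

For the sample data both variants give the same result, since it contains no zeros.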
If you are really working with lists, pandas is not needed; numpy alone does the job and should be more efficient:
import numpy as np
a = [3,4,5,-5,4,5,6,3,2-6,6]
b = np.maximum.accumulate(np.array(a)<0).astype(int).tolist()
Output: [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
And if the list is small, pure Python should be preferred:
from itertools import accumulate
b = list(accumulate((int(x<0) for x in a), max))
Output: [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
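If the accumulate/max trick feels too implicit, the same result can be built by counting the leading run first. This is just an alternative sketch, not a replacement for the version above; it uses the same x < 0 test, so a zero does not end the run.

```python
from itertools import takewhile

a = [3, 4, 5, -5, 4, 5, 6, 3, 2 - 6, 6]

# Length of the leading run of non-negative values (stops at the first x < 0)
run = sum(1 for _ in takewhile(lambda x: x >= 0, a))

# 0 for the leading run, 1 for everything after it
b = [0] * run + [1] * (len(a) - run)
```

takewhile stops scanning at the first negative value, so only the leading run is inspected element by element.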