Create string column that goes up a value every time the condition is met

Time:09-01

I have the following series

s = pd.Series(['01','02','03','01','02','03','01','02','03'])

And the following list

l = ['A','B','C']

What I want to do is this: every time we reach the '01' string value, I advance one position in my list and fill the result with that list value, so 'A' for the first run, 'B' from the second '01' onward, and so on. The expected result is:

s = pd.Series(['A','A','A','B','B','B','C','C','C'])

I know I should use apply, but I am having trouble formulating the condition:

s.apply(increment)

def increment(s):
    l = ['A','B','C']
    count = 0
    if s == '01':
        count += 1
        # Something went awry here
        return l[count]

CodePudding user response:

Here are two ways to achieve this:

Option 1

  1. Convert pd.Series to float.
  2. Get diff and apply ne to determine the breakpoints between the consecutive sequences.
  3. Apply cumsum, but starting at 0 (with sub). At this stage, we will have a pd.Series with values as [0, 0, 0, 1, 1, 1, 2, 2, 2].
  4. Finally, we convert lst to a dict, with the indices as keys, so that we can apply map.
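As a rough illustration of the four steps above, the sketch below runs them one at a time so the intermediate values can be inspected. The names nums, breaks, group_ids and labels are made up for this example.

```python
import pandas as pd

s = pd.Series(['01','02','03','01','02','03','01','02','03'])
lst = ['A','B','C']

# Step 1: numeric view of the strings (1.0, 2.0, 3.0, 1.0, ...)
nums = s.astype(float)

# Step 2: True wherever the value did not go up by exactly 1.
# The first element has a NaN diff, so it also counts as a breakpoint.
breaks = nums.diff().ne(1)

# Step 3: running count of breakpoints, shifted to start at 0
group_ids = breaks.cumsum().sub(1)
print(group_ids.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 2]

# Step 4: look up each group number in {0: 'A', 1: 'B', 2: 'C'}
labels = group_ids.map(dict(enumerate(lst)))
print(labels.tolist())  # ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C']
```

Note that this approach detects a new run by a break in the +1 sequence rather than by the literal value '01', so it assumes each run counts up in steps of one.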

We can put this in a one-liner:

import pandas as pd

s = pd.Series(['01','02','03','01','02','03','01','02','03'])
lst = ['A','B','C']

result = (
    s.astype(float).diff().ne(1)   # True at the start of each run
     .cumsum().sub(1)              # run number, starting at 0
     .map(dict(enumerate(lst)))    # run number -> label
)

print(result)

0    A
1    A
2    A
3    B
4    B
5    B
6    C
7    C
8    C

Option 2

  1. Use groupby with groups to find all the index labels for value 01.
  2. Next, turn lst into a pd.Series with the found labels as its index, and pass it to map on a pd.Series built from s.index. This gives "A" at 0, "B" at 3, "C" at 6, and NaN everywhere else.
  3. Finally, apply ffill to carry each value forward.

indices_one = s.groupby(s).groups['01']
# Int64Index([0, 3, 6], dtype='int64')

result2 = pd.Series(s.index).map(pd.Series(lst, index=indices_one)).ffill()

result2.equals(result)
# True
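To make the three steps of Option 2 easier to follow, here is a sketch that keeps each intermediate result in its own variable. The names lookup, sparse and filled are illustrative only. The result gets a fresh 0..n-1 index, which matches s here only because s uses the default index.

```python
import pandas as pd

s = pd.Series(['01','02','03','01','02','03','01','02','03'])
lst = ['A','B','C']

# Step 1: index labels at which '01' occurs
indices_one = s.groupby(s).groups['01']
print(list(indices_one))  # [0, 3, 6]

# Step 2: lookup table keyed by those labels, then map every position through it.
# Positions that are not in the lookup come back as NaN.
lookup = pd.Series(lst, index=indices_one)
sparse = pd.Series(s.index).map(lookup)

# Step 3: forward-fill the gaps
filled = sparse.ffill()
print(filled.tolist())  # ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C']
```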

CodePudding user response:

I think this is a simple approach using apply

s = pd.Series(['01','02','03','01','02','03','01','02','03'])

l = ['A','B','C']

def return_val(s):
    global l
    val = l[0]
    if s == '03':
        l = l[1:]
    return val

s.apply(return_val)

output:

0    A
1    A
2    A
3    B
4    B
5    B
6    C
7    C
8    C
dtype: object

If you want to avoid using globals you might use reduce and partial from functools, although this approach might not be that clear:

from functools import reduce, partial

def return_val(prev_val, s, where_to_add: list):
    l = ["",'A','B','C']
    if s == '01':
        where = l.index(prev_val)
        prev_val = l[where + 1]
    where_to_add.append(prev_val)
    return prev_val

my_list = []
return_val_partial = partial(return_val, where_to_add=my_list)
reduce(return_val_partial, s.values, "")
result = pd.Series(my_list)
print(result)
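Another way to avoid the global, offered only as a sketch, is to keep the counter in a closure and pass the resulting function to apply. make_labeler and its trigger parameter are hypothetical names, not part of pandas or of the answers here. The sketch assumes apply visits the elements in order, and that there are at least as many labels as occurrences of the trigger.

```python
import pandas as pd

def make_labeler(labels, trigger='01'):
    """Return a stateful function for Series.apply (hypothetical helper)."""
    position = -1  # no trigger seen yet

    def label_for(value):
        nonlocal position
        if value == trigger:
            position += 1  # move on to the next label
        # Before the first trigger there is no label to return
        return labels[position] if position >= 0 else None

    return label_for

s = pd.Series(['01','02','03','01','02','03','01','02','03'])
result = s.apply(make_labeler(['A','B','C']))
print(result.tolist())  # ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C']
```

Each call to make_labeler starts a fresh counter, so the function can be reused on another series without resetting any shared state.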

CodePudding user response:

You can do:

import numpy as np

out = pd.Series([np.nan]*len(s), dtype=object)  # object dtype so it can hold strings
out.loc[s.loc[s=='01'].index] = l
out = out.ffill()

print(out)

0    A
1    A
2    A
3    B
4    B
5    B
6    C
7    C
8    C

Explanation:

  • Initialize a series of NaN with the same length as the original series
  • Locate the index labels in the original series s where the value is 01, and fill those locations with the values from your list l = ['A', 'B', 'C']
  • Then ffill to carry each value forward until the next 01
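The assignment step only works when the number of 01 markers equals the length of the list. The following sketch adds an explicit check for that; the variable starts and the error message are illustrative additions, not part of the answer above.

```python
import numpy as np
import pandas as pd

s = pd.Series(['01','02','03','01','02','03','01','02','03'])
l = ['A','B','C']

# Index labels where a new run begins
starts = s.index[s == '01']

# One label per run is assumed; fail early with a clear message otherwise
if len(starts) != len(l):
    raise ValueError(f"found {len(starts)} markers but {len(l)} labels")

# Object dtype so the NaN placeholder series can hold strings
out = pd.Series(np.nan, index=s.index, dtype=object)
out.loc[starts] = l   # 'A' at 0, 'B' at 3, 'C' at 6
out = out.ffill()     # carry each label forward
print(out.tolist())   # ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C']
```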