Next item of a list inside a DataFrame

Time:10-08

I have a DataFrame with a column where each row holds a list. I want to get the element that comes right after the value I am looking for and store it in another column.

For example, say I am looking for 'b':

|lists    |next_element|note                                       |
|---------|------------|-------------------------------------------|
|[a,b,c,d]| c          | c is the next value after b               |
|[c,b,a,e]| a          | a is the next value after b               |
|[a,e,f,b]| []         | empty, because there is no value after b  |

Note: every list contains the element; there are no lists without the value I am looking for.

Thank you

Answer:

Try writing a function and applying it with apply:

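import pandas as pd

df = pd.DataFrame({"lists": [list("abcd"), list("cbae"), list("aefb")]})
value = 'b'

def get_next(x):
    # Position of the last element in the list
    get_len = len(x) - 1
    for i in x:
        # Case-insensitive comparison with the value we are looking for
        if value.lower() == i.lower():
            curr_idx = x.index(i)
            # The match is the last element, so there is no next value
            if curr_idx == get_len:
                return []
            else:
                return x[curr_idx + 1]

df["next_element"] = df["lists"].apply(get_next)
print(df)

          lists next_element
0  [a, b, c, d]            c
1  [c, b, a, e]            a
2  [a, e, f, b]           []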
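A more compact variant looks the position up directly with list.index instead of looping. This is a minimal sketch, assuming (as the question states) that every list contains the target; the helper name `next_after` is hypothetical. Unlike get_next above, it compares case-sensitively and returns None instead of [] when the target is the last element.

```python
import pandas as pd

def next_after(items, target):
    """Return the element right after the first occurrence of target, or None if target is last."""
    # list.index raises ValueError if target is missing; the question guarantees it is present
    pos = items.index(target)
    # Guard against running past the end of the list
    return items[pos + 1] if pos + 1 < len(items) else None

df = pd.DataFrame({"lists": [["a", "b", "c", "d"], ["c", "b", "a", "e"], ["a", "e", "f", "b"]]})
# Extra keyword arguments to Series.apply are forwarded to the function
df["next_element"] = df["lists"].apply(next_after, target="b")
print(df)
```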

Answer:

First observation: since you want the next element of a list of strings, the natural data type for that column is a string, not a list.

So, instead of filling next_element with [c, a, []], it's better to use [c, a, None].
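One practical reason for preferring None: pandas treats None as a missing value, so the usual missing-data tools (isna, dropna, fillna) work on the column, while an empty list is just another object. A small check using hand-built Series that mirror the two options:

```python
import pandas as pd

# The same three results, once with an empty list and once with None for "no next element"
with_list = pd.Series(["c", "a", []])
with_none = pd.Series(["c", "a", None])

# isna() recognises None as missing but treats [] as an ordinary value
print(with_list.isna().tolist())  # [False, False, False]
print(with_none.isna().tolist())  # [False, False, True]

# So fillna() (and dropna()) only affect the None version
print(with_none.fillna("<none>").tolist())  # ['c', 'a', '<none>']
```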

Secondly, try to avoid calling apply directly on a Series and instead use the str methods that pandas provides for Series, which operate over the whole column and spare you from writing a row-by-row function.

With the above in mind, let's try this vectorized one-liner:

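import pandas as pd

df = pd.DataFrame({"lists": [list("abcd"), list("cbae"), list("aefb")]})
element = 'b'

df['next_element'] = df.lists.str.join('').str.split(element).str[-1].str[0]
print(df)

          lists next_element
0  [a, b, c, d]            c
1  [c, b, a, e]            a
2  [a, e, f, b]          NaN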
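  1. First, I join each row's list into a single string: [a,b,c,d] -> 'abcd'.
  2. Next, I split this string on 'b' to get the substrings before and after it: 'abcd' -> ['a', 'cd'].
  3. Finally, for each row I take the last substring and its first character ('cd' -> 'c'), using str methods that work over the whole column. When 'b' is the last element, the last substring is empty, so .str[0] returns NaN.

Note that this relies on every list element being a single character (with longer elements, joining and splitting loses the list boundaries), and it returns the value after the last occurrence of 'b' rather than the first.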
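To see what each step of the one-liner produces, here is a sketch that stores the intermediate results in separately named Series (the names `joined`, `parts`, `after` are only for illustration), followed by a case with a multi-character element where the join-and-split trick gives the wrong answer.

```python
import pandas as pd

df = pd.DataFrame({"lists": [["a", "b", "c", "d"], ["c", "b", "a", "e"], ["a", "e", "f", "b"]]})
element = "b"

# Step 1: join each list into one string, e.g. ['a','b','c','d'] -> 'abcd'
joined = df["lists"].str.join("")
# Step 2: split on the target, e.g. 'abcd' -> ['a', 'cd']
parts = joined.str.split(element)
# Step 3a: keep the text after the last occurrence (empty when the target is last)
after = parts.str[-1]
# Step 3b: its first character is the next element; '' has no first character, so this gives NaN
next_element = after.str[0]

print(joined.tolist())        # ['abcd', 'cbae', 'aefb']
print(parts.tolist())         # [['a', 'cd'], ['c', 'ae'], ['aef', '']]
print(after.tolist())         # ['cd', 'ae', '']
print(next_element.tolist())  # ['c', 'a', nan]

# Limitation: with multi-character elements the list boundaries are lost
multi = pd.Series([["b", "cd", "e"]])
print(multi.str.join("").str.split("b").str[-1].str[0].tolist())  # ['c'], not ['cd']
```

If your real data contains multi-character strings, a position-based lookup such as list.index is the safer choice.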

Read more about the pandas.Series.str methods in the official pandas documentation.
