I have a DataFrame with a column in which each row holds a list. I want to get the element that comes right after the value I am looking for, and put it in another column.
For example, let's say I am looking for 'b':
|lists |next_element|
|---------|------------|
|[a,b,c,d]| c | #(c is the next value after b)
|[c,b,a,e]| a | #(a is the next value after b)
|[a,e,f,b]| [] | #(empty, because there is no next value after b)
*All lists have the element. There are no lists without the value I am looking for
Thank you
CodePudding user response:
Try writing a function and using apply.
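value = 'b'

def get_next(x):
    # Index of the last position in the list
    get_len = len(x) - 1
    for i in x:
        # Case-insensitive comparison against the target value
        if value.lower() == i.lower():
            curr_idx = x.index(i)
            if curr_idx == get_len:
                # The target is the last element, so nothing follows it
                return []
            else:
                return x[curr_idx + 1]

df["next_element"] = df["lists"].apply(get_next)
df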
Out[649]:
          lists next_element
0  [a, b, c, d]            c
1  [c, b, a, e]            a
2  [a, e, f, b]           []
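For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt from the table in the question. The helper name next_after, its target parameter, and the choice to return None when nothing follows are assumptions made for illustration, not part of the answer above. Unlike the answer's function, this sketch compares case-sensitively.

```python
import pandas as pd

# Sample data rebuilt from the question's table.
df = pd.DataFrame({"lists": [list("abcd"), list("cbae"), list("aefb")]})


def next_after(items, target):
    """Return the element following the first `target` in `items`, else None."""
    try:
        idx = items.index(target)
    except ValueError:
        # Target not present (the question says this never happens).
        return None
    # Guard against the target being the last element.
    return items[idx + 1] if idx + 1 < len(items) else None


# Extra keyword arguments to Series.apply are forwarded to the function.
df["next_element"] = df["lists"].apply(next_after, target="b")
print(df)
```

Passing the target as an argument avoids relying on a global variable, so the same helper can be reused for other values.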
CodePudding user response:
First observation: since you want the next element of a list of strings, the expected data type for that column should be a string, not a list. So, instead of the next_element column holding [c, a, []], it's better to use [c, a, None].
Secondly, you should try to avoid calling apply directly on a Series and instead use the str methods that pandas provides for Series, which are a vectorized way of solving such problems very quickly.
With the above in mind, let's try this completely vectorized one-liner:
element = 'b'
df['next_element'] = df.lists.str.join('').str.split(element).str[-1].str[0]
          lists next_element
0  [a, b, c, d]            c
1  [c, b, a, e]            a
2  [a, e, f, b]          NaN
- First, I combine each row into a single string: [a,b,c,d] -> 'abcd'
- Next, I split this by 'b' to get substrings
- I pick the last element from this list, and finally the first element from that, for each row, using str functions, which are vectorized over each row.
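The steps above can be traced one at a time, as in the sketch below. The intermediate names joined, parts, and tail are introduced here only for illustration, and the sample frame is rebuilt from the question. Note the assumption that every list element is a single character: joining with an empty separator discards element boundaries, so with longer strings this approach would probably return only the first character of the next element.

```python
import pandas as pd

# Sample data rebuilt from the question's table.
df = pd.DataFrame({"lists": [list("abcd"), list("cbae"), list("aefb")]})
element = "b"

# Step 1: join each list into one string.
joined = df["lists"].str.join("")   # 'abcd', 'cbae', 'aefb'

# Step 2: split each string on the target element.
parts = joined.str.split(element)   # ['a', 'cd'], ['c', 'ae'], ['aef', '']

# Step 3: keep the last piece, i.e. everything after the target.
tail = parts.str[-1]                # 'cd', 'ae', ''

# Step 4: take its first character; indexing an empty string yields NaN.
df["next_element"] = tail.str[0]
print(df)
```

If the target can occur more than once in a list, taking the last piece gives the element after the last occurrence rather than the first one.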
Read more about pandas.Series.str methods in the official pandas documentation.