Pandas Using Series.str.slice Most Efficiently with row varying parameters

Time:09-29

My derived column is a substring of another column, but the start position of the substring varies from row to row. In the code below I have done this using a lambda with apply. However, this is slow. Is it possible to get the correct result using str.slice, or is there another fast method?

import pandas as pd
df = pd.DataFrame({'st_col1': ['aa-b', 'aaa-b']})
df['index_dash'] = df['st_col1'].str.find('-')

# gives wrong answer at index 1
df['res_wrong'] = df['st_col1'].str.slice(3)

# what I want to do (does not work: str.slice expects a single integer start, not a per-row Series):
df['res_cant_do'] = df['st_col1'].str.slice(df['index_dash'])

# slow solution
# naively invoking the built in python string slicing ...  aStr[ start: ]
# ... accessing two columns from every row in turn
df['slow_sol'] = df.apply(lambda x: x['st_col1'][1 + x['index_dash']:], axis=1)

So can this be sped up, ideally using str.slice, or via another method?

CodePudding user response:

From what I understand, you want the text after the "-" in st_col1 and want to put it in a single column. For that, just use split:

df['slow_sol'] = df['st_col1'].str.split('-').str[-1]

There is no need to find the index of the dash and then slice again at that index. This avoids the row-wise apply, so it should be more efficient than what you are doing, and it cuts out a step.
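One caveat: .str[-1] returns the text after the last dash, whereas the slicing in the question starts after the first dash, so the two differ when a value contains more than one dash. The sketch below shows a few ways to get the "after the first dash" behaviour without apply. The third row 'a-b-c' and the column names after_split, after_extract and after_zip are made up for illustration, and it assumes every value contains at least one dash.

```python
import pandas as pd

# 'a-b-c' is an extra illustrative row with more than one dash.
df = pd.DataFrame({'st_col1': ['aa-b', 'aaa-b', 'a-b-c']})

# Option 1: split once on the first dash and take the second piece.
df['after_split'] = df['st_col1'].str.split('-', n=1).str[1]

# Option 2: capture everything after the first dash with a regex.
df['after_extract'] = df['st_col1'].str.extract(r'-(.*)', expand=False)

# Option 3: plain Python slicing over both columns with zip,
# which avoids the per-row overhead of DataFrame.apply(axis=1).
dash = df['st_col1'].str.find('-')
df['after_zip'] = [s[i + 1:] for s, i in zip(df['st_col1'], dash)]

# For comparison: split on every dash and keep the last piece.
last_piece = df['st_col1'].str.split('-').str[-1]
```

All three options agree on these rows, while last_piece differs on 'a-b-c'. Which one is fastest depends on the data, so it is worth timing them on a realistic sample before choosing.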
