My derived column is a substring of another column but the new string must be extracted at varying positions. In the code below I have done this using a lambda. However, this is slow. Is it possible to achieve the correct result using str.slice or is there another fast method?
import pandas as pd
df = pd.DataFrame ( {'st_col1':['aa-b', 'aaa-b']} )
df['index_dash'] = df['st_col1'].str.find ('-')
# gives wrong answer at index 1
df['res_wrong'] = df['st_col1'].str.slice (3)
# what I want to do :
df['res_cant_do'] = df['st_col1'].str.slice ( df['index_dash'] )
# slow solution
# naively invoking the built in python string slicing ... aStr[ start: ]
# ... accessing two columns from every row in turn
df['slow_sol'] = df.apply (lambda x: x['st_col1'] [ 1 x['index_dash']:], axis=1 )
So can this be sped up ideally using str.slice or via another method?
CodePudding user response:
From what I understand you want to get the last value after the "-" in st_col1 and pass that to a single column for that just use split
df['slow_sol'] = df['st_col1'].str.split('-').str[-1]
No need to identify the index, and them slicing it again on the given index dash. This will surely be more efficient then what you are doing, and cut a lot of steps