Slicing strings in a Series by a different Series of ints

Say we have a DataFrame with two columns, built from this dict:

import pandas as pd

data = {
    "slice_by": [2, 2, 1],
    "string_to_slice": ["one", "two", "three"],
}
df = pd.DataFrame(data)

The first line works just fine, the second one doesn't:

print(df["string_to_slice"].str[1:])
print(df["string_to_slice"].str[:df["slice_by"]])

Output:

0        ne
1        wo
2        hree
Name: string_to_slice, Length: 3, dtype: object
0       NaN
1       NaN
2       NaN
Name: string_to_slice, Length: 3, dtype: float64

What would be the appropriate way to do this? I'm sure I could put something together with df.iterrows(), but that's probably not the efficient way.

CodePudding user response：

Here is one way to do it, using apply with axis=1 so that the lambda receives one row at a time:

df.apply(lambda x: x['string_to_slice'][:x['slice_by']], axis=1)

0    on
1    tw
2     t
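
If apply with axis=1 turns out to be slow on a large frame, a plain list comprehension over the two columns is a common alternative. This is only a sketch, assuming the same df as in the question; the names result, s and n are just illustrative, and it slices with [:slice_by] like the apply version above.

```python
import pandas as pd

df = pd.DataFrame({
    "slice_by": [2, 2, 1],
    "string_to_slice": ["one", "two", "three"],
})

# Pair each string with its own stop position and slice it in plain Python.
result = pd.Series(
    [s[:n] for s, n in zip(df["string_to_slice"], df["slice_by"])],
    index=df.index,  # keep the original row labels
    name="string_to_slice",
)
print(result)
```

Passing index=df.index keeps the result aligned with the original rows, so it can be assigned straight back as a new column.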

CodePudding user response：

I am assuming you want str[slice_by:] and not str[:slice_by]. With that assumption you can do:

import numpy as np

def slice_string(string_to_slice, slice_by):
    # drop the first slice_by characters
    return string_to_slice[slice_by:]

np_slice_string = np.vectorize(slice_string)

out = np_slice_string(df['string_to_slice'], df['slice_by'])

print(out)

['e' 'o' 'hree']
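
Note that the result above is a plain NumPy array, not a Series, so the DataFrame index is lost. Also, np.vectorize is documented as a convenience wrapper rather than a performance tool, so it may not be much faster than apply. A minimal sketch of getting the values back into a Series, assuming the same df as in the question (otypes=[object] is my addition so the output stays as ordinary Python strings, and the name result is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "slice_by": [2, 2, 1],
    "string_to_slice": ["one", "two", "three"],
})

def slice_string(string_to_slice, slice_by):
    # drop the first slice_by characters
    return string_to_slice[slice_by:]

# otypes=[object] asks for an object array instead of an inferred string dtype.
np_slice_string = np.vectorize(slice_string, otypes=[object])

# Wrap the array in a Series that reuses the DataFrame's index.
result = pd.Series(
    np_slice_string(df["string_to_slice"], df["slice_by"]),
    index=df.index,
    name="string_to_slice",
)
print(result)
```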