python pandas substring based on columns values-CodePudding

Given the following df:

data = {'Description':  ['with lemon', 'lemon', 'and orange', 'orange'],
        'Start': ['6', '1', '5', '1'],
       'Length': ['5', '5', '6', '6']}
df = pd.DataFrame(data)
print (df)

I would like to substring the "Description" based on what is specified in the other columns as start and length, here the expected output:

data = {'Description':  ['with lemon', 'lemon', 'and orange', 'orange'],
        'Start': ['6', '1', '5', '1'],
       'Length': ['5', '5', '6', '6'],
       'Res':  ['lemon', 'lemon', 'orange', 'orange']}
df = pd.DataFrame(data)
print (df)

Is there a way to make it dynamic or another compact way?

df['Res'] = df['Description'].str[1:2]

CodePudding user response：

You need to loop, a list comprehension will be the most efficient (python ≥3.8 due to the walrus operator, thanks @I'mahdi):

df['Res'] = [s[(start:=int(a)-1):start int(b)] for (s,a,b)
             in zip(df['Description'], df['Start'], df['Length'])]

Or using pandas for the conversion (thanks @DaniMesejo):

df['Res'] = [s[a:a b] for (s,a,b) in 
             zip(df['Description'],
                 df['Start'].astype(int)-1,
                 df['Length'].astype(int))]

output:

  Description Start Length     Res
0  with lemon     6      5   lemon
1       lemon     1      5   lemon
2  and orange     5      6  orange
3      orange     1      6  orange

CodePudding user response：

Given that the fruit name of interest always seems to be the final word in the description column, you might be able to use a regex extract approach here.

data["Res"] = data["Description"].str.extract(r'(\w )$')

CodePudding user response：

You can use .map to cycle through the Series, do split(' ') to separate the words if there is space and get the last word in the list [-1].

df['RES'] = df['Description'].map(lambda x: x.split(' ')[-1])