Cutting a string after the last occurrence of a certain character

Can you please help me solve the following issue? I have a column called "names" in a pandas DataFrame that contains links to web pages. I need to create a variable called "total categories" that contains the part of each link that appears after the last "/" character. Example:

names
https://www1.abc.com/aaa/72566-finance
https://www1.abc.com/aaa1/725-z2
https://www1.abc.com/aaa2/75-z3

total categories
72566-finance
725-z2
75-z3

I tried this code:

def find_index(x):
    return x.rindex('/')

data_pd['total categories'] = data_pd['names'].apply(find_index)

I receive the following error:

AttributeError: 'float' object has no attribute 'rindex'

CodePudding user response：

Use str.extract with the r'/([^/]+)$' regex:

df['total categories'] = df['names'].str.extract(r'/([^/]+)$')

output:

                                    names total categories
0  https://www1.abc.com/aaa/72566-finance    72566-finance
1        https://www1.abc.com/aaa1/725-z2           725-z2
2         https://www1.abc.com/aaa2/75-z3            75-z3

regex description:

/       # match a literal /
(       # start capturing
[^/]+   # one or more non-/ characters
)       # end capturing
$       # end of string
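
To try the pattern end to end, here is a minimal self-contained sketch. The sample frame and the missing row are assumptions added for illustration; the NaN row is included because a missing value (which pandas stores as a float) is the most likely cause of the AttributeError in the question. The .str accessor passes missing values through instead of raising.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the NaN row mimics a missing link.
df = pd.DataFrame({
    "names": [
        "https://www1.abc.com/aaa/72566-finance",
        "https://www1.abc.com/aaa1/725-z2",
        np.nan,
    ]
})

# expand=False returns a Series rather than a one-column DataFrame.
df["total categories"] = df["names"].str.extract(r"/([^/]+)$", expand=False)

print(df["total categories"].tolist())
```

Rows that are missing, or that do not match the pattern, come back as NaN in the new column.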

CodePudding user response：

If you have these set up as columns in a pandas DataFrame, you can do the following:

df['total categories'] = df['names'].str.split('/').str[-1]

This will split the string based on the passed delimiter, '/', and then take the last element of the resulting splits.
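
A possible variation, sketched here with made-up sample rows: str.rsplit with n=1 splits only once, from the right, so the rest of the link is not broken into pieces. The last two rows are hypothetical edge cases showing how this differs from the regex approach: a trailing "/" gives an empty string, and a value without any "/" is returned unchanged.

```python
import pandas as pd

# Hypothetical sample data, including two edge cases.
df = pd.DataFrame({
    "names": [
        "https://www1.abc.com/aaa/72566-finance",
        "https://www1.abc.com/aaa2/75-z3",
        "https://www1.abc.com/aaa2/",  # trailing slash
        "no-slash",                    # no delimiter at all
    ]
})

# Split once from the right, then keep the last piece.
df["total categories"] = df["names"].str.rsplit("/", n=1).str[-1]

print(df["total categories"].tolist())
```

Which behavior is preferable for the edge cases depends on the data; the regex version would yield NaN for both of them.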