Can you please help me untangle the following issue? I have a column in a pandas DataFrame called "names" that contains links to web pages. I need to create a column called "total categories" that contains the part of each link that appears after the last "/" character. Example:
names
https://www1.abc.com/aaa/72566-finance
https://www1.abc.com/aaa1/725-z2
https://www1.abc.com/aaa2/75-z3
total categories
72566-finance
725-z2
75-z3
I tried this code:
def find_index(x):
    return x.rindex('/')
data_pd['total categories'] = data_pd['names'].apply(find_index)
I receive the following error:
AttributeError: 'float' object has no attribute 'rindex'
CodePudding user response:
Use str.extract with the regex r'/([^/]+)$':
df['total categories'] = df['names'].str.extract(r'/([^/]+)$')
output:
names total categories
0 https://www1.abc.com/aaa/72566-finance 72566-finance
1 https://www1.abc.com/aaa1/725-z2 725-z2
2 https://www1.abc.com/aaa2/75-z3 75-z3
regex demo and description:
/ # match a literal /
( # start capturing
[^/]+ # one or more non-/ characters
) # end capturing
$ # end of string
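As a minimal, self-contained sketch of the regex approach, the snippet below rebuilds the three example links from the question in a small DataFrame. The `expand=False` argument is an addition not in the answer; it asks `str.extract` to return a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "names": [
        "https://www1.abc.com/aaa/72566-finance",
        "https://www1.abc.com/aaa1/725-z2",
        "https://www1.abc.com/aaa2/75-z3",
    ]
})

# Capture everything after the last "/" up to the end of the string.
# expand=False returns a Series instead of a one-column DataFrame.
df["total categories"] = df["names"].str.extract(r"/([^/]+)$", expand=False)

print(df["total categories"].tolist())
# ['72566-finance', '725-z2', '75-z3']
```

Note that a link ending in a trailing "/" would not match this pattern, so its row would come back as NaN.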
CodePudding user response:
If you have these set up as columns in a pandas DataFrame, you can do the following:
df['total categories'] = df['names'].str.split('/').str[-1]
This splits each string on the delimiter '/' and then takes the last element of the resulting pieces.
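The AttributeError in the question is what Python raises when a string method is called on a float, and pandas stores a missing value in a text column as the float NaN. That the "names" column contains missing values is an assumption, since the question does not show them. Under that assumption, the sketch below shows that the `.str` accessor passes NaN through instead of raising; the sample row with `np.nan` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the middle row is a missing value (float NaN).
df = pd.DataFrame({
    "names": [
        "https://www1.abc.com/aaa/72566-finance",
        np.nan,
        "https://www1.abc.com/aaa2/75-z3",
    ]
})

# Calling a str method directly on NaN reproduces the error from the question.
try:
    float("nan").rindex("/")
except AttributeError as exc:
    print(exc)

# The .str accessor skips missing values and leaves NaN in the result.
df["total categories"] = df["names"].str.split("/").str[-1]
print(df["total categories"])
```

If the missing rows should be excluded rather than kept as NaN, calling `df.dropna(subset=["names"])` before the split is one option.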