Automate the removal of part of a cell's text with Openpyxl

I have a list of Instagram URLs in an Excel sheet. I want to extract just the username from each URL in this list.

For example the existing cell value is:

['https://www.instagram.com/thestackoverflow/?hl=en']

And from that, I'd like to have thestackoverflow in the adjacent cell.

The first part of the problem is removing https://www.instagram.com/, which should be simple enough, although I can't find out how after hours of searching the documentation.

The more complex task would be removing the /?hl=en suffix (if the link has one), since the language parameter can take different values.

However, once the first part is figured out, I think this wouldn't be too much of an issue.

From research I found that Instagram supports 25 languages. These will hopefully be using the same host language parameters as Google which are listed here.

I should be able to make a loop to check if there is a language modifier at the end and remove it.

If anyone could help I'd much appreciate it!

Update:

I tried using urllib.parse but this didn't work. It doesn't split up the URL in any way. Here is an example of the result:

ParseResult(scheme='', netloc='', path="['https://www.instagram.com/thestackoverflow/']", params='', query='', fragment='')
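The ParseResult above suggests that the cell holds the text of a Python list (brackets and quotes included) rather than a bare URL, which would explain why nothing was split. A minimal sketch, assuming that is the case and that every URL includes its https:// scheme; the function name extract_username is hypothetical:

```python
from urllib.parse import urlparse


def extract_username(cell_value):
    """Return the first path segment of the URL, or None if there is none."""
    # Remove the list-style wrapper ([' ... ']) seen in the example cell value.
    url = str(cell_value).strip().strip("[]").strip("'\"")
    parsed = urlparse(url)
    # Any query string (such as hl=en) ends up in parsed.query, not parsed.path,
    # so the language parameter does not need special handling.
    segments = [s for s in parsed.path.split("/") if s]
    return segments[0] if segments else None


print(extract_username("['https://www.instagram.com/thestackoverflow/?hl=en']"))
```

Because urlparse separates the query from the path, this should not need a list of language codes at all.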

CodePudding user response：

One of the most straightforward ways of doing this is to use .split()

So in your case you could just use:

the_url.split('/')[3]
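To apply the split approach to a whole sheet, something along these lines might work. It is only a sketch: it assumes the URLs are in the first column of the active sheet and that the adjacent column is free, and the names username_from_url and fill_usernames are made up for illustration.

```python
def username_from_url(value):
    """Pull the username out of an Instagram URL using str.split('/')."""
    if not value:
        return None
    # Drop the list-style wrapper ([' ... ']) if the cell text has one.
    url = str(value).strip("[]'\" ")
    # Discard any query string such as ?hl=en.
    url = url.split("?")[0]
    # 'https://www.instagram.com/name/' splits into
    # ['https:', '', 'www.instagram.com', 'name', ''].
    parts = url.split("/")
    return parts[3] if len(parts) > 3 and parts[3] else None


def fill_usernames(path, url_column=1, first_row=1):
    """Write each username into the cell to the right of its URL."""
    # Imported here so the helper above works without openpyxl installed.
    from openpyxl import load_workbook

    wb = load_workbook(path)
    ws = wb.active
    for row in range(first_row, ws.max_row + 1):
        username = username_from_url(ws.cell(row=row, column=url_column).value)
        if username:
            ws.cell(row=row, column=url_column + 1, value=username)
    wb.save(path)
```

Calling fill_usernames("accounts.xlsx") (a placeholder file name) would overwrite that file in place, so it may be worth trying it on a copy first.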