I have a list of Instagram URLs in an Excel sheet. I want to extract just the username from each one.
For example, the existing cell value is:
['https://www.instagram.com/thestackoverflow/?hl=en']
From that, I'd like to get thestackoverflow in the adjacent cell.
The first part of the problem is removing https://www.instagram.com/, which should be simple enough, although I can't work out how after hours of trawling through the documentation.
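A minimal sketch of that first step, assuming the cell holds a plain URL string (no surrounding brackets or quotes) and Python 3.9 or later for `str.removeprefix`. `strip_instagram_prefix` is a hypothetical helper name:

```python
INSTAGRAM_PREFIX = "https://www.instagram.com/"


def strip_instagram_prefix(url: str) -> str:
    """Remove the fixed Instagram prefix; the string is returned unchanged if the prefix is absent."""
    # str.removeprefix was added in Python 3.9
    return url.removeprefix(INSTAGRAM_PREFIX)


print(strip_instagram_prefix("https://www.instagram.com/thestackoverflow/?hl=en"))
# thestackoverflow/?hl=en
```

What is left is the username followed by the optional `/?hl=...` tail, which is the second part of the problem.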
The more complex part is removing the /?hl=en (if the link has one), since the language value can differ from link to link.
However, once the first part is figured out, I don't think this will be too much of an issue.
From my research, Instagram supports 25 languages. Hopefully these use the same host-language (hl) parameter values as Google, which are listed here.
I should then be able to write a loop that checks whether there is a language modifier at the end and removes it.
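Instead of looping over every possible language code, one option (an assumption on my part, not something from the question) is to cut the text at the first `?`. That discards the query string whatever its `hl` value is, and then the trailing slash can be stripped. `username_from_rest` is a hypothetical helper that works on the text left over after the prefix has been removed:

```python
def username_from_rest(rest: str) -> str:
    """Turn e.g. 'thestackoverflow/?hl=en' into 'thestackoverflow'."""
    # partition splits at the first '?'; the query part is discarded
    path, _sep, _query = rest.partition("?")
    # strip surrounding slashes and keep only the first path segment
    return path.strip("/").split("/")[0]


for rest in ["thestackoverflow/?hl=en", "thestackoverflow/?hl=fr", "thestackoverflow/"]:
    print(username_from_rest(rest))
```

Because this never looks at the language code itself, it does not matter how many languages Instagram supports or which `hl` values it uses.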
If anyone could help I'd much appreciate it!
Update:
I tried using urllib.parse, but this didn't work: it doesn't split up the URL at all. Here is an example of the result:
ParseResult(scheme='', netloc='', path="['https://www.instagram.com/thestackoverflow/']", params='', query='', fragment='')
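The `path` in that result still contains the square brackets and quotes, which suggests the cell holds the text of a Python list (`['https://...']`) rather than a bare URL. Since that string does not begin with `https:`, `urlparse` finds no scheme or host and puts everything into `path`. A sketch assuming that is what is happening: unwrap the list text with `ast.literal_eval`, then let `urllib.parse.urlparse` separate the path from the query. `extract_username` is a hypothetical name:

```python
import ast
from urllib.parse import urlparse


def extract_username(cell_value: str) -> str:
    """Return the first path segment of an Instagram URL stored as "['https://...']" or as a bare URL."""
    text = cell_value.strip()
    if text.startswith("["):
        # The cell holds a list literal; literal_eval safely turns it into a real list
        text = ast.literal_eval(text)[0]
    parsed = urlparse(text)
    # For the example, parsed.path is '/thestackoverflow/' and 'hl=en' ends up in parsed.query
    return parsed.path.strip("/").split("/")[0]


print(extract_username("['https://www.instagram.com/thestackoverflow/?hl=en']"))
print(extract_username("https://www.instagram.com/thestackoverflow/"))
```

If the brackets come from how the sheet was written (for example, a list being saved into a cell), fixing that upstream would let `urlparse` be used directly.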
Answer:
One of the most straightforward ways to do this is with .split().
So in your case you could use:
the_url.split('/')[3]
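To see why index 3 holds the username, here is what splitting the example URL on `/` produces. The extra URLs, the `username_or_empty` helper, and its guard for missing segments are illustrative assumptions rather than part of the answer above:

```python
urls = [
    "https://www.instagram.com/thestackoverflow/?hl=en",
    "https://www.instagram.com/thestackoverflow?hl=en",
    "https://www.instagram.com/",
]

print(urls[0].split("/"))
# ['https:', '', 'www.instagram.com', 'thestackoverflow', '?hl=en']


def username_or_empty(url: str) -> str:
    parts = url.split("/")
    if len(parts) <= 3:
        # No path segment after the host, so there is no username to return
        return ""
    # Without a trailing slash the query sticks to the username, so cut at '?'
    return parts[3].split("?")[0]


print([username_or_empty(u) for u in urls])
```

The plain `split('/')[3]` works for links shaped like the example; the `?` cut only matters for links without a trailing slash before the query.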