A pandas column contains a series of URLs. I'd like to extract a substring from each URL.
MRE code below.
import pandas as pd
s = pd.Series(['https://url-location/img/xxxyyy_image1.png'])
s.apply(lambda x: x[x.find("/")+1:x.find("_")])
I'd like to extract xxxyyy and store it in a new column.
CodePudding user response:
You can use
>>> s.str.extract(r'.*/([^_]+)')
0
0 xxxyyy
Details:
.* - zero or more chars other than line break chars, as many as possible
/ - a slash
([^_]+) - Capturing group 1 (the value captured into this group is what Series.str.extract actually returns): one or more chars other than a _ char.
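To store the result in a new column, as the question asks, something like the following should work. This is a minimal sketch: the DataFrame df and the column names url and code are assumed for illustration and do not come from the question. Passing expand=False makes str.extract return a Series instead of a one-column DataFrame, which is convenient for assignment.

```python
import pandas as pd

# Hypothetical frame; the column names "url" and "code" are assumptions.
df = pd.DataFrame({"url": ["https://url-location/img/xxxyyy_image1.png"]})

# With a single capture group and expand=False, str.extract returns a Series
# that can be assigned directly to a new column.
df["code"] = df["url"].str.extract(r".*/([^_]+)", expand=False)

print(df["code"].tolist())
```

Rows where the pattern does not match come back as NaN, so a later dropna or fillna may be needed depending on the data.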
CodePudding user response:
Also possible:
s.str.split('/').str[-1].str.split('_').str[0]
# Out[224]: xxxyyy
This works because the .str accessor supports indexing and slicing on each element. For example, .str[-1] returns the last element of each list produced by the split.
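A slightly tighter variant of the same idea, shown only as a sketch: rsplit with n=1 splits once from the right, so only the last path segment is separated, and split with n=1 stops at the first underscore. The column names url and prefix are assumed for illustration.

```python
import pandas as pd

s = pd.Series(["https://url-location/img/xxxyyy_image1.png"])

# Split once from the right on "/" and keep the last piece (the file name).
filename = s.str.rsplit("/", n=1).str[-1]

# Split once on "_" and keep the part before it.
prefix = filename.str.split("_", n=1).str[0]

# "url" and "prefix" are assumed column names, not taken from the question.
df = s.to_frame(name="url")
df["prefix"] = prefix

print(df["prefix"].tolist())
```

If a file name contains no underscore, this approach returns the whole file name rather than a missing value, which differs from the regex answer's behavior on non-matching rows.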