I'm working with a piece of software that stores images at seemingly random locations on its filesystem, so I never know the path to an image or how many directories deep it is.
I want to replace this path but keep the basename (the basename is guaranteed to be the last item in the path), exactly as shown below.
Input
<img src="random/path/with/different/number/of/dirs/myImage.jpg"></img>
Output
<img src="http://somedomain.com/myImage.jpg"></img>
I don't think Python's string.replace() is going to get me anywhere, but I am new to regex and don't really have a grasp on what I need in order to replace the path but keep the basename.
CodePudding user response:
Since you said in your reply on the question that you can't use external libraries, here is my solution, which uses only built-in string methods:
test_str = '<img src="random/path/with/different/number/of/dirs/myImage.jpg"></img>'
start_str = 'src="'
end_str = '"'
# Getting the src string
index = test_str.index(start_str)
temp_str = test_str[index + len(start_str):]
index = temp_str.index(end_str)
temp_str = temp_str[0:index]
# Now strip the basename from temp_str, leaving only the directory part, and replace that part of test_str with a custom URL
index = temp_str.rfind('/')
temp_str = temp_str[:index]
result_str = test_str.replace(temp_str, "https://somedomain.com")
print(result_str)
This code is a little bit longer but it ensures no errors when the tag contains other attributes.
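The same string-slicing idea can be wrapped in a function, as a rough sketch. The name replace_src_dir and the new_base parameter are made up for illustration and are not part of the answer above. It assumes the src value is wrapped in double quotes, and it also covers the case where the src contains no '/' at all.

```python
def replace_src_dir(tag, new_base="https://somedomain.com"):
    """Return tag with the directory part of its src value replaced by new_base.

    Hypothetical helper: assumes a double-quoted src attribute.
    """
    start_str = 'src="'
    start = tag.find(start_str)
    if start == -1:
        return tag  # no src attribute found, leave the tag unchanged
    start += len(start_str)
    end = tag.find('"', start)
    if end == -1:
        return tag  # unterminated attribute value, leave the tag unchanged
    src = tag[start:end]
    # rpartition returns ('', '', src) when src has no '/', so the
    # basename is then simply the whole src value.
    basename = src.rpartition('/')[2]
    return tag[:start] + new_base + '/' + basename + tag[end:]


print(replace_src_dir('<img src="random/path/with/different/number/of/dirs/myImage.jpg"></img>'))
print(replace_src_dir('<img alt="logo" src="a/b/logo.png" width="10">'))
```

Note that a plain search for src=" would also match inside an attribute such as data-src, so this is only suitable when the tags are known to be simple.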
CodePudding user response:
I created a small class to parse the <img> tag using the HTMLParser class from the built-in html.parser module.
from html.parser import HTMLParser

class MyImgParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.img_src = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == 'img':
            for attr in attrs:
                if attr[0] == 'src':
                    self.img_src = attr[1]

    def get_img_src(self):
        return self.img_src
Then I use the parser like this:
myparser = MyImgParser()
myparser.feed('<img src="random/path/with/different/number/of/dirs/myImage.jpg"></img>')
img_src = myparser.get_img_src()
which gives random/path/with/different/number/of/dirs/myImage.jpg.
We can extract the base name of the file using img_src = os.path.basename(img_src) (this needs import os), which gives us myImage.jpg. Now we can build the new link like:
my_new_img_link = "https://somedomain.com/" + img_src
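To get from the new link back to the full output tag, the pieces above could be combined roughly as follows. This is only a sketch: ImgSrcCollector and rewrite_img_sources are hypothetical names, and posixpath.basename is swapped in for os.path.basename so that the path is always split on '/' regardless of the operating system. It assumes each src appears in the HTML as a double-quoted value exactly as the parser reports it (no character references inside the value).

```python
import posixpath
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src value of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def rewrite_img_sources(html, new_base="https://somedomain.com/"):
    """Return html with each img src replaced by new_base + basename."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    for src in collector.sources:
        new_src = new_base + posixpath.basename(src)
        # Replace the whole quoted attribute so that only src values change.
        html = html.replace('src="' + src + '"', 'src="' + new_src + '"')
    return html


print(rewrite_img_sources(
    '<img src="random/path/with/different/number/of/dirs/myImage.jpg"></img>'
))
```

The parser is used only to find the src values; the replacement itself is done on the original string, so the rest of the markup is left untouched.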