Replace img src path but keep basename

Time:12-07

I'm working with a piece of software that stores images at seemingly random locations on its filesystem, so I never know the path to an image or how many directories deep it is.

I want to replace this path but keep the basename (the basename is guaranteed to be the last item in the path), exactly as shown below.

Input

<img src="random/path/with/different/number/of/dirs/myImage.jpg"></img>

Output

<img src="http://somedomain.com/myImage.jpg"></img>

I don't think Python's str.replace() is going to get me anywhere on its own, but I am new to regex and don't really have a grasp of what I need in order to replace the path but keep the basename.

CodePudding user response:

Since you said in your reply on the question that you can't use external libraries, here is my solution:

test_str = '<img src="random/path/with/different/number/of/dirs/myImage.jpg"></img>'
 
start_str = 'src="'
end_str = '"'
 
# Getting the src string
index = test_str.index(start_str)
temp_str = test_str[index + len(start_str):]
index = temp_str.index(end_str)
temp_str = temp_str[0:index]

# Now strip the basename from temp_str, then replace the remaining directory path in test_str with a custom URL
index = temp_str.rfind('/')
temp_str = temp_str[:index]

end_str = test_str.replace(temp_str, "https://somedomain.com")

print(end_str)

This code is a bit long, but it still works when the tag contains other attributes.
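The same steps can be wrapped in a function. This is only a minimal sketch: the name replace_src_dir and the base_url parameter are hypothetical, and it assumes the tag has exactly one double-quoted src attribute. It rebuilds the tag by slicing instead of calling replace() on the whole string, and it uses rpartition so that a src with no directory part is kept whole.

```python
def replace_src_dir(tag, base_url="https://somedomain.com"):
    """Return tag with the directory part of its first src value replaced by base_url."""
    start_str = 'src="'
    # Locate the src value: it runs from just after src=" to the next quote.
    start = tag.index(start_str) + len(start_str)
    end = tag.index('"', start)
    src = tag[start:end]
    # rpartition returns ('', '', src) when there is no '/', so a bare
    # file name is kept unchanged.
    basename = src.rpartition('/')[2]
    # Rebuild the tag by slicing, so only the src value is touched.
    return tag[:start] + base_url + '/' + basename + tag[end:]


print(replace_src_dir('<img src="random/path/with/different/number/of/dirs/myImage.jpg"></img>'))
# prints <img src="https://somedomain.com/myImage.jpg"></img>
```

As in the original code, index() raises ValueError if the tag has no src attribute, so the caller may want to catch that.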

CodePudding user response:

I created a small class to parse the <img> tag using the HTMLParser class from the built-in html.parser module.

from html.parser import HTMLParser


class MyImgParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.img_src = None

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            for attr in attrs:
                if attr[0] == 'src':
                    self.img_src = attr[1]

    def get_img_src(self):
        return self.img_src

Then I use this parser like:

myparser = MyImgParser()
myparser.feed('<img src="random/path/with/different/number/of/dirs/myImage.jpg"></img>')
img_src = myparser.get_img_src()

which gives random/path/with/different/number/of/dirs/myImage.jpg.

We can then extract the base name of the file with img_src = os.path.basename(img_src) (after import os), which gives us myImage.jpg. Now we can build the new link like:

my_new_img_link = "https://somedomain.com/" + img_src
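
Putting the pieces together, a self-contained version might look like the sketch below. The names ImgSrcCollector and new_img_links are hypothetical, and collecting every <img> src into a list (rather than keeping only the last one) is a small variation on the class above, not part of the original answer.

```python
import os
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> start tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            # attrs is a list of (name, value) pairs; value can be None
            # for attributes written without a value.
            for name, value in attrs:
                if name == 'src' and value is not None:
                    self.sources.append(value)


def new_img_links(html, base_url="https://somedomain.com/"):
    """Return one new link per <img> src found in html, keeping only the basename."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return [base_url + os.path.basename(src) for src in parser.sources]


print(new_img_links('<img src="random/path/with/different/number/of/dirs/myImage.jpg"></img>'))
# prints ['https://somedomain.com/myImage.jpg']
```

This only produces the new links; writing them back into the HTML still needs a replacement step on the original string.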