Add domain to <img> src attribute value if a relative path

Time:12-28

I have a text variable that contains multiple images with relative or absolute paths. If a src attribute starts with http or https, it should be left alone; if it starts with / or with something like abc/, a base URL should be prepended.

Here is what I tried:

<?php
$html = <<<HTML
<img src="docs/relative/url/img.jpg" />
<img src="/docs/relative/url/img.jpg" />
<img src="https://docs/relative/url/img.jpg" />
<img src="http://docs/relative/url/img.jpg" />
HTML;

$base = 'https://example.com/';

$pattern = "/<img src=\"[^http|https]([^\"]*)\"/";
$replace = "<img src=\"" . $base . "\${1}\"";
echo $text = preg_replace($pattern, $replace, $html);

My output is:

<img src="https://example.com/ocs/relative/url/img.jpg" />
<img src="https://example.com/docs/relative/url/img.jpg" />
<img src="https://docs/relative/url/img.jpg" />
<img src="http://docs/relative/url/img.jpg" />

The issue: the result is almost correct, but when the src attribute starts with something like docs/, its first letter is cut off (see the first img src in the output).

The output I need is:

<img src="https://example.com/docs/relative/url/img.jpg" /><!--check this and compare with current result, you will get the difference -->
<img src="https://example.com/docs/relative/url/img.jpg" />
<img src="https://docs/relative/url/img.jpg" />
<img src="http://docs/relative/url/img.jpg" />

Could anyone help me fix this?

CodePudding user response:

The following pattern will seek src attributes that do not start with http or https. Then for relative paths that begin with a forward slash, the leading slash will be removed before prepending the $base string to the src value.

Code:

$base = 'https://example.com/';
echo preg_replace('~ src="(?!http)\K/?~', $base, $html);

Output:

<img src="https://example.com/docs/relative/url/img.jpg" />
<img src="https://example.com/docs/relative/url/img.jpg" />
<img src="https://docs/relative/url/img.jpg" />
<img src="http://docs/relative/url/img.jpg" />

Breakdown:

~           #starting pattern delimiter
 src="      #match space, s, r, c, =, then "
(?!http)    #only continue matching if the value does not start with http (this also covers https)
\K          #forget any previously matched characters so they are not destroyed by the replacement string
/?          #optionally match a forward slash
~           #ending pattern delimiter
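
One side effect of (?!http) is that any value beginning with the letters http is skipped, including a relative path such as httpdocs/img.jpg. The following is a minimal sketch of a stricter variant, assuming you only want to skip values that begin with http:// or https://. The fifth sample line and the $result variable are additions for illustration and are not part of the original question.

```php
<?php
// Sample input: the question's four cases plus one hypothetical edge case.
$html = <<<HTML
<img src="docs/relative/url/img.jpg" />
<img src="/docs/relative/url/img.jpg" />
<img src="https://docs/relative/url/img.jpg" />
<img src="http://docs/relative/url/img.jpg" />
<img src="httpdocs/img.jpg" />
HTML;

$base = 'https://example.com/';

// (?!https?://) rejects only values that start with "http://" or "https://",
// so the relative path "httpdocs/img.jpg" still receives the base URL.
// \K and /? work as in the breakdown above.
$result = preg_replace('~ src="(?!https?://)\K/?~', $base, $html);

echo $result;
```

Whether the looser or the stricter lookahead is preferable depends on the paths you expect; protocol-relative values such as //cdn/... or data: URIs would need additional alternatives in the lookahead.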

As for your pattern, /<img src=\"[^http|https]([^\"]*)\"/:

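  1. [^http|https] actually means "match a single character that is not in this list: |, h, t, p, and s". It could be simplified to [^|hpst], because the order of the characters in a negated character class is irrelevant and duplicates are meaningless. That single matched character sits outside the capture group, so it is dropped from the replacement; this is why the d of docs/ disappeared (and why the leading / of the second src was silently consumed). So [^...] is not how you say "the string does not start with something or somethingelse".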
  2. Capturing all remaining characters in a substring until the next double quote with the intent to use it again in the replacement is unnecessary. This is why I use \K to pinpoint where $base should be injected instead of ([^\"]*).
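
To see the character-class behaviour in isolation, here is a small sketch. The sample strings and variable names are made up for illustration.

```php
<?php
// [^http|https] is a single negated character class: it matches exactly one
// character that is not "h", "t", "p", "s", or "|".
$class = '/^[^http|https]/';

$matchesD = preg_match($class, 'docs/img.jpg'); // 1: "d" is not in the list
$matchesT = preg_match($class, 'tmp/img.jpg');  // 0: "t" is in the list

// In the original pattern the class sits outside the capture group,
// so the character it consumes is missing from $m[1].
preg_match('/src="[^http|https]([^"]*)"/', '<img src="docs/img.jpg" />', $m);

echo $matchesD, "\n"; // 1
echo $matchesT, "\n"; // 0
echo $m[1], "\n";     // ocs/img.jpg
```

Note the second case: a relative path that happens to start with h, t, p, or s would not be matched by the original pattern at all, so it would never receive the base URL.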

Furthermore, I always recommend the stability of a DOM parser when dealing with a valid HTML document. You can use DOMDocument with XPath to target the qualifying elements and modify the src attributes without regex.

Code:

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//img[not(starts-with(@src, 'http'))]") as $node) {
    $node->setAttribute('src', $base . ltrim($node->getAttribute('src'), '/'));
}
echo $dom->saveHTML();
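
XPath 1.0 has no regular expressions, so if you need a stricter test than starts-with(@src, 'http'), one option is to do the check in PHP instead. The following is a rough sketch under that assumption; the function name prependBaseToImages and the sample markup are hypothetical, and the input is wrapped in a single <div> so the fragment has one root element.

```php
<?php
// Hypothetical helper (not from the original answer): prepend $base to every
// <img> src that does not already start with http:// or https://.
function prependBaseToImages(string $html, string $base): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // Leave missing/empty values and absolute http(s) URLs untouched.
        if ($src === '' || preg_match('~^https?://~i', $src)) {
            continue;
        }
        // Drop any leading slash so the base URL's trailing slash is not doubled.
        $img->setAttribute('src', $base . ltrim($src, '/'));
    }

    return $dom->saveHTML();
}

$html = '<div><img src="docs/a.jpg"><img src="/docs/b.jpg">'
      . '<img src="https://cdn.test/c.jpg"><img src="httpdocs/d.jpg"></div>';

echo prependBaseToImages($html, 'https://example.com/');
```

The trade-off is that every img element is visited in PHP rather than being filtered by the XPath query, which is usually negligible for documents of ordinary size.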

A related answer: https://stackoverflow.com/a/48837947/2943403
