Hello, I have code that copies the HTML from an external URL and echoes it on my page. Some of the HTML contains links and/or picture SRC attributes. I need some help truncating them (from absolute URL to relative URL inside $data).
For example, inside the HTML there is an href:
<a href="https://www.trade-ideas.com/products/score-vs-ibd/" >
or SRC
<img src="http://static.trade-ideas.com/Filters/MinDUp1.gif">
I would like to keep only the subdirectory:
/products/score-vs-ibd/
/Filters/MinDUp1.gif
Maybe with preg_replace, but I'm not familiar with regular expressions.
This is my original code, which works very well, but now I'm stuck on truncating the links.
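<?php
// Use the first tag of the current post as the symbol.
$tag = '';
$post_tags = get_the_tags();
if ( $post_tags ) {
    $tag = $post_tags[0]->name;
}
// Fetch the remote page and keep everything from the first "<div " up to the "<!-- /span -->" marker.
$html = file_get_contents('https://www.trade-ideas.com/ticky/ticky.html?symbol=' . urlencode($tag));
$start = strpos($html, '<div ');
$end = strpos($html, '<!-- /span -->', $start);
$data = substr($html, $start, $end - $start);
echo $data;
?>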
CodePudding user response:
Here is the code:
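function getUrlPath($url) {
    // Optional scheme, then the host (and port) up to the first "/" or "?"; group 1 is everything after it.
    $re = '/(?:https?:\/\/)?(?:[^?\/\s]+[?\/])(.*)/';
    preg_match($re, $url, $matches);
    // Empty string when the URL does not match at all.
    return $matches[1] ?? '';
}
Example: getUrlPath("http://myassets.com:80/files/images/image.gif")
returns files/images/image.gif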
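If a regex is not a requirement, PHP's built-in parse_url() can do the same split. This is a minimal sketch, assuming a well-formed absolute URL; getUrlPathParsed is a hypothetical name chosen for illustration. Unlike the regex above, it keeps the leading slash and re-attaches the query string if there is one.

```php
<?php
// Hypothetical helper: returns the path (plus query string, if any) of a URL.
function getUrlPathParsed(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false) {
        // parse_url() gives up on seriously malformed URLs: leave the input unchanged.
        return $url;
    }
    // A URL without a path (e.g. "http://example.com") is treated as "/".
    $path = $parts['path'] ?? '/';
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query'];
    }
    return $path;
}

echo getUrlPathParsed('http://myassets.com:80/files/images/image.gif'), "\n";
// prints /files/images/image.gif
```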
CodePudding user response:
You can locate all the URLs in the HTML string with a regex using preg_match_all().
The regex:
'/=[\'"](https?:\/\/.*?(\/.*))[\'"]/i'
will capture both the entire URL and the path/query string for every occurrence of ="http://domain/path" or ='https://domain/path?query' (http/https, single or double quotes, with/without query string).
Then you can just use str_replace() to update the HTML string.
<?php
$html = '<a href="https://www.trade-ideas.com/products/score-vs-ibd/" >
<img src="http://static.trade-ideas.com/Filters/MinDUp1.gif">
<img src=\'https://static.trade-ideas.com/Filters/MinDUp1.gif?param=value\'>';
$pattern = '/=[\'"](https?:\/\/.*?(\/.*))[\'"]/i';
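$urls = [];
// $urls[1] holds the full URLs, $urls[2] the matching path/query parts.
preg_match_all($pattern, $html, $urls);
//var_dump($urls);
foreach ($urls[1] as $i => $uri) {
    // Swap each absolute URL for its path.
    $html = str_replace($uri, $urls[2][$i], $html);
}
echo $html;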
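Note: this will change all absolute URLs enclosed in quotes immediately following an =, not just those in href and src attributes. Also, because the .* in the second group is greedy, a match runs to the last quote on the line, so when a line holds several absolute URLs only the first one is shortened.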
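For HTML where several attributes share a line, a single preg_replace() call restricted to href and src may be easier to reason about. The following is only a sketch under assumptions: makeLinksRelative is a hypothetical name, the attribute values are assumed to be quoted, and URLs that have no path after the host are deliberately left untouched.

```php
<?php
// Hypothetical helper: strips the scheme and host from absolute URLs in href/src attributes.
function makeLinksRelative(string $html): string
{
    // Group 1 keeps the attribute name, the "=" and the opening quote.
    // The scheme and host are matched but not captured, so they are dropped.
    // The lookahead (?=/) requires a path to follow, so bare-host URLs stay as they are.
    $pattern = '~\b((?:href|src)\s*=\s*["\'])https?://[^/"\'\s>]+(?=/)~i';
    // preg_replace() returns null on a regex error; fall back to the input in that case.
    return preg_replace($pattern, '$1', $html) ?? $html;
}

$html = '<a href="https://www.trade-ideas.com/products/score-vs-ibd/" ><img src="http://static.trade-ideas.com/Filters/MinDUp1.gif">';
echo makeLinksRelative($html), "\n";
// prints <a href="/products/score-vs-ibd/" ><img src="/Filters/MinDUp1.gif">
```

With the code from the question, this would be called as echo makeLinksRelative($data); in place of echo $data;.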