Regex to update URLs after a content migration

Time:05-06

I recently moved some old content to a new site and changed some URL structures. I need to run a find-and-replace across the entire database to update the old links. This would be easy if I knew regex, but I don't, so I'm hoping it's easy for the SO gurus.

Note: This is PHP regex.

Find:

https://api.floodmagazine.com/{number}/{string}/
Result:
https://api.floodmagazine.com/789/foo-bar/
https://api.floodmagazine.com/12345/foo-bar-1/

Replace with:

https://floodmagazine.com/$1/$2/
Result:
https://floodmagazine.com/789/foo-bar/
https://floodmagazine.com/12345/foo-bar-1/

It's not as easy as searching for the subdomain (api.floodmagazine.com), because there are URLs in the DB that need to keep that subdomain (images, for example). So the /{number}/{string}/ part is an important way to find only the URLs that need to change.

I just need the regex part; I'm using WP Migrate for the database update itself.

Thanks for the help!

Screenshot for my failed attempt at regex find and replace.

CodePudding user response:

https:\/\/api\.floodmagazine\.com\/([0-9]+)\/([A-Za-z0-9._-]+)\/? should work. On regex101 you have to escape /, so I kept that here. That may not be true in your tooling.

You can omit the last ? if you don’t want the trailing slash to be optional.
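
A minimal sketch of how a pattern like this behaves with PHP's preg_replace, assuming # as the delimiter so the slashes need no escaping. The sample strings, including the image path, are made up for illustration, and the character class is written as A-Za-z on the assumption that only letters, digits, dots, underscores, and hyphens were intended.

```php
<?php
// '#' is used as the delimiter so '/' needs no escaping.
// Group 1 = numeric ID, group 2 = slug. The sample URLs are illustrative only.
$pattern = '#https://api\.floodmagazine\.com/([0-9]+)/([A-Za-z0-9._-]+)/?#';
$replacement = 'https://floodmagazine.com/$1/$2/';

$samples = [
    'https://api.floodmagazine.com/789/foo-bar/',
    'https://api.floodmagazine.com/12345/foo-bar-1/',
    // Hypothetical image URL: the first path segment is not numeric, so it is left alone.
    'https://api.floodmagazine.com/wp-content/uploads/cover.jpg',
];

// With an array subject, preg_replace returns an array of results.
$results = preg_replace($pattern, $replacement, $samples);

foreach ($results as $url) {
    echo $url, PHP_EOL;
}
```

The first two sample URLs lose the api subdomain, while the image URL is returned unchanged because it has no numeric segment directly after the host.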

CodePudding user response:

This should grab all the URLs you describe :

(https:\/\/api\.floodmagazine\.com)(\/)([0-9]+)(\/)([A-Za-z0-9-]+)(\/)
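
To check what a pattern like this picks up before running a replace, something along these lines could be used. It is only a sketch: the $text sample is invented, and the pattern here targets the api host with capture groups around the ID and slug, which is an assumption about what the answer intended.

```php
<?php
// Hypothetical sample text standing in for a database field.
$text = 'See <a href="https://api.floodmagazine.com/789/foo-bar/">this</a> and '
      . '<img src="https://api.floodmagazine.com/wp-content/uploads/a.jpg">.';

// Assumed variant of the pattern: api host, numeric ID, slug, trailing slash.
$pattern = '#https://api\.floodmagazine\.com/([0-9]+)/([A-Za-z0-9-]+)/#';

// PREG_SET_ORDER groups the results per match: [full URL, ID, slug].
$count = preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo "url={$m[0]} id={$m[1]} slug={$m[2]}", PHP_EOL;
}
```

Only the link is reported here; the image URL is skipped because wp-content is not a number.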

CodePudding user response:

To avoid URL errors due to WordPress inconsistencies, you can use this PHP code, generated with regex101:

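// Group 1 = host, group 2 = ID, group 3 = post name.
$re = '/https?:\/\/([^\/]+)\/([^\/]+)\/([^\/]+)\/?/m';
$str = 'https://api.floodmagazine.com/789/foo-bar/';
$subst = 'https://floodmagazine.com/$2/$3/';

// Rebuild the URL on the new host from the captured ID and post name.
$result = preg_replace($re, $subst, $str);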

This regex captures the domain, the ID, and the post name. It also handles special cases such as non-HTTPS URLs and special characters, and returns the result expected in your example.
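
Because the pattern above accepts any host and any two path segments, a narrower variant may be safer when some api.floodmagazine.com URLs have to stay as they are. The following is a sketch under that assumption, not part of the original answer; rewriteLinks is a hypothetical helper name and the sample input is invented.

```php
<?php
// Hypothetical helper: rewrites only api.floodmagazine.com/{number}/{slug}/ links.
function rewriteLinks(string $content): string
{
    // https? keeps the non-HTTPS case; [0-9]+ requires a numeric ID;
    // the slug runs up to the next slash, whitespace, quote, or angle bracket.
    $re = '#https?://api\.floodmagazine\.com/([0-9]+)/([^/\s"\'<>]+)/?#';

    // preg_replace returns null on error; fall back to the untouched content.
    return preg_replace($re, 'https://floodmagazine.com/$1/$2/', $content) ?? $content;
}

$input = 'http://api.floodmagazine.com/789/foo-bar and '
       . 'https://api.floodmagazine.com/wp-content/uploads/a.jpg';

echo rewriteLinks($input), PHP_EOL;
```

Requiring digits in the first path segment is what keeps paths such as wp-content out of the match; the optional trailing slash is normalized so every rewritten link ends in /.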
