I have an Excel spreadsheet (Mac OS) with approximately 500 rows and 50 columns. In all 500 rows, but only half of the columns, are URLs that I am looking to change in bulk.
The domain name and first two category names (paths) are the same. The third path is different but will always contain two letters only. And the final/last text path before the numbers/integers is different.
I am looking to remove the entire URL, including the .html extension, but keep the unique numbers/integers.
https://www.website.com/category/product/oh/brody/3302.html
https://www.website.com/category/product/in/greene/6648.html
https://www.website.com/category/product/ca/ronald/9858.html
https://www.website.com/category/product/nj/brick/2464.html
https://www.website.com/category/product/tx/allen/1104.html`
- Desired result
3302
6648
9858
2462
1104
I am open to doing this in Excel or Google Sheets, as well as multiple steps to accomplish this task
- Find "https://www.website.com/category/product" and replace with nothing
- Find ".html" and replace with nothing
- This still leaves thousands of almost-complete results:
/oh/brody/3302
/in/greene/6648
/ca/ronald/9858
/nj/brick/2464
/tx/allen/1104
I want to be left with only this:
3302
6648
9858
2462
1104
CodePudding user response:
Using "=Right, 9" function would return the final 9 characters from the end of the string of text. IE "3302.html" then you could use your replace function as you did previously to remove the suffix.
** Edit - incorrect function suggested, forgot left from right
CodePudding user response:
try:
=INDEX(IFNA(REGEXEXTRACT(A1:A, "\d ")*1))
CodePudding user response:
Google Sheets: