Tried:
# model_pattern = r'\d{4}\-([^/]+)\-'
model_pattern = r'[-]([^/]+)\-'
WANT MODEL:
2021-Mercedes-Benz-Sprinter 2500
AND VIN:
286f67180a0e09a8729929613aac3877
FROM:
/used/Mercedes-Benz/2021-Mercedes-Benz-Sprinter 2500-286f67180a0e09a8729929613aac3877.htm
Another example, this one with no space in the model name:
/used/Audi/2015-Audi-SQ5-286f67180a0e09a8729929613aac3877.htm
I use
Clean_Make["Model"] = Clean_Make["Page"].str.extract(model_pattern)
Clean_Make
This is the resulting table:
      Page                                               City           Pageviews  Unique Pageviews  Avg. Time on Page  Entrances  Bounce Rate  % Exit  Make1          Make2          Make           Model
71    /used/Mercedes-Benz/2021-Mercedes-Benz-Sprinte...  San Jose       310        149               00:00:27           149        2.00%        47.74%  Mercedes-Benz  Mercedes-Benz  Mercedes-Benz  Mercedes-Benz-Sprinter 2500
103   /used/Audi/2015-Audi-SQ5-286f67180a0e09a872992...  Menlo Park     250        87                00:02:36           82         0.00%        32.40%  Audi           Audi           Audi           Audi-SQ5
158   /used/Mercedes-Benz/2021-Mercedes-Benz-Sprinte...  San Francisco  202        98                00:00:18           98         2.04%        48.02%  Mercedes-Benz  Mercedes-Benz  Mercedes-Benz  Mercedes-Benz-Sprinter 2500
165   /used/Audi/2020-Audi-S8-c6df09610a0e09af26b5cf...  San Francisco  194        93                00:00:42           44         2.22%        29.38%  Audi           Audi           Audi           Audi-S8
168   /used/Mercedes-Benz/2021-Mercedes-Benz-Sprinte...  (not set)      192        91                00:00:11           91         2.20%        47.40%  Mercedes-Benz  Mercedes-Benz  Mercedes-Benz  Mercedes-Benz-Sprinter 2500
...   ...                                                ...            ...        ...               ...                ...        ...          ...     ...            ...            ...            ...
4995  /used/Subaru/2019-Subaru-Crosstrek-5717b3040a0...  Union City     10         3                 00:02:02           0          0.00%        30.00%  Subaru         Subaru         Subaru         Subaru-Crosstrek
4996  /used/Tesla/2017-Tesla-Model S-15605a190a0e087...  San Jose       10         5                 00:01:29           5          0.00%        50.00%  Tesla          Tesla          Tesla          Tesla-Model S
CodePudding user response:
You can use
/([^/]+)-([a-f0-9]{32})\.htm
Details:
/ - a / char
([^/]+) - Group 1 (model): one or more chars other than /
- - a hyphen
([a-f0-9]{32}) - Group 2 (VIN): 32 hex chars
\.htm - a .htm string.
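To check what the two groups capture, here is a minimal sketch that runs the pattern over the two sample paths from the question using only Python's built-in re module. The names PATTERN, paths, and results are made up for this illustration.

```python
import re

# Group 1: everything after the last "/" up to the final hyphen (the model).
# Group 2: the 32-character lowercase hex id that precedes ".htm".
PATTERN = re.compile(r'/([^/]+)-([a-f0-9]{32})\.htm')

# The two sample paths from the question.
paths = [
    "/used/Mercedes-Benz/2021-Mercedes-Benz-Sprinter 2500-286f67180a0e09a8729929613aac3877.htm",
    "/used/Audi/2015-Audi-SQ5-286f67180a0e09a8729929613aac3877.htm",
]

results = []
for path in paths:
    match = PATTERN.search(path)
    # search() returns None when the path does not fit the pattern.
    results.append(match.groups() if match else (None, None))

for model, vin in results:
    print(model, vin)
```

Because [^/]+ cannot cross a slash and the hex group must be exactly 32 characters long, the match can only start at the last "/" of each path, so the make folder earlier in the URL is skipped.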
In Pandas, you can use
Clean_Make[["Model", "VIN"]] = Clean_Make["Page"].str.extract(r'/([^/]+)-([a-f0-9]{32})\.htm', expand=False)
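As a possible variant, named groups let str.extract label the resulting columns itself. The sketch below builds a small stand-in DataFrame (df is hypothetical, not the asker's Clean_Make) from the two sample paths plus one path that does not match, to show that non-matching rows come back as NaN.

```python
import pandas as pd

# Hypothetical stand-in for the "Page" column of the real DataFrame.
df = pd.DataFrame({
    "Page": [
        "/used/Mercedes-Benz/2021-Mercedes-Benz-Sprinter 2500-286f67180a0e09a8729929613aac3877.htm",
        "/used/Audi/2015-Audi-SQ5-286f67180a0e09a8729929613aac3877.htm",
        "/used/Audi/",  # no model/id part, so both groups will be NaN
    ]
})

# Same pattern as above, with named groups that become the column names.
pattern = r'/(?P<Model>[^/]+)-(?P<VIN>[a-f0-9]{32})\.htm'

# With more than one group, str.extract returns a DataFrame, one column per group.
extracted = df["Page"].str.extract(pattern)

# join() assumes df has no "Model" or "VIN" column yet; drop or rename them first if it does.
df = df.join(extracted)
print(df[["Model", "VIN"]])
```

If the frame already has a Model column, as in the question, assigning to Clean_Make[["Model", "VIN"]] as shown above overwrites it instead.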