I am having a hard time coming up with a regular expression to find an ID. My text contains multiple IDs, but I want to extract the one between course and title, as below:
"_class":"course","id":1565838, "title":"The Complete 2021 Web Development Bootcamp",
I want to extract the id number between "_class":"course" and "title". What expression do I use? I also want to extract the title after the id.
CodePudding user response:
Why would you use a regex for this? It looks to me like you have a JSON payload. If you want to get the ID from this, it's actually much simpler than trying to use regular expressions:
import json
jsonStr = '{"_class":"course","id":1565838, "title":"The Complete 2021 Web Development Bootcamp"}'
data = json.loads(jsonStr)
print("ID:", data["id"])  # ID: 1565838
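Since the question mentions several IDs in the same text, here is a minimal sketch of how the JSON approach could pick out only the course entries. It assumes the full payload is a JSON array of objects; the `payload` contents and the helper name `course_ids_and_titles` are made up for illustration, and the real structure may be nested differently.

```python
import json

# Hypothetical payload: a list of objects, only some of which are courses.
payload = '''[
  {"_class": "user", "id": 42, "title": "Instructor"},
  {"_class": "course", "id": 1565838, "title": "The Complete 2021 Web Development Bootcamp"}
]'''


def course_ids_and_titles(json_text):
    """Return (id, title) pairs for every object whose _class is "course"."""
    items = json.loads(json_text)
    return [
        (item["id"], item["title"])
        for item in items
        if isinstance(item, dict) and item.get("_class") == "course"
    ]


courses = course_ids_and_titles(payload)
print(courses)  # [(1565838, 'The Complete 2021 Web Development Bootcamp')]
```

Filtering on the parsed "_class" value avoids depending on key order or whitespace in the raw text.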
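If you really want to use a regex for this, you can use "id":(?P<id>\d+), to match the ID itself (the quotes, colon, and comma need no backslash escapes). Combining this with Python's re module gives you:
import re
raw = '"_class":"course","id":1565838, "title":"The Complete 2021 Web Development Bootcamp",'
m = re.search(r'"id":(?P<id>\d+),', raw)
print("ID:", m.group("id"))  # ID: 1565838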
Alternatively, if you are looking for multiple IDs, you can remove the trailing comma and search with the findall function:
ids = re.findall(r'"id":(?P<id>\d+)', raw)
print(ids) # ['1565838']
Finally, if you also want the title, you can modify it further to get both:
m = re.search(r'"id":(?P<id>\d+).*?"title":"(?P<title>[^"]*)"', raw)
print(m["id"])  # 1565838
print(m["title"])  # The Complete 2021 Web Development Bootcamp
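The patterns above match the first "id" they find, whichever object it belongs to. If the text holds several IDs and only the one following "_class":"course" is wanted, the pattern can be anchored on that key. The following is a rough sketch under that assumption; the `raw` string and the name `COURSE_RE` are hypothetical, and the pattern assumes the keys always appear in the order _class, id, title and that titles contain no escaped quotes.

```python
import re

# Hypothetical text containing several ids; only one belongs to a course.
raw = (
    '"_class":"user","id":42, "title":"Instructor",'
    '"_class":"course","id":1565838, '
    '"title":"The Complete 2021 Web Development Bootcamp",'
)

# Anchor on "_class":"course", then capture the id and title that follow.
# \s* tolerates optional whitespace around the colons and commas.
COURSE_RE = re.compile(
    r'"_class"\s*:\s*"course"\s*,\s*'
    r'"id"\s*:\s*(?P<id>\d+)\s*,\s*'
    r'"title"\s*:\s*"(?P<title>[^"]*)"'
)

# finditer yields one match object per course found in the text.
courses = [(int(m["id"]), m["title"]) for m in COURSE_RE.finditer(raw)]
print(courses)  # [(1565838, 'The Complete 2021 Web Development Bootcamp')]
```

The user entry with id 42 is skipped because its "_class" value is not "course". If the key order can vary, parsing the text as JSON is the more reliable route.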