Regex findall pattern

Time:04-19

I am having a hard time coming up with a regular expression to find the ID. My text contains multiple IDs, but I only want to extract the one between "course" and "title", as below:

"_class":"course","id":1565838, "title":"The Complete 2021 Web Development Bootcamp",

I want to extract the id number between "_class":"course" and "title". What expression do I use? I also want to extract the title after the id.

CodePudding user response:

Why would you use a regex for this? It looks to me like you have a JSON payload. If you want to get the ID from this, it's actually much simpler than trying to use regular expressions:

import json

jsonStr = '{"_class":"course","id":1565838, "title":"The Complete 2021 Web Development Bootcamp"}'

data = json.loads(jsonStr)
print("ID:", data["id"]) # ID: 1565838
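
If the payload holds several objects and only some of them are courses, the same approach extends to filtering on the "_class" field. This is only a minimal sketch: it assumes the objects arrive as a JSON array, and the second entry and the name payload are made up for illustration.

```python
import json

# Hypothetical payload: the second object is invented to show the filtering.
payload = '''[
  {"_class": "course", "id": 1565838, "title": "The Complete 2021 Web Development Bootcamp"},
  {"_class": "user", "id": 42, "title": "Instructor"}
]'''

items = json.loads(payload)

# Keep only the (id, title) pairs of objects whose _class is "course".
courses = [
    (item["id"], item["title"])
    for item in items
    if item.get("_class") == "course"
]
print(courses)  # [(1565838, 'The Complete 2021 Web Development Bootcamp')]
```

Because the JSON parser handles whitespace, key order, and escaped quotes for you, this stays correct even if the payload is formatted differently.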

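If you really want to use a regex for this, then you can use "id":(?P<id>\d+), to match the ID itself (the quotes, colon, and comma do not need escaping). Combining this with Python's re module, and assuming raw holds your text, gives you:

import re

raw = '"_class":"course","id":1565838, "title":"The Complete 2021 Web Development Bootcamp",'
m = re.search(r'"id":(?P<id>\d+),', raw)
print("ID: " + m.group("id")) # ID: 1565838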

Alternatively, if you have multiple IDs you're looking for, you can modify this to remove the trailing comma and search via the findall function:

ids = re.findall(r'"id":(?P<id>\d+)', raw)
print(ids) # ['1565838']

Finally, if you also want the title, you can modify it further to get both:

m = re.search(r'"id":(?P<id>\d+).*?"title":"(?P<title>[^"]*)"', raw)
print(m["id"])    # 1565838
print(m["title"]) # The Complete 2021 Web Development Bootcamp
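
The question asks specifically for the ID that sits between "_class":"course" and "title", and the text may contain other IDs. One way to handle that is to anchor the pattern on "_class":"course" and let findall return every matching pair. This is a sketch under assumptions: the second record in raw is invented for illustration, and the keys are assumed to appear in exactly this order.

```python
import re

# Hypothetical text with two records; the second one is invented for illustration.
raw = (
    '{"_class":"course","id":1565838, "title":"The Complete 2021 Web Development Bootcamp"},'
    '{"_class":"user","id":42, "title":"Instructor"}'
)

# Anchor on "_class":"course" so that IDs belonging to other objects are skipped.
# \s* tolerates optional whitespace after the commas; [^"]* stops at the closing quote.
pattern = re.compile(r'"_class":"course",\s*"id":(\d+),\s*"title":"([^"]*)"')

# With two capture groups, findall returns a list of (id, title) tuples of strings.
courses = pattern.findall(raw)
print(courses)  # [('1565838', 'The Complete 2021 Web Development Bootcamp')]
```

Note that [^"]* stops at the first double quote, so a title containing an escaped quote would be cut short. That is one more reason to prefer the JSON parser when the text really is JSON.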