How to retrieve information from only the first section of the raw data using regular expressions?

Below is a sample of the raw data that my code processes with regular expressions:

raw_data = '''
name        :   John
age         :   26
gender      :   male
occupation  :   teacher

Father
---------------------
name        :   Bill
age         :   52
gender      :   male

Mother
---------------------
name        :   Mary
age         :   48
gender      :   female
'''

I want to extract only the first section (John's details) from the raw data and store it in a dictionary:

dict(name = 'John', age = 26, gender = 'male', occupation = 'teacher')

However, when I run my code as follows, it does not work as I expect:

import re
p = re.compile('[^-]*?^([^:\-]+?):([^\r\n]*?)$', re.M)
rets = p.findall(raw_data)

infoAboutJohnAsDict = {}

if rets != []:
  for ret in rets:
    infoAboutJohnAsDict[ret[0]] = ret[1]
else:
  print("Not match.")

print(f'rets = {rets}')
print(f'infoAboutJohnAsDict = {infoAboutJohnAsDict}')

Can anyone suggest how I should modify my code to achieve this?

CodePudding user response:

Here is one approach using regular expressions. First, use re.sub to trim off everything from the first section heading (Father) onward, since you don't want that part. Then use re.findall to collect the remaining key-value pairs for John and convert them to a dictionary.

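import re

# Remove everything from the first underlined heading ("Father") to the end.
raw_data = re.sub(r'\s+\w+\s+-+.*', '', raw_data, flags=re.S)
# Collect the remaining "key : value" pairs.
matches = re.findall(r'(\w+)\s*:\s*(\w+)', raw_data)
d = dict()
for m in matches:
    d[m[0]] = m[1]

print(d)
# {'name': 'John', 'age': '26', 'gender': 'male', 'occupation': 'teacher'}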
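
Note that the values above all come back as strings, while the question asks for age = 26 as an integer. Below is a minimal sketch of one way to handle that, wrapping the same two steps in a helper. The function name parse_first_section and the rule "purely numeric values become integers" are assumptions made for illustration, not part of the original answer.

```python
import re

def parse_first_section(text):
    """Return the key/value pairs that precede the first underlined heading."""
    # Drop everything from the first "Heading" + dashed-line block onward.
    head = re.sub(r'\s+\w+\s+-+.*', '', text, flags=re.S)
    result = {}
    for key, value in re.findall(r'(\w+)\s*:\s*(\w+)', head):
        # Assumption: values made up only of digits should become integers.
        result[key] = int(value) if value.isdigit() else value
    return result

raw_data = '''
name        :   John
age         :   26
gender      :   male
occupation  :   teacher

Father
---------------------
name        :   Bill
age         :   52
gender      :   male

Mother
---------------------
name        :   Mary
age         :   48
gender      :   female
'''

info = parse_first_section(raw_data)
print(info)
# {'name': 'John', 'age': 26, 'gender': 'male', 'occupation': 'teacher'}
```

Keep in mind that (\w+) captures only the first word of each value, so a value containing spaces (for example a two-word occupation) would be cut short. Widening that group to something like ([^\r\n]+) would be one way around it.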