I'd like to extract hostnames and datetimes from a text file using Python. The text is shown below. I need the date that follows 'notAfter=' and the hostname that follows 'UnitId:' collected into a dictionary in which each datetime is attached to its hostname.
- Stdout: |
notAfter=Jun 2 10:15:03 2031 GMT
UnitId: octavia/1
- Stdout: |
notAfter=Jun 2 10:15:03 2031 GMT
UnitId: octavia/0
- Stdout: |
notAfter=Jun 2 10:15:03 2031 GMT
UnitId: octavia/2
CodePudding user response:
A fairly simple regex will do it: notAfter=(.*)\n\s*UnitId: (.*)
The \s* allows the UnitId line to start with any amount of indentation, including none.
import re
content = """- Stdout: |
notAfter=Jun 2 10:15:03 2031 GMT
UnitId: octavia/1
- Stdout: |
notAfter=Jun 2 10:15:03 2031 GMT
UnitId: octavia/0
- Stdout: |
notAfter=Jun 2 10:15:03 2031 GMT
UnitId: octavia/2"""
results = [{'datetime': dt, 'hostname': host}
           for dt, host in re.findall(r"notAfter=(.*)\n\s*UnitId: (.*)", content)]
print(results)
# [{'datetime': 'Jun 2 10:15:03 2031 GMT', 'hostname': 'octavia/1'},
# {'datetime': 'Jun 2 10:15:03 2031 GMT', 'hostname': 'octavia/0'},
# {'datetime': 'Jun 2 10:15:03 2031 GMT', 'hostname': 'octavia/2'}]
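The question asks for a dictionary keyed by hostname rather than a list of records. Below is a minimal sketch of that variant. It assumes every timestamp follows the 'Jun 2 10:15:03 2031 GMT' layout from the sample and that month names are parsed under an English locale. The names PATTERN and expiry_by_host are illustrative, and the strptime format string is an assumption based on that one sample.

```python
import re
from datetime import datetime, timezone

content = """- Stdout: |
notAfter=Jun 2 10:15:03 2031 GMT
UnitId: octavia/1
- Stdout: |
notAfter=Jun 2 10:15:03 2031 GMT
UnitId: octavia/0
- Stdout: |
notAfter=Jun 2 10:15:03 2031 GMT
UnitId: octavia/2"""

# \s* tolerates any indentation before "UnitId:"; \S+ takes the hostname token.
PATTERN = re.compile(r"notAfter=(.*)\n\s*UnitId:\s*(\S+)")

def expiry_by_host(text):
    """Return {hostname: timezone-aware datetime} for each notAfter/UnitId pair."""
    result = {}
    for raw_date, host in PATTERN.findall(text):
        # Collapse runs of whitespace in case the day is padded ("Jun  2").
        cleaned = " ".join(raw_date.split())
        # Assumed layout: "<Mon> <day> <HH:MM:SS> <year> GMT".
        parsed = datetime.strptime(cleaned, "%b %d %H:%M:%S %Y GMT")
        result[host] = parsed.replace(tzinfo=timezone.utc)
    return result

expiry = expiry_by_host(content)
print(expiry["octavia/1"].isoformat())
# 2031-06-02T10:15:03+00:00
```

If the same hostname appears more than once, the last occurrence wins, since dictionary keys are unique.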
CodePudding user response:
Another approach, using a single list comprehension:
text = """- Stdout: |
notAfter=Jun 2 10:15:03 2031 GMT
UnitId: octavia/1
- Stdout: |
notAfter=Jun 2 10:15:03 2031 GMT
UnitId: octavia/0
- Stdout: |
notAfter=Jun 2 10:15:03 2031 GMT
UnitId: octavia/2"""
import re
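# The pattern must not require a trailing newline, otherwise the last entry is skipped.
output = [{'datetime': data[0], 'hostname': data[1]} for data in re.findall(r'notAfter=(.*)\n.*UnitId:\s*(.*)', text)]
print(output)
Output:
[{'datetime': 'Jun 2 10:15:03 2031 GMT', 'hostname': 'octavia/1'}, {'datetime': 'Jun 2 10:15:03 2031 GMT', 'hostname': 'octavia/0'}, {'datetime': 'Jun 2 10:15:03 2031 GMT', 'hostname': 'octavia/2'}]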
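Since the input comes from a text file, the same pairing can also be done line by line, without a multi-line regex. This is a rough sketch that assumes each 'notAfter=' line appears before the 'UnitId:' line it belongs to. parse_units is a hypothetical helper name, and io.StringIO stands in for an open file here.

```python
import io

def parse_units(lines):
    """Map each UnitId to the most recent notAfter value seen before it."""
    result = {}
    pending_date = None
    for line in lines:
        line = line.strip()
        if line.startswith("notAfter="):
            # Remember the date until the matching UnitId line shows up.
            pending_date = line[len("notAfter="):]
        elif line.startswith("UnitId:") and pending_date is not None:
            host = line[len("UnitId:"):].strip()
            result[host] = pending_date
            pending_date = None
    return result

sample = io.StringIO("""- Stdout: |
notAfter=Jun 2 10:15:03 2031 GMT
UnitId: octavia/1
- Stdout: |
notAfter=Jun 2 10:15:03 2031 GMT
UnitId: octavia/0
- Stdout: |
notAfter=Jun 2 10:15:03 2031 GMT
UnitId: octavia/2""")

print(parse_units(sample))
# {'octavia/1': 'Jun 2 10:15:03 2031 GMT', 'octavia/0': 'Jun 2 10:15:03 2031 GMT', 'octavia/2': 'Jun 2 10:15:03 2031 GMT'}
```

With a real file, any open file object can be passed in the same way, since iterating over it yields one line at a time.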