Extract hostname and datetime from text file in Python

Time:11-11

I'd like to extract hostnames and datetimes from a text file using Python. Below is the text. I need to extract the date after 'notAfter=' and the hostname after 'UnitId:' into a dictionary where each datetime is attached to its hostname.

- Stdout: |
    notAfter=Jun  2 10:15:03 2031 GMT
  UnitId: octavia/1
- Stdout: |
    notAfter=Jun  2 10:15:03 2031 GMT
  UnitId: octavia/0
- Stdout: |
    notAfter=Jun  2 10:15:03 2031 GMT
  UnitId: octavia/2

CodePudding user response:

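A fairly simple regex will do it: notAfter=(.*)\n\s UnitId: (.*)

The first group captures the date on the notAfter line, and the second captures the hostname on the UnitId line that follows it.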

import re

content = """- Stdout: |
    notAfter=Jun  2 10:15:03 2031 GMT
  UnitId: octavia/1
- Stdout: |
    notAfter=Jun  2 10:15:03 2031 GMT
  UnitId: octavia/0
- Stdout: |
    notAfter=Jun  2 10:15:03 2031 GMT
  UnitId: octavia/2"""

results = [{'datetime': dt, 'hostname': host}
           for dt, host in re.findall(r"notAfter=(.*)\n\s UnitId: (.*)", content)]
print(results)

# [{'datetime': 'Jun  2 10:15:03 2031 GMT', 'hostname': 'octavia/1'}, 
#  {'datetime': 'Jun  2 10:15:03 2031 GMT', 'hostname': 'octavia/0'}, 
#  {'datetime': 'Jun  2 10:15:03 2031 GMT', 'hostname': 'octavia/2'}]
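
Since the question asks for a dictionary with the datetime attached to the hostname, the same matches can be folded into a dict keyed by hostname. This is only a minimal sketch: `parse_expiry` is a hypothetical helper name, and it assumes every date follows the `Jun  2 10:15:03 2031 GMT` layout from the sample and that month names are parsed under an English/C locale.

```python
import re
from datetime import datetime, timezone

content = """- Stdout: |
    notAfter=Jun  2 10:15:03 2031 GMT
  UnitId: octavia/1
- Stdout: |
    notAfter=Jun  2 10:15:03 2031 GMT
  UnitId: octavia/0
- Stdout: |
    notAfter=Jun  2 10:15:03 2031 GMT
  UnitId: octavia/2"""


def parse_expiry(text):
    """Map each UnitId to its notAfter value as a timezone-aware datetime.

    Assumption: dates always look like the sample ("%b %d %H:%M:%S %Y GMT").
    """
    # \s+ tolerates any amount of indentation before "UnitId:"
    pattern = re.compile(r"notAfter=(.*)\n\s+UnitId:\s*(\S+)")
    expiry = {}
    for raw_date, host in pattern.findall(text):
        # Whitespace in a strptime format matches one or more spaces,
        # so the double space before a single-digit day is accepted.
        parsed = datetime.strptime(raw_date.strip(), "%b %d %H:%M:%S %Y GMT")
        # The literal "GMT" is not interpreted, so attach UTC explicitly.
        expiry[host] = parsed.replace(tzinfo=timezone.utc)
    return expiry


expiry = parse_expiry(content)
print(expiry["octavia/1"].isoformat())
# 2031-06-02T10:15:03+00:00
```

Keying by hostname means a later entry for the same unit would overwrite an earlier one, which may or may not be what is wanted.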

CodePudding user response:

One possible approach:

text = """- Stdout: |
    notAfter=Jun  2 10:15:03 2031 GMT
  UnitId: octavia/1
- Stdout: |
    notAfter=Jun  2 10:15:03 2031 GMT
  UnitId: octavia/0
- Stdout: |
    notAfter=Jun  2 10:15:03 2031 GMT
  UnitId: octavia/2"""
  
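import re

# No trailing \n in the pattern, so the last entry (which has no newline after it) is matched too
output = [{'datetime': data[0], 'hostname': data[1]}
          for data in re.findall(r'notAfter=(.*)\n.*UnitId:\s*(.*)', text)]
print(output)

Output:

[{'datetime': 'Jun  2 10:15:03 2031 GMT', 'hostname': 'octavia/1'}, {'datetime': 'Jun  2 10:15:03 2031 GMT', 'hostname': 'octavia/0'}, {'datetime': 'Jun  2 10:15:03 2031 GMT', 'hostname': 'octavia/2'}]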
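
Because the data lives in a text file, it may also be worth sketching a line-by-line variant that needs no multi-line regex and does not care whether the file ends with a newline. The helper name `pair_units` and the file name `output.txt` are assumptions for illustration. The sketch also assumes each `UnitId:` line comes after the `notAfter=` line it belongs to, as in the sample.

```python
import io


def pair_units(lines):
    """Pair each 'UnitId:' line with the most recent 'notAfter=' line above it."""
    pairs = {}
    last_date = None
    for line in lines:
        line = line.strip()
        if line.startswith("notAfter="):
            # Remember the date until the matching UnitId line shows up
            last_date = line[len("notAfter="):]
        elif line.startswith("UnitId:") and last_date is not None:
            pairs[line[len("UnitId:"):].strip()] = last_date
            last_date = None
    return pairs


# Stand-in for `open("output.txt")` (hypothetical file name);
# any iterable of lines works the same way.
sample = io.StringIO("""- Stdout: |
    notAfter=Jun  2 10:15:03 2031 GMT
  UnitId: octavia/1
- Stdout: |
    notAfter=Jun  2 10:15:03 2031 GMT
  UnitId: octavia/0
- Stdout: |
    notAfter=Jun  2 10:15:03 2031 GMT
  UnitId: octavia/2""")

result = pair_units(sample)
print(result)
# {'octavia/1': 'Jun  2 10:15:03 2031 GMT', 'octavia/0': 'Jun  2 10:15:03 2031 GMT', 'octavia/2': 'Jun  2 10:15:03 2031 GMT'}
```

With a real file, `with open("output.txt") as fh: result = pair_units(fh)` would read it one line at a time instead of loading the whole text first.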