Given git log output like such:
commit 19e0f017ac832238f5a800dd3ea7a5966b3c1343 (HEAD -> master, origin/master, origin/HEAD)
Author: Slim Shady
Date: Sun Sep 18 19:53:42 2022 -0700
ci: remove debugging line github action script
commit body
commit ef82c672d21d70c43f0454b0b4d6fa22ef4ad0a9 (fix_release_action)
Author: Slim Shady
Date: Sun Sep 18 19:41:20 2022 -0700
feat: read and write IDs
commit 8ee8fcbebcab76a2fbf0ee096a0d216e51fe2874
Author: Slim Shady
Date: Sun Sep 18 17:41:03 2022 -0700
feat: new hook to allow custom tags
I'd like that to turn into a list in python, with each element containing a single commit (including hash, author, body, etc.).
I've tried using re.split(r"commit \w{40}", git_log)
, but it doesn't keep the hash in the output.
CodePudding user response:
You could also use a positive lookahead to split your data.
with open('git_log.txt', 'r') as f:
data = f.read()
res = list(filter(None, re.split(r"(?=commit \w{40})", data)))
Output:
[
'commit 19e0f017ac832238f5a800dd3ea7a5966b3c1343 (HEAD -> master, origin/master, origin/HEAD)\nAuthor: Slim Shady\nDate: Sun Sep 18 19:53:42 2022 -0700\n\n ci: remove debugging line github action script\n\n commit body\n\n',
'commit ef82c672d21d70c43f0454b0b4d6fa22ef4ad0a9 (fix_release_action)\nAuthor: Slim Shady\nDate: Sun Sep 18 19:41:20 2022 -0700\n\n feat: read and write IDs\n\n',
'commit 8ee8fcbebcab76a2fbf0ee096a0d216e51fe2874\nAuthor: Slim Shady\nDate: Sun Sep 18 17:41:03 2022 -0700\n\n feat: new hook to allow custom tags'
]
CodePudding user response:
You need to put the split pattern in a capture group to allow it to be part of the output:
# filter(None, ...) to remove empty strings
>>> res = filter(None, re.split(r'(commit \w{40})', inp))
# Join items in group of two to handle the split between a commit line and rest of its body
>>> output = ["".join(item) for item in zip(*[res] * 2)]
>>> output
[
'commit 19e0f017ac832238f5a800dd3ea7a5966b3c1343 (HEAD -> master, origin/master, origin/HEAD)\nAuthor: Slim Shady\nDate: Sun Sep 18 19:53:42 2022 -0700\n\n ci: remove debugging line github action script\n\n commit body\n\n',
'commit ef82c672d21d70c43f0454b0b4d6fa22ef4ad0a9 (fix_release_action)\nAuthor: Slim Shady\nDate: Sun Sep 18 19:41:20 2022 -0700\n\n feat: read and write IDs\n\n',
'commit 8ee8fcbebcab76a2fbf0ee096a0d216e51fe2874\nAuthor: Slim Shady\nDate: Sun Sep 18 17:41:03 2022 -0700\n\n feat: new hook to allow custom tags'
]
But if you do have control over the git log
output, you could format it differently and parse it without regex:
git log --pretty=format:'"%H"%x09"%an"%x09"