How to split up git log output into a list of commits in python?

Time:09-19

Given git log output like this:

commit 19e0f017ac832238f5a800dd3ea7a5966b3c1343 (HEAD -> master, origin/master, origin/HEAD)
Author: Slim Shady
Date:   Sun Sep 18 19:53:42 2022 -0700

    ci: remove debugging line github action script

    commit body

commit ef82c672d21d70c43f0454b0b4d6fa22ef4ad0a9 (fix_release_action)
Author: Slim Shady
Date:   Sun Sep 18 19:41:20 2022 -0700

    feat: read and write IDs

commit 8ee8fcbebcab76a2fbf0ee096a0d216e51fe2874
Author: Slim Shady
Date:   Sun Sep 18 17:41:03 2022 -0700

    feat: new hook to allow custom tags

I'd like that to turn into a list in python, with each element containing a single commit (including hash, author, body, etc.).

I've tried using re.split(r"commit \w{40}", git_log), but it doesn't keep the hash in the output.

CodePudding user response:

You could also use a positive lookahead to split your data.

import re

with open('git_log.txt', 'r') as f:
    data = f.read()
# The lookahead matches the position before each "commit <hash>" without consuming it
res = list(filter(None, re.split(r"(?=commit \w{40})", data)))

Output:

[
    'commit 19e0f017ac832238f5a800dd3ea7a5966b3c1343 (HEAD -> master, origin/master, origin/HEAD)\nAuthor: Slim Shady\nDate:   Sun Sep 18 19:53:42 2022 -0700\n\n    ci: remove debugging line github action script\n\n    commit body\n\n',
    'commit ef82c672d21d70c43f0454b0b4d6fa22ef4ad0a9 (fix_release_action)\nAuthor: Slim Shady\nDate:   Sun Sep 18 19:41:20 2022 -0700\n\n    feat: read and write IDs\n\n',
    'commit 8ee8fcbebcab76a2fbf0ee096a0d216e51fe2874\nAuthor: Slim Shady\nDate:   Sun Sep 18 17:41:03 2022 -0700\n\n    feat: new hook to allow custom tags'
]
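
One caveat with the lookahead above: it also splits wherever "commit" followed by 40 word characters shows up inside a commit message (a revert message, for example). A minimal sketch of a stricter variant, assuming the default git log layout where message lines are indented and only real commit headers start at column 0. The names COMMIT_START and split_commits are made up for illustration, and splitting on a zero-width match needs Python 3.7 or later:

```python
import re

# Split only at the start of a line that begins with "commit <40 hex digits>".
# The lookahead is zero-width, so the hash stays in each piece.
COMMIT_START = re.compile(r"^(?=commit [0-9a-f]{40}\b)", re.MULTILINE)


def split_commits(log_text):
    """Return one string per commit; empty pieces are dropped."""
    return [chunk for chunk in COMMIT_START.split(log_text) if chunk]
```

Call it as split_commits(data) with the data string read from the file above. Indented mentions of a hash in a message body are left alone because they do not start at the beginning of a line.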

CodePudding user response:

You need to put the split pattern in a capture group to allow it to be part of the output:

>>> import re
# filter(None, ...) drops the empty string that precedes the first match
>>> res = filter(None, re.split(r'(commit \w{40})', inp))
# res is a single iterator, so zip(*[res] * 2) consumes it two items at a time,
# pairing each "commit <hash>" line start with the text that follows it
>>> output = ["".join(item) for item in zip(*[res] * 2)]
>>> output
[
    'commit 19e0f017ac832238f5a800dd3ea7a5966b3c1343 (HEAD -> master, origin/master, origin/HEAD)\nAuthor: Slim Shady\nDate:   Sun Sep 18 19:53:42 2022 -0700\n\n    ci: remove debugging line github action script\n\n    commit body\n\n',
    'commit ef82c672d21d70c43f0454b0b4d6fa22ef4ad0a9 (fix_release_action)\nAuthor: Slim Shady\nDate:   Sun Sep 18 19:41:20 2022 -0700\n\n    feat: read and write IDs\n\n',
    'commit 8ee8fcbebcab76a2fbf0ee096a0d216e51fe2874\nAuthor: Slim Shady\nDate:   Sun Sep 18 17:41:03 2022 -0700\n\n    feat: new hook to allow custom tags'
]
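
The zip(*[res] * 2) idiom is compact but easy to misread. Here is a sketch of the same capture-group idea written with slices instead, relying on the fact that re.split with one capture group returns the text before the first match, then alternating matches and remainders. The function name split_keep_hash is made up for illustration:

```python
import re


def split_keep_hash(log_text):
    """Split on a captured 'commit <hash>' and glue each hash back onto its text."""
    parts = re.split(r"(commit \w{40})", log_text)
    # parts == [text before first match, match1, rest1, match2, rest2, ...]
    headers = parts[1::2]
    bodies = parts[2::2]
    return [header + body for header, body in zip(headers, bodies)]
```

Because the pairing is positional, it does not depend on filtering out empty strings first.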

But if you do have control over the git log output, you could format it differently and parse it without regex:

git log --pretty=format:'"%H"%x09"%an"%x09"           
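
A minimal sketch of the regex-free route, assuming a format string of my own choosing rather than the one above: fields are separated by the ASCII unit separator (0x1f) and each commit ends with the record separator (0x1e), neither of which normally appears in commit text. GIT_FORMAT, FIELDS, parse_log and read_log are names made up for this example; %H, %an, %ad, %B and %xNN are standard git pretty-format placeholders.

```python
import subprocess

# Assumed format (not from the answer above): hash, author name, author date
# and raw message, separated by 0x1f, with 0x1e closing each commit.
GIT_FORMAT = "%H%x1f%an%x1f%ad%x1f%B%x1e"
FIELDS = ("hash", "author", "date", "message")


def parse_log(raw):
    """Turn the delimited log text into a list of dicts, one per commit."""
    commits = []
    for record in raw.split("\x1e"):
        record = record.strip("\n")
        if not record:
            continue  # skip the empty piece after the last separator
        values = (value.strip() for value in record.split("\x1f"))
        commits.append(dict(zip(FIELDS, values)))
    return commits


def read_log(repo="."):
    """Run git log in the given repository and parse its output."""
    raw = subprocess.run(
        ["git", "log", f"--pretty=format:{GIT_FORMAT}"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return parse_log(raw)
```

read_log() returns a list with one dict per commit, so commits[0]["hash"] and commits[0]["message"] are available without any splitting on the word "commit".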