How to get specific file version from git repository using python

Time:04-14

I have a local git repo and I'm trying to find a way to get a specific version of my xlsx file into my Python code so I can process it with pandas.

I found the GitPython library, but I'm not sure how to use it correctly:

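from git import Repo

repo = Repo(path_to_repo)                    # open the local repository
commit = repo.commit(sha)                    # the commit holding the version I want
targetfile = commit.tree / 'dataset.xlsx'    # look the file up in that commit's tree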

But I don't know what to do next. I tried loading it into pandas using the file path, but of course that just loads the current version from my working directory.

How do I load a previous version of the xlsx file into pandas?

Answer:

When you ask for commit.tree / 'dataset.xlsx', you get back a git.Blob object:

>>> targetfile
<git.Blob "3137d9443f54325b8ad8a263b13053fee47fbff2">

If you want to read the contents of the object, you can get them through the data_stream property, which returns a file-like stream object:

>>> data = targetfile.data_stream.read()

Or you can use the stream_data method (don't look at me, I didn't name them), which writes the data into a file-like object that you pass in:

>>> import io
>>> buf = io.BytesIO()
>>> targetfile.stream_data(buf)
<git.Blob "3137d9443f54325b8ad8a263b13053fee47fbff2">
>>> buf.getvalue()
b'The contents of the file...'