I have a local git repo and I'm trying to find a way to get a specific version of my xlsx file into my Python code so I can process it with pandas.
I found the GitPython library, but I'm not sure how to use it correctly:
from git import Repo
repo = Repo(path_to_repo)
commit = repo.commit(sha)
targetfile = commit.tree / 'dataset.xlsx'
But I don't know what to do next. I tried to load it into pandas using the file path, but, of course, that just loads my latest version.
How do I load a previous version of the xlsx file into pandas?
CodePudding user response:
When you ask for commit.tree / 'dataset.xlsx', you get back a git.Blob object:
>>> targetfile
<git.Blob "3137d9443f54325b8ad8a263b13053fee47fbff2">
If you want to read the contents of the object, you can use the data_stream attribute, which returns a file-like object:
>>> data = targetfile.data_stream.read()
Or you can use the stream_data method (don't look at me, I didn't name them), which writes the data into a file-like object:
>>> import io
>>> buf = io.BytesIO()
>>> targetfile.stream_data(buf)
<git.Blob "3137d9443f54325b8ad8a263b13053fee47fbff2">
>>> buf.getvalue()
b'The contents of the file...'