How to retrieve pandas dataframe from its string representation?

Time:12-22

A pandas DataFrame's __repr__ magic method calls to_string() internally. So when I write x = f'{df}', x is the string representation of the dataframe df.

How can I retrieve (reconstruct) the dataframe given only x? I would like a function get_dataframe_from_string(df: str) -> pd.DataFrame that takes the string and returns the dataframe.

The method should be generic, so it should work with multiindices as well.

CodePudding user response:

TL;DR

Serialize with df.to_csv() instead of df.__str__(); that string can be parsed back into a dataframe.

str(df) won't work

The short answer is: you can't. At least not with pandas' built-in string representation.

The reason is that df.__repr__ is lossy, so it does not have a (mathematical) inverse function. Large frames are truncated when printed:

import pandas as pd


df = pd.DataFrame.from_dict(dict(x=range(100), y=range(100)))
print(df)
#      x   y
# 0    0   0
# 1    1   1
# 2    2   2
# 3    3   3
# 4    4   4
# ..  ..  ..
# 95  95  95
# 96  96  96
# 97  97  97
# 98  98  98
# 99  99  99

There is no way to know what rows 5-94 contain.
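To make the missing inverse concrete, here is a small sketch showing two different dataframes that print identically. The names a and b are only illustrative, and display.max_rows is pinned to 10 so the result does not depend on local display settings.

```python
import pandas as pd

a = pd.DataFrame({"x": range(100), "y": range(100)})
b = a.copy()
b.loc[50, "y"] = 77  # change a value in a row that the truncated repr hides

# Pin the display limit so the comparison does not depend on local settings.
with pd.option_context("display.max_rows", 10):
    same_text = str(a) == str(b)

same_data = a.equals(b)

# The printed text matches even though the data differs.
print(same_text, same_data)
```

Since both frames map to the same string, no function can tell from that string which one it came from.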

A solution: df.to_csv

One could come up with hacks to work around this, but the only sensible way IMO is to use well-known pandas methods such as to_csv:

str_df = df.to_csv()
print(str_df)
# ,x,y
# 0,0,0
# 1,1,1
# 2,2,2
# 3,3,3

where str_df contains all the data (I truncated the output).

Then you can get your original dataframe back using io and read_csv:

import io

original_df = pd.read_csv(io.StringIO(str_df))
print(original_df)
#     Unnamed: 0   x   y
# 0            0   0   0
# 1            1   1   1
# 2            2   2   2
# 3            3   3   3
# 4            4   4   4
# ..         ...  ..  ..
# 95          95  95  95
# 96          96  96  96
# 97          97  97  97
# 98          98  98  98
# 99          99  99  99

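Note that the column Unnamed: 0 is present because we didn't exclude the row names (the index) when writing. They can be excluded with df.to_csv(index=False), or read back as the index with pd.read_csv(..., index_col=0).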
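Since the question asks for multi-index support, here is a minimal sketch of a CSV round trip that restores a row MultiIndex. It reuses the function name from the question, but it expects the output of df.to_csv(), not str(df). The index_levels parameter and the example data are assumptions made for illustration.

```python
import io

import pandas as pd


def get_dataframe_from_string(text: str, index_levels: int = 1) -> pd.DataFrame:
    """Rebuild a dataframe from the text produced by df.to_csv().

    index_levels is a hypothetical parameter: the number of leading
    columns that held the row index when the frame was written.
    """
    index_col = 0 if index_levels == 1 else list(range(index_levels))
    return pd.read_csv(io.StringIO(text), index_col=index_col)


# Example frame with a two-level row index (illustrative data).
index = pd.MultiIndex.from_product(
    [[2021, 2022], [1, 2]], names=["year", "quarter"]
)
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [5, 6, 7, 8]}, index=index)

text = df.to_csv()
restored = get_dataframe_from_string(text, index_levels=df.index.nlevels)
print(restored)
```

The number of index levels has to be passed along with the text, because the CSV itself does not record which columns were the index. The sketch does not handle multi-level columns (read_csv needs a header list for those), and CSV does not preserve every dtype: datetimes, for example, come back as strings unless they are parsed explicitly.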

CodePudding user response:

pandas basically does this in its read_clipboard function, which tries to construct a DataFrame from a string of text, so you should be able to adopt whatever happens after this line.
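As a rough sketch of that idea: read_clipboard hands the clipboard text to read_csv with a whitespace separator by default, so the same call can be tried directly on the printed text. This assumes a small frame with a plain index and no whitespace inside any value. It uses df.to_string(), which renders every row, rather than str(df), which may truncate.

```python
import io

import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# to_string() renders every row, unlike str(df), which may truncate.
text = df.to_string()

# The header line has one field fewer than the data rows, so read_csv
# treats the first column as the index.
restored = pd.read_csv(io.StringIO(text), sep=r"\s+")
print(restored)
```

This is fragile compared to the CSV round trip: values containing spaces, truncated output, and multi-level indices will not parse back correctly with this approach.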
