The pandas DataFrame object has a to_string() method that is called by the __repr__ magic method. Thus when I write x = f'{df}', x is going to be the string representation of the dataframe df.
How can I retrieve (reconstruct) the dataframe having only x? I would like a method called get_dataframe_from_string(df: str) -> pd.DataFrame that takes the string and returns the dataframe.
The method should be generic, so it should work with multiindices as well.
CodePudding user response:
TL;DR
Use df.to_csv() instead of df.__str__(), and then you can do it.
str(df) won't work
The short answer is: you can't, at least not with pandas' built-in string representation. The reason is that df.__repr__ does not have a (mathematical) inverse function:
import pandas as pd
df = pd.DataFrame.from_dict(dict(x=range(100), y=range(100)))
print(df)
#      x   y
# 0    0   0
# 1    1   1
# 2    2   2
# 3    3   3
# 4    4   4
# ..  ..  ..
# 95  95  95
# 96  96  96
# 97  97  97
# 98  98  98
# 99  99  99
Rows 5 to 94 are replaced by an ellipsis in the output, so there is no way to know what they contain.
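To make the "no inverse" point concrete, here is a small sketch showing two different dataframes that produce the same string. The names df_a and df_b are made up for illustration, and the display options are pinned explicitly because the truncation depends on them.

```python
import pandas as pd

df_a = pd.DataFrame({"x": range(100), "y": range(100)})
df_b = df_a.copy()
df_b.loc[50, "x"] = 7  # change a row that the truncated repr hides

# Pin the display options so the truncation does not depend on local settings
with pd.option_context("display.max_rows", 10, "display.min_rows", 10):
    same_text = str(df_a) == str(df_b)

# The data differs even though the printed text may not
same_data = df_a.equals(df_b)
print(same_text, same_data)
```

If two distinct inputs map to the same output, no function can recover the input from that output alone.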
A solution: df.to_csv
One could come up with hacks to work around it, but IMO the only sensible way to do this is to use well-known pandas methods, e.g. to_csv:
str_df = df.to_csv()
print(str_df)
# ,x,y
# 0,0,0
# 1,1,1
# 2,2,2
# 3,3,3
where str_df contains all the data (I truncated the output).
Then you can get your original dataframe back using io and read_csv:
import io
original_df = pd.read_csv(io.StringIO(str_df))
print(original_df)
#     Unnamed: 0   x   y
# 0            0   0   0
# 1            1   1   1
# 2            2   2   2
# 3            3   3   3
# 4            4   4   4
# ..         ...  ..  ..
# 95          95  95  95
# 96          96  96  96
# 97          97  97  97
# 98          98  98  98
# 99          99  99  99
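Note that the column Unnamed: 0 is present because we didn't exclude the row names (the index) when writing. They can be excluded with df.to_csv(index=False), or read back as the index with pd.read_csv(..., index_col=0).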
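Since the question asks for multiindex support, here is a rough sketch of the same round trip that keeps the index. The helper dataframe_to_string and the parameter n_index_levels are my own hypothetical names, not pandas API. It assumes you control how the string is produced and know how many index levels the frame has, because the CSV text itself does not record that.

```python
import io

import pandas as pd


def dataframe_to_string(df: pd.DataFrame) -> str:
    # Hypothetical helper: to_csv() without a path returns the CSV text,
    # with every index level written as a leading column.
    return df.to_csv()


def get_dataframe_from_string(text: str, n_index_levels: int = 1) -> pd.DataFrame:
    # n_index_levels is an assumption the caller must supply.
    return pd.read_csv(io.StringIO(text), index_col=list(range(n_index_levels)))


index = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["letter", "number"])
df = pd.DataFrame({"x": range(4), "y": [0.5, 1.5, 2.5, 3.5]}, index=index)

restored = get_dataframe_from_string(dataframe_to_string(df), n_index_levels=2)
print(restored)
```

CSV does not carry dtypes, so columns such as datetimes or categoricals come back as whatever read_csv infers. If that matters, a format like pickle or parquet is probably a better fit than a plain string.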
CodePudding user response:
pandas basically does this in its read_clipboard function. It tries to construct a DataFrame from a string text, so you should be able to adapt whatever happens after this line.
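As far as I know, read_clipboard hands the text to read_csv with a whitespace separator by default, so a minimal sketch of the same idea for a string you already hold could look like this. It only works for simple frames: the string must not be truncated, and values or column names must not contain spaces.

```python
import io

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4.5, 5.5, 6.5]})

# to_string() renders every row, unlike the possibly truncated repr
text = df.to_string()

# The header has one field fewer than the data rows, so read_csv
# treats the first column as the index.
restored = pd.read_csv(io.StringIO(text), sep=r"\s+")
print(restored)
```

Missing values, multiindex headers and strings with embedded whitespace would need extra handling, which is why the to_csv route is usually more robust.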