I'm creating a Pandas DataFrame in which each (index, column)
location can hold a numpy ndarray of arbitrary shape, or even a simple number.
This works:
import numpy as np, pandas as pd
x = pd.DataFrame([[np.random.rand(100, 100, 20, 2), 3], [2, 2], [3, 3], [4, 4]],
index=['A1', 'B2', 'C3', 'D4'], columns=['data', 'data2'])
print(x)
but it takes 50 seconds to create on my computer! Why?
Calling np.random.rand(100, 100, 20, 2) alone is super fast (< 1 second to create).
How can I speed up the creation of Pandas DataFrames containing ndarrays of various shapes?
CodePudding user response:
It's not actually the creation that is the issue; it's the print statement.
1000 loops of the creation take 2.8 seconds on my computer, but one iteration of the print
takes about 26 seconds.
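If you want to check this on your own machine, a minimal sketch like the one below times the two steps separately. It assumes a smaller array than the one in the question (the shape (20, 20, 20, 2) is my choice, only so the script finishes quickly) and uses repr(x) to build the same text that print(x) would write. The actual numbers will differ from machine to machine.

```python
import time

import numpy as np
import pandas as pd

# Assumption: a smaller array than in the question, so the script runs quickly.
big = np.random.rand(20, 20, 20, 2)

# Time only the construction of the DataFrame.
t0 = time.perf_counter()
x = pd.DataFrame([[big, 3], [2, 2], [3, 3], [4, 4]],
                 index=['A1', 'B2', 'C3', 'D4'], columns=['data', 'data2'])
t_create = time.perf_counter() - t0

# Time only the formatting; repr(x) builds the text that print(x) would show.
t0 = time.perf_counter()
text = repr(x)
t_repr = time.perf_counter() - t0

print(f"create: {t_create:.4f} s, repr: {t_repr:.4f} s")
```

Switching back to the original shape (100, 100, 20, 2) should make the gap between the two timings much more visible.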
Interestingly, print(x['data2']), print(x['data']['A1']) and print(x['data']['B2'])
are all basically instantaneous. So it seems print
is having an issue figuring out how to display items of vastly different size. Perhaps a bug?
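Since the slow part appears to be the display rather than the data itself, one possible workaround is to print a lightweight summary instead of the full frame. This is only a sketch: summarize is a hypothetical helper I made up for illustration, and it replaces each ndarray cell with a short string showing its shape. The original DataFrame x is left untouched.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame([[np.random.rand(100, 100, 20, 2), 3], [2, 2], [3, 3], [4, 4]],
                 index=['A1', 'B2', 'C3', 'D4'], columns=['data', 'data2'])


def summarize(cell):
    # Hypothetical helper: show only the shape of an ndarray cell,
    # and leave every other value as it is.
    if isinstance(cell, np.ndarray):
        return f"ndarray{cell.shape}"
    return cell


# Build a display-only copy by mapping the helper over every column.
summary = x.apply(lambda col: col.map(summarize))
print(summary)
```

Individual cells can still be inspected directly, e.g. with x['data']['A1'], which was fast in the tests above.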