I have a dataframe df whose index is [x[0], ..., x[N]], whose columns are [y[0], ..., y[M]], and whose data is a 2D array of z[i,j]'s.
I have a Python function def f(x, y, z) of 3 float variables, and I would like to compute the 2D array of f(x[i], y[j], z[i,j])'s in the fastest way using numpy and/or pandas, but I don't see how to do it.
I see the df.transform method, but it doesn't seem to allow lambdas that depend on the index and columns of df -- or at least I don't know how to provide such lambdas.
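To illustrate what I mean with a toy-sized frame (a stand-in for my real 5000x5000 one):

```python
import numpy as np
import pandas as pd

# toy stand-in for the real 5000x5000 frame
df = pd.DataFrame(np.arange(6.).reshape(2, 3),
                  index=[0.1, 0.2], columns=[10.0, 20.0, 30.0])

# transform only ever sees the values z, never the labels x and y:
shifted = df.transform(lambda z: z + 1.0)
# there is no slot here to pass something like lambda x, y, z: f(x, y, z)
```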
Details on df and f:
How was my df obtained? I created it during a 45-minute computation using an intensive numerical Python vectorized function on a grid with N = 5000 and M = 5000, and I "to_csv'ed" it. Now when I want to use it, I use read_csv.
Now, my function f is quite an involved numerical C function that I exposed to Python with pybind11 (I put the tag for the sake of completeness), and that I don't want to rewrite in a "numpy vectorizable fashion" for now, as it is ultra-optimized and very fast unitarily. Given x, y, the function f solves numerically (iterative root finder) an equation with parameters x, y, z and unknown Z, the root of the equation being f(x, y, z).
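As a side note on the CSV round trip described above: read_csv needs index_col=0 to restore the x index, and the y column labels come back as strings, so they must be converted back to floats. A minimal sketch with a toy-sized frame:

```python
import io
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6.).reshape(2, 3),
                  index=[0.1, 0.2], columns=[10.0, 20.0, 30.0])

buf = io.StringIO()
df.to_csv(buf)            # in reality this would be a file path
buf.seek(0)

df2 = pd.read_csv(buf, index_col=0)       # restore the x index
df2.columns = df2.columns.astype(float)   # labels come back as strings
```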
CodePudding user response:
You could do a pd.melt:
df.reset_index().rename(columns={'index':'x'}).melt(var_name='y', value_name='z', id_vars='x')
It essentially transforms the dataframe to the long format, making each row have three entries: x, y and z.
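To then evaluate the function on the long format and pivot back to the original wide shape, something like this works (f below is a placeholder standing in for the real pybind11 root finder):

```python
import numpy as np
import pandas as pd

def f(x, y, z):                      # placeholder for the real root finder
    return x + y + z

df = pd.DataFrame(np.arange(6.).reshape(2, 3),
                  index=[0.1, 0.2], columns=[10.0, 20.0, 30.0])

long = (df.reset_index()
          .rename(columns={'index': 'x'})
          .melt(var_name='y', value_name='z', id_vars='x'))

# one Python-level call of f per row of the long frame
long['root'] = [f(x, y, z) for x, y, z in
                zip(long['x'], long['y'], long['z'])]

# pivot back to the original wide index/columns layout
wide = long.pivot(index='x', columns='y', values='root')
```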
CodePudding user response:
If you don't want to rewrite the function, then using a for loop to apply it seems an easy way. You can do this:
idx = df.index
cols = df.columns
vals = df.to_numpy()
r = [
    [f(x, y, z) for y, z in zip(cols, vals[i])]
    for i, x in enumerate(idx)
]
# if you want to recreate a dataframe
df_root = pd.DataFrame(data=r, index=idx, columns=cols)
There is a list comprehension on the index that contains a list comprehension over both the columns and the values of the row at the same time. vals[i] accesses the values of the row at position i. The result r is a list of length N (number of rows), and each item is a list of length M (number of columns). You don't especially need this structure, but it is an easy way to build a dataframe with the same index and columns as the original data.
Note that it will still take a long time: you have about 25 million calls to make, even if f itself is optimized.
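If you would rather stay closer to numpy, np.frompyfunc (or np.vectorize) wraps f so that it broadcasts over the whole grid. This is still one Python-level call per element, so it does not remove the ~25 million calls mentioned above; it only tidies the code. A sketch, again with a placeholder f:

```python
import numpy as np
import pandas as pd

def f(x, y, z):                      # placeholder for the real root finder
    return x + y + z

df = pd.DataFrame(np.arange(6.).reshape(2, 3),
                  index=[0.1, 0.2], columns=[10.0, 20.0, 30.0])

fv = np.frompyfunc(f, 3, 1)          # 3 inputs, 1 output

x = df.index.to_numpy()[:, None]     # shape (N, 1)
y = df.columns.to_numpy()[None, :]   # shape (1, M)
z = df.to_numpy()                    # shape (N, M)

r = fv(x, y, z).astype(float)        # broadcasts to shape (N, M)
df_root = pd.DataFrame(r, index=df.index, columns=df.columns)
```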