Applying a function that depends on the index and column of a dataframe to that dataframe

Time:11-29

I have a dataframe df whose index is [x[0], ..., x[N]], whose columns are [y[0], ..., y[M]], and whose data is a 2D array of values z[i,j].

I have a Python function def f(x, y, z) of 3 float variables, and I would like to calculate the 2D array of values f(x[i], y[j], z[i,j]) in the fastest way using numpy and/or pandas, but I don't see how to do it.

I see the df.transform method but it doesn't seem to allow for lambdas that are dependent on index and column of df -- or at least I don't know how to provide such lambdas.

Details on df and f :

  • How was my df obtained? I created it during a 45-minute computation using an intensive, vectorized numerical Python function on a grid with N = 5000 and M = 5000, and I saved it with to_csv. Now when I want to use it, I load it with read_csv.

  • My function f is a fairly involved numerical C function that I exposed to Python with pybind11 (I added the tag for the sake of completeness). I don't want to rewrite it in a "numpy vectorizable" fashion for now, as it is highly optimized and each individual call is very fast. Given x, y and z, f numerically solves (with an iterative root finder) an equation with parameters x, y, z and unknown Z; the root of that equation is f(x,y,z).

CodePudding user response:

You could do a pd.melt:

df.reset_index().rename(columns={'index':'x'}).melt(var_name='y', value_name='z', id_vars='x')

It essentially transforms the dataframe to long format, so that each row has three entries: x, y and z.
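As a rough sketch of how the long format could then be used, the example below melts a tiny dataframe, applies a function to each (x, y, z) triple, and pivots the result back to the original grid. The function f here is a hypothetical stand-in for the real pybind11 function, and the 2x2 data is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the real pybind11 function.
def f(x, y, z):
    return x + y * z

# Small made-up grid: index holds the x values, columns hold the y values.
df = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]],
    index=[10.0, 20.0],
    columns=[0.5, 1.5],
)

# Long format: one row per (x, y, z) triple.
long = (
    df.reset_index()
      .rename(columns={"index": "x"})
      .melt(id_vars="x", var_name="y", value_name="z")
)
long["y"] = long["y"].astype(float)  # make sure the column labels are floats

# Apply f to each triple, then pivot back to the original grid layout.
long["f"] = [f(x, y, z) for x, y, z in zip(long["x"], long["y"], long["z"])]
result = long.pivot(index="x", columns="y", values="f")
print(result)
```

This still calls f once per cell, so it does not reduce the number of calls; it only makes the (x, y, z) triples explicit.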

CodePudding user response:

If you don't want to rewrite the function, then applying it in a for loop seems an easy way. You can do this:

import pandas as pd

idx = df.index
cols = df.columns
vals = df.to_numpy()
# outer loop over the rows (x), inner loop over the columns (y) and the row values (z)
r = [
    [f(x, y, z) for y, z in zip(cols, vals[i])]
    for i, x in enumerate(idx)
]
# if you want to recreate a dataframe
df_root = pd.DataFrame(data=r, index=idx, columns=cols)

The outer list comprehension runs over the index, and the inner one runs over the columns and the values of the current row at the same time. vals[i] accesses the values of the row at position i. The result r is a list whose length is the number of rows (N), and each item is a list whose length is the number of columns (M). You don't strictly need this structure, but it is an easy way to build a dataframe with the same index and columns as the original data.

Note that it will still take a long time: with N = M = 5000 you have about 25 million calls to f to make, even if f is optimized.
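A possible alternative to the nested comprehension, shown here only as a sketch, is to let np.vectorize handle the broadcasting of x, y and z. It does not remove the per-element Python call, so the running time should be of the same order. The function f below is a hypothetical stand-in for the real one, and the data is made up. The column labels are written as strings on purpose, because read_csv returns header labels as strings and they need converting before being passed to f.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real pybind11 function.
def f(x, y, z):
    return x + y * z

# Made-up data; string column labels mimic what read_csv gives back.
df = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]],
    index=[10.0, 20.0],
    columns=["0.5", "1.5"],
)

x = df.index.astype(float).to_numpy()[:, None]    # shape (N, 1)
y = df.columns.astype(float).to_numpy()[None, :]  # shape (1, M)
z = df.to_numpy(dtype=float)                      # shape (N, M)

# np.vectorize still calls f once per element in a Python-level loop;
# it only takes care of broadcasting and allocating the output array.
vf = np.vectorize(f, otypes=[float])
df_root = pd.DataFrame(vf(x, y, z), index=df.index, columns=df.columns)
print(df_root)
```

If the per-call overhead turns out to dominate, moving the loop over the grid into the C++ side of the pybind11 binding would be the next thing to consider, but that means changing the binding.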
