Storing outputs from input parameters into the rows of a pandas DataFrame

I have a list containing values that I want to perform mathematical operations on with different input parameters that are also stored in a list. For each value in the list, I want to store the results from performing the operation - with each distinct input parameter - row-by-row, where each row corresponds to the subsequent input parameter in the list. Here is a minimal reproducible example of my code:

import pandas as pd

N = 2  # number of rows
index = []

for i in range(N):
    index.append(i)
       
df = pd.DataFrame(columns=['Events','Electrons','Photons'], index=index)

vals = [1,2,3]

param = [1,2]

for idx in df.index:
    df['Events'] = param[idx] + vals[0]
    df['Electrons'] = param[idx] + vals[1]
    df['Photons'] = param[idx] + vals[2]

All this code currently does is add param[1] to each element in vals and store the results in both of my rows. What I want is to add param[0] to each element in vals and store the results in the first row of my data frame, then do the same with param[1] for the second row, and so on.

Eventually, I want to apply this to a much larger data set with more input parameters, but for now I would just like to know how I can best accomplish this task before scaling it up. Any guidance or advice is greatly appreciated!

CodePudding user response：

The intent is unclear, but does this do what you want?

import numpy as np
import pandas as pd

param = [1, 2]
vals = [1, 2, 3]

df = pd.DataFrame(columns=["Events", "Electrons", "Photons"], index=range(len(param)))

# Fill row i with vals shifted by the i-th parameter
for i, p in enumerate(param):
    df.iloc[i, :] = np.array(vals) + p

df:
  Events Electrons Photons
0      2         3       4
1      3         4       5

CodePudding user response：

With vectorization you can do this much more efficiently.

pandas has many vectorized operations, but NumPy (a pandas dependency, by the way) is sometimes more flexible. So let's assume that we imported NumPy like this:

import numpy as np

Let us also assume that we have a function that receives a params list, a vals list and a list of column names cols.

First, we convert the lists to arrays so that we can use vectorized operations:

vals = np.array(vals)
params = np.array(params)

Then, we reshape vals and params to match their roles. vals applies across the columns, so you can think of it as a DF or operator with 1 row and 3 columns. params applies across the rows, so you can think of it as a DF or operator with 2 rows and 1 column.

# -1 tells NumPy to infer that dimension from the array's size.
vals = vals.reshape(1, -1)
params = params.reshape(-1, 1)

Then, we do the vectorized operation and store the result in a new variable. Here, NumPy broadcasts the two arrays against each other very efficiently. For the example above, the output has the desired shape (2 rows, 3 columns).

df_numpy = vals + params
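
To make the broadcasting step concrete, here is a minimal sketch that uses the values from the question (param = [1, 2] and vals = [1, 2, 3]) and assumes the operation is addition, as the question describes. The name result is only illustrative.

```python
import numpy as np

# One row, one entry per column: shape (1, 3)
vals = np.array([1, 2, 3]).reshape(1, -1)
# One entry per row: shape (2, 1)
params = np.array([1, 2]).reshape(-1, 1)

# Broadcasting stretches both operands to the common shape (2, 3),
# so every parameter is added to every value.
result = vals + params

print(result.shape)  # (2, 3)
print(result)
# [[2 3 4]
#  [3 4 5]]
```

Each row of result corresponds to one parameter and each column to one value, which is the layout the question asks for.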

Finally, we create the DataFrame we need. Note that pandas will generate a default index itself.

df = pd.DataFrame(df_numpy, columns=cols)

So if we gather all the bits into one function (with type annotations for clarity):

from typing import List

import numpy as np
import pandas as pd


def efficient_op(params: List[int], vals: List[int], cols: List[str]) -> pd.DataFrame:
    vals = np.array(vals)
    params = np.array(params)

    vals = vals.reshape(1, -1)
    params = params.reshape(-1, 1)

    df_numpy = vals + params

    df = pd.DataFrame(df_numpy, columns=cols)

    return df
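
As a usage sketch, the function can be called with the inputs from the question. The function is restated compactly here only so that the snippet runs on its own. It assumes the operation is addition, and the local name table is illustrative.

```python
import numpy as np
import pandas as pd


def efficient_op(params, vals, cols):
    # Same logic as the function above, condensed into one broadcast addition
    table = np.array(vals).reshape(1, -1) + np.array(params).reshape(-1, 1)
    return pd.DataFrame(table, columns=cols)


df = efficient_op([1, 2], [1, 2, 3], ["Events", "Electrons", "Photons"])

print(df.to_numpy().tolist())  # [[2, 3, 4], [3, 4, 5]]
print(list(df.index))          # [0, 1]
```

Row 0 holds the results for the first parameter and row 1 those for the second. Scaling up to more parameters or values should only require longer input lists, provided cols has one name per entry in vals.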