Storing outputs from input parameters into the rows of a pandas DataFrame

Time:09-08

I have a list of values that I want to perform mathematical operations on, using different input parameters that are also stored in a list. For each input parameter, I want to apply the operation to every value in the list and store the results row by row, so that each row corresponds to one input parameter. Here is a minimal reproducible example of my code:

import pandas as pd

N = 2  # number of rows
index = []

for i in range(N):
    index.append(i)
       
df = pd.DataFrame(columns=['Events','Electrons','Photons'], index=index)

vals = [1,2,3]

param = [1,2]

for idx in df.index:

    df['Events'] = param[idx] + vals[0]

    df['Electrons'] = param[idx] + vals[1]

    df['Photons'] = param[idx] + vals[2]

All this code currently does is add param[1] to each element in vals and store the results in both of my rows. What I want is to add param[0] to each element in vals and store the results in the first row of my data frame, then step forward to param[1] and store its results in the second row, and so on.

Eventually, I want to apply this to a much larger data set with more input parameters, but for now I would just like to know how I can best accomplish this task before scaling it up. Any guidance or advice is greatly appreciated!

CodePudding user response:

The intent is unclear, but does this do what you want?

import numpy as np
import pandas as pd

param = [1, 2]
vals = [1, 2, 3]

df = pd.DataFrame(columns=["Events", "Electrons", "Photons"], index=range(len(param)))

# Fill one row per parameter: row i holds vals + param[i]
for i, p in enumerate(param):
    df.iloc[i, :] = np.array(vals) + p

df:
  Events Electrons Photons
0      2         3       4
1      3         4       5
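
For comparison, the loop in the question fills every row with the last parameter because an assignment such as df['Events'] = scalar writes the whole column on each iteration. A minimal sketch of a cell-by-cell variant, assuming the same param and vals as above (the cols list and the final astype(int) conversion are additions for illustration, not part of the original post):

```python
import pandas as pd

vals = [1, 2, 3]
param = [1, 2]
cols = ["Events", "Electrons", "Photons"]  # helper list introduced for this sketch

df = pd.DataFrame(columns=cols, index=range(len(param)))

for idx in df.index:
    for col, v in zip(cols, vals):
        # .loc[row, column] writes a single cell instead of the whole column
        df.loc[idx, col] = param[idx] + v

# The frame starts out empty, so convert the filled cells to integers explicitly
df = df.astype(int)
print(df)
```

This keeps the structure of the original loop, at the cost of one Python-level assignment per cell.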

CodePudding user response:

Using vectorization, you can achieve this much more efficiently.

pandas has many vectorization capabilities, but NumPy (a pandas dependency, by the way) is sometimes more flexible. So let's assume that we imported NumPy like this:

import numpy as np

Let us also assume that we have a function that receives a params list, a vals list and a list of column names cols:

  1. First, we convert the lists to arrays to get vectorization capabilities:
vals = np.array(vals)
params = np.array(params)
  2. Then, we reshape vals and params to match their roles. vals is applied across columns, so you can see it as a DataFrame or operator with 1 row and 3 columns. params is applied across rows, so you can see it as a DataFrame or operator with 2 rows and 1 column.
# -1 means: infer this dimension from the size of the array.
vals = vals.reshape(1, -1)
params = params.reshape(-1, 1)
  3. Then, we do the vectorized operation and save it in a new variable. Here, NumPy broadcasts the two arrays against each other very efficiently. With the example above, the output has the desired shape (2 rows, 3 columns).
df_numpy = vals + params
  4. Finally, we create the DataFrame we need. Note that pandas will generate a default index itself.
df = pd.DataFrame(df_numpy, columns=cols)

So if we gather all the bits in one function (with type annotations for clarity):

from typing import List

import numpy as np
import pandas as pd


def efficient_op(params: List[int], vals: List[int], cols: List[str]) -> pd.DataFrame:
    vals = np.array(vals)
    params = np.array(params)

    vals = vals.reshape(1, -1)
    params = params.reshape(-1, 1)

    df_numpy = vals + params

    df = pd.DataFrame(df_numpy, columns=cols)

    return df
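
As a usage sketch, the function can be called with the lists from the question. The function is repeated here in condensed form only so that the snippet runs on its own; the variable name result is an assumption made for illustration.

```python
import numpy as np
import pandas as pd


def efficient_op(params, vals, cols):
    # A column vector of shape (n_params, 1) plus a row vector of shape (1, n_vals)
    # broadcasts to a table of shape (n_params, n_vals).
    table = np.asarray(params).reshape(-1, 1) + np.asarray(vals).reshape(1, -1)
    return pd.DataFrame(table, columns=cols)


result = efficient_op([1, 2], [1, 2, 3], ["Events", "Electrons", "Photons"])
print(result)
```

Each row of result corresponds to one entry of params, and each column to one entry of vals, so adding more parameters only adds more rows.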