I have a list containing values that I want to perform mathematical operations on with different input parameters that are also stored in a list. For each value in the list, I want to store the results from performing the operation - with each distinct input parameter - row-by-row, where each row corresponds to the subsequent input parameter in the list. Here is a minimal reproducible example of my code:
import pandas as pd

N = 2  # number of rows
index = []
for i in range(N):
    index.append(i)
df = pd.DataFrame(columns=['Events','Electrons','Photons'], index=index)
vals = [1,2,3]
param = [1,2]
for idx in df.index:
    df['Events'] = param[idx] + vals[0]
    df['Electrons'] = param[idx] + vals[1]
    df['Photons'] = param[idx] + vals[2]
All this code currently does is add param[1] to each element in vals and store the results in both of my rows. What I want is to add param[0] to each element in vals, store the result in the first row of my data frame, then step forward with param[1], store that result in the second row, and so on.
Eventually, I want to apply this to a much larger data set with more input parameters, but for now I would just like to know how I can best accomplish this task before scaling it up. Any guidance or advice is greatly appreciated!
CodePudding user response:
The intent is unclear, but does this do what you want?
import numpy as np
import pandas as pd

param = [1, 2]
vals = [1, 2, 3]
df = pd.DataFrame(columns=["Events", "Electrons", "Photons"], index=range(len(param)))
for i, p in enumerate(param):
    # row i holds every value in vals plus the i-th parameter
    df.iloc[i, :] = np.array(vals) + p
df:
   Events  Electrons  Photons
0       2          3        4
1       3          4        5
CodePudding user response:
By using vectorization you can achieve the task much more efficiently.
pandas has a lot of vectorization capabilities, but sometimes NumPy (a pandas dependency, by the way) is more flexible. So let's assume that we imported NumPy like this:
import numpy as np
Let us also assume that we have a function that receives a params list, a vals list, and a list of column names cols:
- First, we convert the lists to arrays to get vectorization capabilities:
vals = np.array(vals)
params = np.array(params)
- Then, we reshape vals and params to match their roles. vals is to be applied across columns, so you can see it as a DF or operator with 1 row and 3 columns. params is to be applied across rows, so you can see it as a DF or operator with 2 rows and 1 column.
# -1 means fit all the remaining size in here.
vals = vals.reshape(1, -1)
params = params.reshape(-1, 1)
- Then, we do the vectorized operation and save it in a new variable. Here, NumPy broadcasts the (1, 3) array against the (2, 1) array, so no explicit loop is needed. For the example above, the output has the desired shape (2 rows, 3 columns).
df_numpy = vals + params
- Finally, we create the DataFrame we need. Note that pandas will create a default integer index itself.
df = pd.DataFrame(df_numpy, columns=cols)
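As a quick sanity check on the shapes in the steps above, here is a minimal sketch using the values from the question. The name result is only illustrative and does not come from the original code:

```python
import numpy as np

vals = np.array([1, 2, 3]).reshape(1, -1)   # shape (1, 3): one row, three columns
params = np.array([1, 2]).reshape(-1, 1)    # shape (2, 1): two rows, one column

# Broadcasting stretches both operands to the common shape (2, 3),
# so result[i, j] == params[i, 0] + vals[0, j]
result = vals + params

print(result.shape)  # (2, 3)
print(result)
# [[2 3 4]
#  [3 4 5]]
```

Each row corresponds to one parameter and each column to one value, which is the layout the question asks for.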
So if we gather all the bits in one function (with type annotations for clarity):
from typing import List
import numpy as np
import pandas as pd
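
def efficient_op(params: List[int], vals: List[int], cols: List[str]) -> pd.DataFrame:
    # Convert the lists to arrays so the addition is vectorized
    vals = np.array(vals)
    params = np.array(params)
    # vals becomes one row (1, n_cols), params becomes one column (n_rows, 1)
    vals = vals.reshape(1, -1)
    params = params.reshape(-1, 1)
    # Broadcasting yields an (n_rows, n_cols) array
    df_numpy = vals + params
    df = pd.DataFrame(df_numpy, columns=cols)
    return df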
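If you do not need the intermediate reshaped arrays, a more compact sketch of the same idea uses np.add.outer, which adds every element of its first argument to every element of its second. This is an alternative to the reshape approach, not what the function above does, and the name outer_op is made up for illustration:

```python
import numpy as np
import pandas as pd


def outer_op(params, vals, cols):
    # np.add.outer(a, b)[i, j] == a[i] + b[j],
    # so rows follow params and columns follow vals
    return pd.DataFrame(np.add.outer(params, vals), columns=cols)


# Values taken from the question
df = outer_op([1, 2], [1, 2, 3], ["Events", "Electrons", "Photons"])
print(df)
```

For the question's inputs this should give the same 2 x 3 frame as the reshape version. It only works because the operation is a NumPy ufunc (addition here), so the reshape approach is probably the more general pattern when the operation gets more complicated.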