Home > other >  Write a function to perform calculations on multiple columns in a Pandas dataframe
Write a function to perform calculations on multiple columns in a Pandas dataframe

Time:09-29

I have the following dataframe (the real one has a lot more columns and rows, so just using this as an example):

 {'sample': {0: 'orange', 1: 'orange', 2: 'banana', 3: 'banana'},
 'sample id': {0: 1, 1: 1, 2: 5, 3: 5},
 'replicate': {0: 1, 1: 2, 2: 1, 3: 2},
 'taste': {0: 1.2, 1: 4.6, 2: 35.4, 3: 0.005},
 'smell': {0: 20.0, 1: 23.0, 2: 2.1, 3: 5.3},
 'shape': {0: 0.004, 1: 0.2, 2: 0.12, 3: 11.0},
 'volume': {0: 23, 1: 23, 2: 23, 3: 23},
 'weight': {0: 12.0, 1: 1.3, 2: 2.4, 3: 3.2}}

I'd like to write a function to perform calculations on the dataframe, for specific columns. The calculation is in the code below. As I'd only want to apply the code to specific columns, I've set up a list of columns, and as there is a pre-defined 'factor' we need to take into account in the calculation, I set this up too:

cols = ['taste', 'smell', 'shape']
factor = 72

def multiply_columns(row):
    return ((row[cols] / row['volume']) * (factor * row['volume'] / row['weight']) / 1000)

Then, I apply the function to the dataframe, and I want to overwrite the original column values with the new ones, so I do this:

for cols in df.columns:
    df[cols] = df[cols].apply(multiply_columns)

But I get the following error:

~\AppData\Local\Temp/ipykernel_8544/3939806184.py in multiply_columns(row)
      3 
      4 def multiply_columns(row):
----> 5     return ((row[cols] / row['volume']) * (factor * row['volume'] / row['weight']) / 1000)
      6 
      7 

TypeError: string indices must be integers

But the values I'm using in the calculation aren't strings:

sample        object
sample id      int64
replicate      int64
taste        float64
smell        float64
shape        float64
volume         int64
weight       float64
dtype: object

The desired output would be:

{'sample': {0: 'orange', 1: 'orange', 2: 'banana', 3: 'banana'},
 'sample id': {0: 1, 1: 1, 2: 5, 3: 5},
 'replicate': {0: 1, 1: 2, 2: 1, 3: 2},
 'taste': {0: 0.0074, 1: 0.028366667, 2: 0.2183, 3: 3.08333e-05},
 'smell': {0: 0.123333333, 1: 0.141833333, 2: 0.01295, 3: 0.032683333},
 'shape': {0: 2.46667e-05, 1: 0.001233333, 2: 0.00074, 3: 0.067833333},
 'volume': {0: 23, 1: 23, 2: 23, 3: 23},
 'weight': {0: 12.0, 1: 1.3, 2: 2.4, 3: 3.2}}

Can anyone kindly show me the errors of my ways

CodePudding user response:

This has a few issues.

If you wanted to index elements in row, the index you're using is a string (the column name) rather than an integer (like an index). To get an index for the column names you're interested in, you could use this:

cols = ['taste', 'smell', 'shape']
cols_idx = [df.columns.get_loc(col) for col in cols]

However, if I understand your question, you could perform this operation on columns directly with the understanding that the operation will be performed on each row. See a test case that worked for me:

import pandas as pd

df = pd.DataFrame({'sample': {0: 'orange', 1: 'orange', 2: 'banana', 3: 'banana'},
 'sample id': {0: 1, 1: 1, 2: 5, 3: 5},
 'replicate': {0: 1, 1: 2, 2: 1, 3: 2},
 'taste': {0: 1.2, 1: 4.6, 2: 35.4, 3: 0.005},
 'smell': {0: 20.0, 1: 23.0, 2: 2.1, 3: 5.3},
 'shape': {0: 0.004, 1: 0.2, 2: 0.12, 3: 11.0},
 'volume': {0: 23, 1: 23, 2: 23, 3: 23},
 'weight': {0: 12.0, 1: 1.3, 2: 2.4, 3: 3.2}})

cols = ['taste', 'smell', 'shape']

factor = 72

for col in cols:
    df[col] = ((df[col] / df['volume']) * (factor * df['volume'] / df['weight']) / 1000)

Note that your line

for cols in df.columns:

indicated you should run this operation on every column (cols became the index and was no longer your list).

CodePudding user response:

You have to pass the column as well to the function.

cols = ['taste', 'smell', 'shape']
factor = 72
def multiply_columns(row,col):
    return ((row[col]/ row['volume']) * (factor * row['volume'] / row['weight']) / 1000)

for col in cols:
    df[col] = df.apply(lambda x:multiply_columns(x,col),axis=1)

Also the output I'm getting is bit different from your desired output even though I used the same formula.

 sample sample id   replicate   taste   smell   shape   volume  weight
0   orange  1   1   0.00720000000   0.12000000000   0.00002400000   23  12.00000000000
1   orange  1   2   0.25476923077   1.27384615385   0.01107692308   23  1.30000000000
2   banana  5   1   1.06200000000   0.06300000000   0.00360000000   23  2.40000000000
3   banana  5   2   0.00011250000   0.11925000000   0.24750000000   23  3.20000000000 
  • Related