Applying a function that inverts column values using pandas

Time:11-16

I'm hoping to get some advice on a problem I'm running into when applying a function that inverts the values in the columns of a dataframe.

For example, if the observation is 0 and the max of the column is 7, I take the absolute value of the observation minus the max: abs(0 - 7) = 7, so the smallest value becomes the largest.

All of the columns have a range similar to the example above. The shape of the sliced df is (16984, 512).

The code I have written creates a bunch of empty columns that are then replaced with the max values of the original columns. The new shape becomes (16984, 1029), including the 5 columns that I sliced off before. Then I use a lambda to apply the function over the columns in question:

#create max cols
col = df.iloc[:, 5:]
col_names = col.columns
maximum = '_max'

for col in df[col_names]:
    max_value = df[col].max()
    df[col + maximum] = np.zeros((16984,))
    df[col + maximum].replace(to_replace = 0, value = max_value)

#for each row and column inverse value of row

def invert_col(x, col):
    """Invert values of a column"""
    return abs(x[col] - x[col + "_max"])

for col in col_names:
    new_df = df.apply(lambda x: invert_col(x, col), axis = 1)

I've tried this both with axis = 1 and without it, and the behaviour is quite different. I am fairly new to Python, so I'm finding it difficult to troubleshoot why this is happening.

When I remove axis = 1, I get a key error: KeyError: 'TV_TIME_LIVE'. TV_TIME_LIVE is the first column in col_names, so it's as if it's not finding it.

When I include axis = 1, I don't get an error, but all the columns in the df get flattened into a Series, with length equal to the original df.

What I'm expecting is a new_df with the same shape (16984,1029) where the values of the 5th to the 517th column have the inverse function applied to them.

I would really appreciate any guidance as to what's going on here and how we might get to the desired output.

Many thanks

CodePudding user response:

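apply is slow; it is better to use a vectorized approach, as below. axis=1 means your function is called once per row (each row is passed as a Series indexed by the column names), so it returns one value per row, which is why you end up with a single Series. If you do not specify it, the default axis=0 calls the function once per column, and that column is indexed by the row labels, so x['TV_TIME_LIVE'] raises a KeyError: pandas is looking for that label and cannot find it. If you really must use apply, look at a few examples of exactly how it works.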
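To make the axis behaviour concrete before the vectorized version, here is a minimal sketch on a toy frame. The column names A and A_max and the three sample values are made up for illustration; they are not from the question's data.

```python
import pandas as pd

# Toy frame; "A" and "A_max" are hypothetical names used only for this demo.
df = pd.DataFrame({"A": [0, 3, 7], "A_max": [7, 7, 7]})

# axis=1: the function receives one ROW at a time, indexed by column name,
# so x["A"] is a scalar and the result is a Series with one value per row.
per_row = df.apply(lambda x: abs(x["A"] - x["A_max"]), axis=1)

# axis=0 (the default): the function receives one COLUMN at a time, indexed
# by the row labels 0, 1, 2, so looking up the label "A" fails.
try:
    df.apply(lambda x: abs(x["A"] - x["A_max"]))
    raised_key_error = False
except KeyError:
    raised_key_error = True

print(per_row.tolist())
print(raised_key_error)
```

This also suggests why the loop in the question ends with a single Series: each apply call returns one Series, and new_df is overwritten on every iteration.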

import pandas as pd
import numpy as np

df=pd.DataFrame(np.random.randint(0,7,size=(100, 4)), columns=list('ABCD'))
col_list=df.columns.copy()
for col in col_list:
    df[col + "inversed"]=abs(df[col]-df[col].max())
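If the goal is a new_df with the same shape as the original, where only the columns from position 5 onward are inverted, the loop and the helper _max columns can probably be dropped altogether. The following is a sketch under assumptions: the frame is a small random stand-in with hypothetical column names c0 to c8, and the first 5 columns are the ones to leave untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 5 leading columns to keep as-is,
# followed by 4 value columns to invert.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(0, 8, size=(100, 9)),
    columns=[f"c{i}" for i in range(9)],
)

value_cols = df.columns[5:]

new_df = df.copy()
# df[value_cols].max() is a Series of per-column maxima. Subtracting it from
# the frame aligns on column labels, so each column uses its own maximum.
new_df[value_cols] = (df[value_cols] - df[value_cols].max()).abs()

print(new_df.shape == df.shape)
```

Because the subtraction is done on whole columns at once, no intermediate columns are created and the result keeps the original shape.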