Applying a function that inverts column values using pandas

I'm hoping to get some advice on a problem I'm running into when applying a function over the columns of a dataframe to invert the values in those columns.

For example, if the observation is 0 and the max of the column is 7, I take the absolute value of the observation minus the max: abs(0 - 7) = 7, so the smallest value becomes the largest.

All of the columns have a range similar to the example above. The shape of the sliced df is (16984, 512).

The code I have written creates a bunch of empty columns that are then filled with the max values of the corresponding columns. The new shape becomes (16984, 1029), including the 5 columns that I sliced off before. Then I use a lambda to apply the function over the columns in question:

#create max cols
col = df.iloc[:, 5:]
col_names = col.columns
maximum = '_max'

for col in df[col_names]:
    max_value = df[col].max()
    df[col + maximum] = np.zeros((16984,))
    df[col + maximum] = df[col + maximum].replace(to_replace=0, value=max_value)

#for each row and column inverse value of row

def invert_col(x, col):
    """Invert values of a column"""
    return abs(x[col] - x[col + "_max"])

for col in col_names:
    new_df = df.apply(lambda x: invert_col(x, col), axis = 1)

I've tried this both with axis = 1 and without it, and the behaviour is quite different. I am fairly new to Python, so I'm finding it difficult to troubleshoot why this is happening.

When I remove axis = 1, I get a key error: KeyError: 'TV_TIME_LIVE'. TV_TIME_LIVE is the first column in col_names, so it's as if it's not finding it.

When I include axis = 1, I don't get an error, but all the columns in the df get flattened into a Series, with length equal to the original df.

What I'm expecting is a new_df with the same shape (16984,1029) where the values of the 5th to the 517th column have the inverse function applied to them.

I would really appreciate any guidance as to what's going on here and how we might get to the desired output.

Many thanks

CodePudding user response:

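apply is slow, so it is better to use a vectorized approach like the one below. With axis=1 your function is called once per row, and x is a Series indexed by the column names, so x[col] works. If you do not specify axis, the default axis=0 is used: the function is called once per column, and x is indexed by the row labels. That is why you get the KeyError: pandas looks for a label 'TV_TIME_LIVE' in the row index and cannot find it. Also, because invert_col returns a single value per row, apply with axis=1 returns a Series, and your loop overwrites new_df on every iteration, so you end up with only the result for the last column. If you really must use apply, look at a few examples of how exactly it works.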
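To see the difference between the two axis settings, here is a small sketch on a toy frame. The frame and its column names A and B are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical toy frame, not the asker's data.
df = pd.DataFrame({"A": [0, 3, 7], "B": [2, 5, 1]})

# axis=1: the lambda is called once per row, and x is indexed by column
# names, so x["A"] is a valid lookup. One scalar per row gives a Series.
row_result = df.apply(lambda x: abs(x["A"] - df["A"].max()), axis=1)
print(type(row_result).__name__)
print(row_result.tolist())

# Default axis=0: the lambda is called once per column, and x is indexed
# by the row labels 0, 1, 2, so looking up "A" raises a KeyError.
try:
    df.apply(lambda x: x["A"])
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)
```

The result of the axis=1 call has one entry per row, which matches the "flattened into a Series" behaviour described in the question.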

import pandas as pd
import numpy as np

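# toy data: 100 rows, 4 columns with values from 0 to 6
df = pd.DataFrame(np.random.randint(0, 7, size=(100, 4)), columns=list('ABCD'))
# copy the column labels first, because the loop adds new columns to df
col_list = df.columns.copy()
for col in col_list:
    # distance from the column max: the smallest value becomes the largest
    df[col + "inversed"] = abs(df[col] - df[col].max())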
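If the goal is a new_df of the same layout with the value columns inverted in place, the loop and the helper _max columns can probably be dropped altogether. The following is only a sketch under assumptions: the frame is a small random stand-in, the names c0 to c7 are invented, and n_keep = 5 mirrors the 5 leading columns mentioned in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real frame: 5 leading columns to leave
# untouched, followed by 3 value columns to invert.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(0, 8, size=(10, 8)),
    columns=[f"c{i}" for i in range(8)],
)

n_keep = 5                         # assumed number of leading columns to skip
value_cols = df.columns[n_keep:]   # labels of the columns to invert

new_df = df.copy()
# df[value_cols].max() is a Series of per-column maxima; subtracting it
# from the frame aligns on column labels, so every row is handled at once.
new_df[value_cols] = (df[value_cols] - df[value_cols].max()).abs()

print(new_df.shape == df.shape)
```

Note that this keeps the original number of columns rather than adding one max column per value column, since the maxima are computed on the fly.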