How to apply a function to specific columns of a pandas dataframe?

Time:02-13

I would like to apply a function to specific columns of a pandas data frame. Here is an illustration:

# import modules
from pandas_datareader import data as pdr

# import parameters
start = "2020-01-01"
end = "2021-01-01"
symbols = ["AAPL"]

# get the data
data = pdr.get_data_yahoo(symbols, start, end)

def mult(row):
    return row['Close']*2, row['Open']/3


data[['Close', 'Open']].apply(mult, axis = 1)

print(data.head())

The result:

Attributes  Adj Close      Close       High        Low       Open       Volume
Symbols          AAPL       AAPL       AAPL       AAPL       AAPL         AAPL
Date                                                                          
2020-01-02  73.894333  75.087502  75.150002  73.797501  74.059998  135480400.0
2020-01-03  73.175926  74.357498  75.144997  74.125000  74.287498  146322800.0
2020-01-06  73.759003  74.949997  74.989998  73.187500  73.447502  118387200.0
2020-01-07  73.412109  74.597504  75.224998  74.370003  74.959999  108872000.0
2020-01-08  74.593048  75.797501  76.110001  74.290001  74.290001  132079200.0

Any thoughts as to why that doesn't work?

CodePudding user response:

Two things:

(i) apply returns a new object; you never assign that result back to the original DataFrame, so the DataFrame is never updated.

(ii) If your function does nothing more complex than simple multiplication or division, a vectorized operation is better. Instead of calling the function, operate directly on the columns:

data['Close'] *= 2
data['Open'] /= 3
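
To illustrate point (i), here is a minimal sketch that keeps the row-wise function from the question and assigns its result back. It assumes a small made-up DataFrame with flat column names standing in for the downloaded prices (the real data has two column levels, Attributes and Symbols, so it may need adapting), and the name result is only for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the downloaded prices (values are made up).
df = pd.DataFrame({
    "Close": [10.0, 20.0],
    "Open": [3.0, 6.0],
    "High": [11.0, 21.0],
})

def mult(row):
    # Same row-wise function as in the question.
    return row["Close"] * 2, row["Open"] / 3

# apply returns a new object; result_type="expand" turns each returned
# tuple into separate columns (labelled 0 and 1).
result = df[["Close", "Open"]].apply(mult, axis=1, result_type="expand")

# Relabel the new columns, then assign them back so df is actually updated.
result.columns = ["Close", "Open"]
df[["Close", "Open"]] = result

print(df)
```

The vectorized version above gives the same values without calling a Python function once per row, which is why it is usually preferable for simple arithmetic.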

CodePudding user response:

I think the problem is that you are not assigning the return value of the mult function to any variable.

One way to achieve what you want is:

# import modules
from pandas_datareader import data as pdr

# import parameters
start = "2020-01-01"
end = "2021-01-01"
symbols = ["AAPL"]

# get the data
data = pdr.get_data_yahoo(symbols, start, end)

def mult(df):
    df['Close'] = 2 * df['Close']
    df['Open'] = df['Open'] / 3
    return df

mult(data)

print(data.head())

Attributes  Adj Close       Close       High        Low       Open  \
Symbols          AAPL        AAPL       AAPL       AAPL       AAPL   
Date                                                                 
2020-01-02  73.894325  150.175003  75.150002  73.797501  24.686666   
2020-01-03  73.175926  148.714996  75.144997  74.125000  24.762499   
2020-01-06  73.759010  149.899994  74.989998  73.187500  24.482501   
2020-01-07  73.412117  149.195007  75.224998  74.370003  24.986666   
2020-01-08  74.593048  151.595001  76.110001  74.290001  24.763334
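
Note that mult above modifies the DataFrame passed to it in place. If you would rather leave the original untouched, one possible variation is sketched below using DataFrame.assign, which returns a new DataFrame. The function name mult_copy and the sample values are made up for illustration, and the sketch assumes flat column names rather than the two-level columns returned by the data reader:

```python
import pandas as pd

# Hypothetical sample data with flat column names (values are made up).
prices = pd.DataFrame({"Close": [10.0, 20.0], "Open": [3.0, 6.0]})

def mult_copy(df):
    # assign builds and returns a new DataFrame; df itself is not changed.
    return df.assign(Close=df["Close"] * 2, Open=df["Open"] / 3)

scaled = mult_copy(prices)

print(prices)  # original values
print(scaled)  # Close doubled, Open divided by 3
```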