Consider the following piece of code:
import pandas as pd
import numpy as np
N = 10
idx = np.linspace(0, 1, N)
labels = ["a", "b", "c"]
values = np.stack((np.random.rand(N), np.random.rand(N), np.random.rand(N))).transpose()
df = pd.DataFrame(index=idx, columns=labels, data=values)
Assume now that I want to subtract the value 3 from column "a" and the value 5 from column "b".
I can easily achieve that through the following:
cols = labels[0:2]
df.loc[:, cols] = df.loc[:, cols].subtract([3, 5])
However, if I use the following:
df.loc[:, cols].apply(lambda x: x.subtract([3, 5]))
I get a ValueError: Lengths must be equal.
If I use
df.loc[:, cols].apply(lambda x: x.subtract(3))
it works, but it subtracts 3 from both of the columns specified in cols.
Given the generality of the apply() method, I would like to understand how to use it when the type of x in the lambda function passed to apply() is a dataframe and I want to do different things in different columns.
I am looking for the least verbose way; I am aware that I can do it with a for loop by iterating over cols.
CodePudding user response:
Use axis=1 to process the data row by row, but note that this is slower than the vectorized first solution:
df.loc[:, cols] = df.loc[:, cols].apply(lambda x: x.subtract([3, 5]), axis=1)
You can check how it works with print:
# with axis=1, each Series is one row, holding the 2 values from cols
df.loc[:, cols].apply(lambda x: print(x), axis=1)
# with the default axis=0, each Series is one whole column
df.loc[:, cols].apply(lambda x: print(x))
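If you prefer to keep the default column-wise apply(), one option is to pick the value to subtract from the column label, since each Series passed to the function carries its column name in x.name. The following is only a minimal sketch: the dict offsets is a hypothetical helper that is not part of the original question, and the seeded random generator is used just to make the example reproducible.

```python
import numpy as np
import pandas as pd

N = 10
rng = np.random.default_rng(0)  # seeded only for reproducibility
df = pd.DataFrame(rng.random((N, 3)),
                  index=np.linspace(0, 1, N),
                  columns=["a", "b", "c"])
cols = ["a", "b"]

# Hypothetical mapping (an assumption for this sketch):
# column label -> value to subtract from that column
offsets = {"a": 3, "b": 5}

# Reference result: the vectorized solution from the question
expected = df.loc[:, cols].subtract([3, 5])

# Column-wise apply (default axis=0): x is one whole column,
# and x.name is its label, so each column gets its own value
by_column = df.loc[:, cols].apply(lambda x: x.subtract(offsets[x.name]))

# Row-wise apply (axis=1): x is one row with 2 values,
# so the list [3, 5] lines up with it element by element
by_row = df.loc[:, cols].apply(lambda x: x.subtract([3, 5]), axis=1)
```

Both variants should give the same values as the vectorized subtract(), which remains the better choice when the operation can be expressed that way.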