Let's say I have a simple dataframe like the below:
import pandas as pd

df_lst = [["A", "B", "C", "D", "E"],
          ["D", "A", "B", "K", "J"],
          ["B", "D", "A", "A", "A"],
          ["C", "A", "B", "K", "J"]]
df = pd.DataFrame(df_lst,
                  columns=["1", "2", "3", "4", "5"])
I want to update multiple columns at the same time based on the conditions:
# valid first col:
valid_1 = (df["1"] == "A") | (df["1"] == "C")
# valid second col:
valid_2 = (df["2"] == "A") | (df["2"] == "B")
Using these conditions, I want to update the rest of the columns:
def df_iter_static(df):
    df.loc[valid_1 & valid_2, ["3", "4", "5"]] = ["X", "Y", "Z"]
    print(df)
With these static value assignments (X, Y, Z) everything works fine and I get the expected results: the first and last records are updated. But what if I want to do something like this?
def df_iter_multi(df):
    df.loc[valid_1 & valid_2, ["3", "4", "5"]] = [df["1"], df["2"], "X"]
    print(df)
In the above function, I want to update the 3rd and 4th columns with each record's values from the 1st and 2nd columns, and set the 5th column to a static "X" value. Written this way it is obviously not going to work, because df["1"] and df["2"] each refer to an entire column. So I would presumably need something like df["1"].values[i] for the i-th record, but I have no idea how to do that.
CodePudding user response:
You need to craft a 2D array with one row per matching record and one column per target column. You can use something like:
def df_iter_multi(df):
    m = valid_1 & valid_2
    df.loc[m, ["3", "4", "5"]] = df[['1', '2']].assign(X='X').loc[m].values
    print(df)

df_iter_multi(df)
NB. I'm keeping your code as it is, but you should probably calculate valid_1/valid_2 in the function and not print from the function.
output:
   1  2  3  4  5
0  A  B  A  B  X
1  D  A  B  K  J
2  B  D  A  A  A
3  C  A  C  A  X
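As a rough sketch of the note above (computing valid_1/valid_2 inside the function and not printing from it), the same assignment could look like the following. The function name update_matching_rows is hypothetical, and isin is used here only as a shorter equivalent of the chained == comparisons; the function also works on a copy, which is an assumption about what you want and not a requirement.

```python
import pandas as pd


def update_matching_rows(df):
    """Return a copy of df with columns "3", "4", "5" updated on matching rows.

    Hypothetical helper: the name and the copy-then-return design are
    illustrative assumptions, not part of the original code.
    """
    out = df.copy()
    # Same conditions as valid_1 / valid_2, computed inside the function.
    valid_1 = out["1"].isin(["A", "C"])
    valid_2 = out["2"].isin(["A", "B"])
    m = valid_1 & valid_2
    # 2D array: one row per matching record, one column per target column.
    out.loc[m, ["3", "4", "5"]] = out[["1", "2"]].assign(X="X").loc[m].values
    return out


df = pd.DataFrame(
    [["A", "B", "C", "D", "E"],
     ["D", "A", "B", "K", "J"],
     ["B", "D", "A", "A", "A"],
     ["C", "A", "B", "K", "J"]],
    columns=["1", "2", "3", "4", "5"],
)

result = update_matching_rows(df)
print(result)
```

Returning the frame leaves printing to the caller, and the original df is left untouched, so the function can be re-run on the same input.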
alternative:
import numpy as np

def df_iter_multi(df):
    m = valid_1 & valid_2
    df.loc[m, ["1", "2", "3", "4", "5"]] = np.hstack([df[['1', '2']].loc[m].values,
                                                      np.tile(['X', 'Y', 'Z'], (m.sum(), 1))
                                                      ])
    print(df)
output:
   1  2  3  4  5
0  A  B  X  Y  Z
1  D  A  B  K  J
2  B  D  A  A  A
3  C  A  X  Y  Z
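To make the pieces of the alternative easier to follow, here is a small standalone sketch of what np.tile and np.hstack build. The left array is a hand-written stand-in for df[['1', '2']].loc[m].values on the example data (two matching rows), and the variable names are illustrative only.

```python
import numpy as np

# Stand-in for df[['1', '2']].loc[m].values with the two matching rows
# of the example data (an assumption made to keep the sketch standalone).
left = np.array([["A", "B"],
                 ["C", "A"]], dtype=object)

n_rows = left.shape[0]  # plays the role of m.sum()

# Repeat the constant row once per matching record: shape (n_rows, 3).
right = np.tile(["X", "Y", "Z"], (n_rows, 1))

# Glue the two blocks side by side: shape (n_rows, 5).
block = np.hstack([left, right])

print(right.shape)
print(block)
```

The resulting block has one column per entry of ["1", "2", "3", "4", "5"], which is why the alternative assigns to all five columns at once: columns "1" and "2" are simply overwritten with their own values.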