Update multiple columns with both static values and values from other columns

Time:10-12

Let's say I have a simple dataframe like the below:

import pandas as pd

df_lst = [["A", "B", "C", "D", "E"],
          ["D", "A", "B", "K", "J"],
          ["B", "D", "A", "A", "A"],
          ["C", "A", "B", "K", "J"]]
df = pd.DataFrame(df_lst,
                  columns=["1", "2", "3", "4", "5"])

I want to update multiple columns at the same time based on the conditions:

#valid first col:
valid_1 = ((df["1"] == "A") | (df["1"] == "C"))
#valid second col:
valid_2 = ((df["2"] == "A") | (df["2"] == "B"))

Using these conditions, I want to update the rest of the columns:

def df_iter_static(df):
    df.loc[valid_1 & valid_2, ["3", "4", "5"]] = ["X", "Y", "Z"]
    print(df)

With these static value assignments (X, Y, Z) everything works fine and I get the expected result: the first and last records are updated. But what if I want to do something like this?

def df_iter_multi(df):
    df.loc[valid_1 & valid_2, ["3", "4", "5"]] = [df["1"], df["2"], "X"]
    print(df)

In the function above, I want to set columns 3 and 4 of each matching row to that row's values from columns 1 and 2, and set column 5 to a static "X". This obviously does not work as written, because df["1"] and df["2"] each refer to an entire column. I assume I would need something like df["1"].values[i] for the i-th record, but I have no idea how to do that.

CodePudding user response:

You need to build a 2D array with one row per matching record and one column per target column. You can use something like:

def df_iter_multi(df):
    m = valid_1 & valid_2
    df.loc[m, ["3", "4", "5"]] = df[['1', '2']].assign(X='X').loc[m].values
    print(df)

df_iter_multi(df)

NB. I'm keeping your code as it is, but you should probably calculate valid_1/valid_2 in the function and not print from the function.
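The suggestion in the note above could look roughly like the sketch below. The function name update_from_columns is hypothetical, and isin is swapped in here as an equivalent of the chained == / | comparisons from the question; working on a copy is also an assumption about what the caller wants, not something the question asks for.

```python
import pandas as pd

def update_from_columns(df):
    """Return a copy of df with columns 3-5 updated where both conditions hold."""
    out = df.copy()
    # Conditions are computed inside the function, so they always match the frame passed in.
    valid_1 = out["1"].isin(["A", "C"])
    valid_2 = out["2"].isin(["A", "B"])
    m = valid_1 & valid_2
    # Columns 1 and 2 plus a constant "X" column, restricted to the matching rows.
    out.loc[m, ["3", "4", "5"]] = out[["1", "2"]].assign(X="X").loc[m].values
    return out

df = pd.DataFrame(
    [["A", "B", "C", "D", "E"],
     ["D", "A", "B", "K", "J"],
     ["B", "D", "A", "A", "A"],
     ["C", "A", "B", "K", "J"]],
    columns=["1", "2", "3", "4", "5"],
)
result = update_from_columns(df)
```

Printing is left to the caller (for example print(result)), and the original df is not modified.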

output:

   1  2  3  4  5
0  A  B  A  B  X
1  D  A  B  K  J
2  B  D  A  A  A
3  C  A  C  A  X
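If a single 2D assignment is not a requirement, the same result can probably be reached one column at a time. This is only a sketch of a different technique (three separate .loc assignments rather than one), with the mask written using isin as an assumed equivalent of the question's conditions:

```python
import pandas as pd

df = pd.DataFrame(
    [["A", "B", "C", "D", "E"],
     ["D", "A", "B", "K", "J"],
     ["B", "D", "A", "A", "A"],
     ["C", "A", "B", "K", "J"]],
    columns=["1", "2", "3", "4", "5"],
)

# Rows where column 1 is A or C and column 2 is A or B.
m = df["1"].isin(["A", "C"]) & df["2"].isin(["A", "B"])

# Each target column is filled from its source column for the matching rows only.
df.loc[m, "3"] = df.loc[m, "1"]
df.loc[m, "4"] = df.loc[m, "2"]
# A scalar is broadcast to every matching row.
df.loc[m, "5"] = "X"
```

This trades the single assignment for three simpler ones, which may be easier to read when only a few columns are involved.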

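alternative, building the array with NumPy (here columns 1 and 2 are written back unchanged and columns 3 to 5 get static values):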

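import numpy as np

def df_iter_multi(df):
    m = valid_1 & valid_2
    # left block: columns 1 and 2 of the matching rows
    # right block: X, Y, Z repeated once per matching row
    df.loc[m, ["1", "2", "3", "4", "5"]] = np.hstack([
        df[['1', '2']].loc[m].values,
        np.tile(['X', 'Y', 'Z'], (m.sum(), 1)),
    ])
    print(df)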

output:

   1  2  3  4  5
0  A  B  X  Y  Z
1  D  A  B  K  J
2  B  D  A  A  A
3  C  A  X  Y  Z
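To see what the np.hstack / np.tile combination builds, here is a small standalone sketch. The names left, right and block are illustrative only, and the two rows in left are typed in by hand to stand for the rows that match the mask in the example data:

```python
import numpy as np

# Stand-in for df[['1', '2']].loc[m].values: columns 1 and 2 of the two matching rows.
left = np.array([["A", "B"],
                 ["C", "A"]])

# np.tile repeats the static values once per matching row, giving shape (2, 3).
right = np.tile(["X", "Y", "Z"], (len(left), 1))

# np.hstack joins the two blocks side by side, giving one row per record and one column per target.
block = np.hstack([left, right])
```

The resulting block has the same shape as the selection on the left-hand side of the .loc assignment, which is what lets pandas write it in one step.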