Using pandas 1.4.2.
I have a DataFrame with 5 columns: A, B, C, D, E.
I need to assign the values from columns D and E to columns A and B wherever the value of column C is True.
I want to achieve this in one line using .loc.
Example:
A | B | C | D | E |
---|---|---|---|---|
1 | 4 | True | 7 | 10 |
2 | 5 | False | 8 | 11 |
3 | 6 | True | 9 | 12 |
Expected result:
A | B | C | D | E |
---|---|---|---|---|
7 | 10 | True | 7 | 10 |
2 | 5 | False | 8 | 11 |
9 | 12 | True | 9 | 12 |
import pandas as pd

df = pd.DataFrame(
    {'A': [1, 2, 3],
     'B': [4, 5, 6],
     'C': [True, False, True],
     'D': [7, 8, 9],
     'E': [10, 11, 12]}
)
# Attempted one-liner: the right-hand side is a DataFrame with columns D and E
df.loc[df['C'], ['A', 'B']] = df[['D', 'E']]
Actual result:
A | B | C | D | E |
---|---|---|---|---|
nan | nan | True | 7 | 10 |
2 | 5 | False | 8 | 11 |
nan | nan | True | 9 | 12 |
Workaround I figured out:
df.loc[df['C'], ['A', 'B']] = (df.D[df.C], df.E[df.C])
It seems pandas does not pick up the values to be assigned correctly when they come in the form of a DataFrame, but it gets them right if you pack them as a tuple of Series. Am I getting the syntax wrong, or is this a bug in pandas?
CodePudding user response:
Use boolean indexing on both sides, and bypass index alignment by converting the right-hand side to a NumPy array with to_numpy:
m = df['C']
df.loc[m, ['A', 'B']] = df.loc[m, ['D', 'E']].to_numpy()
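To see why the original assignment produced NaN, here is a minimal sketch using the question's data. The reindex call is only an illustration of the effect of label alignment, not a claim about how pandas implements it internally, and the names aligned, values and fixed are made up for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [1, 2, 3],
     'B': [4, 5, 6],
     'C': [True, False, True],
     'D': [7, 8, 9],
     'E': [10, 11, 12]}
)
m = df['C']

# Illustration of label alignment: the right-hand side has columns D and E,
# so there is nothing to line up with the target columns A and B.
aligned = df[['D', 'E']].reindex(columns=['A', 'B'])
print(aligned.isna().all().all())  # every aligned cell is missing

# A NumPy array carries no labels, so its values are assigned by position.
values = df.loc[m, ['D', 'E']].to_numpy()
print(values.shape)  # one row per True in C, one column per target column

fixed = df.copy()  # work on a copy so df itself stays untouched
fixed.loc[m, ['A', 'B']] = values
print(fixed)
```

Because the array is positional, the mask has to be applied on the right-hand side as well, so that the shapes on both sides agree.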
Or change the column names with set_axis:
df.loc[df['C'], ['A', 'B']] = df[['D', 'E']].set_axis(['A', 'B'], axis=1)
Output:
A B C D E
0 7 10 True 7 10
1 2 5 False 8 11
2 9 12 True 9 12
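The set_axis variant keeps alignment switched on and simply makes the labels match. As a rough sketch of what that implies, the example below uses the same data with a string index ('x', 'y', 'z'), which is an assumption added for illustration and not part of the question; the name renamed is likewise made up.

```python
import pandas as pd

# Same data as in the question, but with a hypothetical string index
df = pd.DataFrame(
    {'A': [1, 2, 3],
     'B': [4, 5, 6],
     'C': [True, False, True],
     'D': [7, 8, 9],
     'E': [10, 11, 12]},
    index=['x', 'y', 'z'],
)

# Rename D/E to A/B so the column labels match the assignment target
renamed = df[['D', 'E']].set_axis(['A', 'B'], axis=1)

# The right-hand side is not filtered by the mask here: rows are matched
# by index label, and only the rows selected on the left are written.
df.loc[df['C'], ['A', 'B']] = renamed
print(df)
```

Row 'y' keeps its original A and B values because the mask excludes it on the left-hand side, even though renamed contains a row with that label.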