Home > OS >  Haw can I fill two columns of a dataframe with "np.where"?
Haw can I fill two columns of a dataframe with "np.where"?

Time:11-13

I am trying to set 2 columns by a condition on a 3rd column. I can set 1 column conditions on another column, and I can set 2 columns on a single condition value. but when I try to set 2 columns by a condition on a column, it fails.

here is the code example:

    import pandas as pd
    import numpy as np
    AAA={"column A": [1, 1, 1, 2, 2, 2, 3, 3, 3]}
    df = pd.DataFrame(AAA)

if I call:

    df["column B"], df["column C"] = np.where(True ,['4', '8'],['NaN', 'NaN'])

I get:

df
   column A column B column C
0         1        4        8
1         1        4        8
2         1        4        8
3         2        4        8
4         2        4        8
5         2        4        8
6         3        4        8
7         3        4        8
8         3        4        8

so I know I can fill 2 columns based on a condition.

if I call:

    df["column B"] = np.where( df["column A"] == 2 ,['4'],['NaN'])

I get:

   column A column B column C
0         1      NaN        8
1         1      NaN        8
2         1      NaN        8
3         2        4        8
4         2        4        8
5         2        4        8
6         3      NaN        8
7         3      NaN        8
8         3      NaN        8

so I know I can fill based on a condition on a column. (I assume this is treated as a boolean array ) However, If I call:

    df["column B"], df["column C"] = np.where( df["column A"] == 2 ,['4', '8'],['NaN', 'NaN'])

I expect to get

   column A column B column C
0         1      NaN        NaN        
1         1      NaN        NaN        
2         1      NaN        NaN        
3         2        4        8
4         2        4        8
5         2        4        8
6         3      NaN        NaN        
7         3      NaN        NaN        
8         3      NaN        NaN        

but I actually get:

Traceback (most recent call last):
 ... pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<string>", line 2, in <module>
  File "<__array_function__ internals>", line 6, in where
ValueError: operands could not be broadcast together with shapes (9,) (2,) (2,) 

Is there a way to do what I am trying to do? I don't want to use 2 separate calls, because the dataframes I need this for are very large.

CodePudding user response:

Maybe you can loop outside np.where:

df["column B"], df["column C"] = [np.where( df["column A"] == 2 ,true_val,'NaN') for true_val in ['4','8']]

print(df)
# column A column B column C
# 0         1      NaN      NaN
# 1         1      NaN      NaN
# 2         1      NaN      NaN
# 3         2        4        8
# 4         2        4        8
# 5         2        4        8
# 6         3      NaN      NaN
# 7         3      NaN      NaN
# 8         3      NaN      NaN

CodePudding user response:

use loc indexer and give value

df.loc[df['column A'] == 2, ['column B', 'column C']] = [4, 8]

output(df):

    column A    column B    column C
0   1           NaN         NaN
1   1           NaN         NaN
2   1           NaN         NaN
3   2           4.0         8.0
4   2           4.0         8.0
5   2           4.0         8.0
6   3           NaN         NaN
7   3           NaN         NaN
8   3           NaN         NaN

CodePudding user response:

Here is one way. While it is not the most elegant code - it should help you understand what is required.

import pandas as pd
import numpy as np

AAA={"column A": [1, 1, 1, 2, 2, 2, 3, 3, 3]}
df = pd.DataFrame(AAA)

col_length = len(df['column A'])
fours = np.repeat(4, col_length, axis =0)
eights = np.repeat(8, col_length, axis =0)
empties = np.repeat(np.nan, col_length, axis =0)

df["column B"], df["column C"] = np.where( df["column A"] == 2 ,[fours, eights], [empties, empties])
print(df)

OUTPUT:

   column A  column B  column C
0         1       NaN       NaN
1         1       NaN       NaN
2         1       NaN       NaN
3         2       4.0       8.0
4         2       4.0       8.0
5         2       4.0       8.0
6         3       NaN       NaN
7         3       NaN       NaN
8         3       NaN       NaN
  • Related