NumPy's where function and length error message

I have a spreadsheet that I am trying to correct. The 'Billing Categorization' column should be filled with 'Standard' or 'Non Standard', as applicable.

I am trying to use the where function from numpy to do this:

df['Billing Categorization'] = np.where((df['Billing Categorization'].isnull(), ~df['AE Number'].isnull()), 'Standard', df['Billing Categorization'])

The idea is that the empty values in 'Billing Categorization' should be filled with 'Standard' where the value in column 'AE Number' in the same row isn't empty.

However, I am getting the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-64-863f807f354c> in <module>
     30 df.loc[df["PQC-Product"].isnull(),'PQC-Product'] = df["Request-Product"]
     31 
---> 32 df['Billing Categorization'] = np.where((df['Billing Categorization'].isnull(), ~df['AE Number'].isnull()), 'Standard', df['Billing Categorization'])
     33 
     34 #We simply get the data out

~\Anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
   3161         else:
   3162             # set column
-> 3163             self._set_item(key, value)
   3164 
   3165     def _setitem_slice(self, key: slice, value):

~\Anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
   3240         """
   3241         self._ensure_valid_index(value)
-> 3242         value = self._sanitize_column(key, value)
   3243         NDFrame._set_item(self, key, value)
   3244 

~\Anaconda3\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast)
   3897 
   3898             # turn me into an ndarray
-> 3899             value = sanitize_index(value, self.index)
   3900             if not isinstance(value, (np.ndarray, Index)):
   3901                 if isinstance(value, list) and len(value) > 0:

~\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in sanitize_index(data, index)
    749     """
    750     if len(data) != len(index):
--> 751         raise ValueError(
    752             "Length of values "
    753             f"({len(data)}) "

ValueError: Length of values (2) does not match length of index (876)

Both columns have empty values, but I only want to fill the rows where the rule applies; obviously not all of them can be filled. I want to go from this:

Number	Billing Categorization	Country	AE Number	AE country	Date
First	NaN	Italy	55568	Italy	1-Jan-2022
Second	NaN	France	NaN	NaN	NaN
Third	Standard	Spain	85968	Spain	5-Jan-2022
Fourth	Non Standard	UK	748265	UK	5-Jan-2022
Fifth	Standard	UK	59632	UK	6-Jan-2022
Sixth	NaN	UK	78946	UK	7-Jan-22

To this one:

Number	Billing Categorization	Country	AE Number	AE country	Date
First	Standard	Italy	55568	Italy	1-Jan-2022
Second	NaN	France	NaN	NaN	NaN
Third	Standard	Spain	85968	Spain	5-Jan-2022
Fourth	Non Standard	UK	748265	UK	5-Jan-2022
Fifth	Standard	UK	59632	UK	6-Jan-2022
Sixth	Standard	UK	78946	UK	7-Jan-22

As you can see in the second row, since there is no AE Number, np.where shouldn't change anything and the cell should stay blank. I have manually checked the length of both columns and they match, so what's wrong?

CodePudding user response：

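IIUC, the tuple of two masks is treated as a condition array with 2 rows, so np.where returns 2 rows and the assignment fails with "Length of values (2)". Chain the masks with & instead: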

m = df['Billing Categorization'].isna() & df['AE Number'].notna()
df['Billing Categorization'] = np.where(m, 'Standard', df['Billing Categorization'])
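To make the shape problem visible, here is a minimal sketch on a small frame modeled on the sample rows in the question. The variable names (cat_missing, ae_present, bad) are only illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# Small sample modeled on the table in the question
df = pd.DataFrame({
    "Billing Categorization": [np.nan, np.nan, "Standard", "Non Standard", "Standard", np.nan],
    "AE Number": [55568, np.nan, 85968, 748265, 59632, 78946],
})

cat_missing = df["Billing Categorization"].isna()
ae_present = df["AE Number"].notna()

# A tuple of two masks becomes a 2-row condition array,
# so the result is 2-dimensional instead of one value per row
bad = np.where((cat_missing, ae_present), "Standard", df["Billing Categorization"])
bad_shape = bad.shape

# Combining the masks element-wise with & gives one boolean per row
mask = cat_missing & ae_present
df["Billing Categorization"] = np.where(mask, "Standard", df["Billing Categorization"])
result = df["Billing Categorization"].tolist()
```

Here bad has one row per mask rather than one value per DataFrame row, which is where the length of 2 in the error message comes from.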

CodePudding user response：

You don't need np.where here, use indexing instead:

df.loc[df['Billing Categorization'].isna() & df['AE Number'].notna(), 'Billing Categorization'] = 'Standard'

Output:

Number	Billing Categorization	Country	AE Number	AE country	Date
First	Standard	Italy	55568	Italy	1-Jan-2022
Second	nan	France	nan	nan	nan
Third	Standard	Spain	85968	Spain	5-Jan-2022
Fourth	Non Standard	UK	748265	UK	5-Jan-2022
Fifth	Standard	UK	59632	UK	6-Jan-2022
Sixth	Standard	UK	78946	UK	7-Jan-22
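Note that assigning through a boolean mask on the frame itself writes to every column of the matching rows, while .loc with a column label limits the write to that column. A minimal sketch of the difference, using a hypothetical three-row frame with string values (not the original data):

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample; AE Number is kept as text for simplicity
df = pd.DataFrame({
    "Number": ["First", "Second", "Third"],
    "Billing Categorization": [np.nan, np.nan, "Non Standard"],
    "AE Number": ["55568", np.nan, "748265"],
})
mask = df["Billing Categorization"].isna() & df["AE Number"].notna()

# Boolean mask on the frame: all columns of the matching rows are overwritten
whole_rows = df.copy()
whole_rows[mask] = "Standard"

# .loc with a column label: only that column is changed in the matching rows
one_column = df.copy()
one_column.loc[mask, "Billing Categorization"] = "Standard"
```

In whole_rows the first row has 'Standard' in every column, including Number, whereas one_column keeps 'First' and only fills the billing cell.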