Prevent numpy select from casting values in the "choicelist" and "default" argum

Time:07-08

I have a list of boolean pandas Series (all of the same length) and a list of values with one entry per Series. I'm trying to use numpy's select function with np.NaN as the default value, so that the resulting Series contains np.NaN for rows that didn't fulfill any of the conditions. Here's what the line looks like by itself:

result = pd.Series(np.select(list_of_conditions, list_of_values, default=np.NaN))

The issue I'm having is that, at some point during the process, all the values (including np.NaN) get cast to strings if a single one of them is a string. Here's a minimal reproducible example:

import numpy as np
import pandas as pd

series1 = pd.Series([True, False, False])
series2 = pd.Series([True, True, False])
list_of_series = [series1, series2]
list_of_values = ['1', '2']

result = pd.Series(np.select(list_of_series, list_of_values, default=np.NaN))

print(result.unique())
# Printed result --> ['1' '2' 'nan']
# Desired result --> ['1' '2' nan]

I tried (to no avail) replacing list_of_values with a numpy array of object dtype:

# [...]
list_of_values = np.array(['1', 2], dtype=object)
# here I'm putting 2 instead of '2' to show that
# even items from the array get cast to string, not just the `default` argument

print(list_of_values)
# Printed result --> ['1' 2], as expected

result = pd.Series(np.select(list_of_series, list_of_values, default=np.NaN))

print(result.unique())
# Printed result --> ['1' '2' 'nan'] nope

select has no dtype argument, so I don't know what to do, aside from botching a solution by replacing all 'nan' strings with actual NaNs, which irks me. Do I have to reimplement my own version of numpy's select, or did I miss something somewhere?

Edit: I'm using numpy version 1.21.3, and pandas version 1.3.4.

CodePudding user response:

You need to cast the default to 'object':

result = pd.Series(np.select(list_of_series, list_of_values, default=np.array(np.NaN, dtype='object')))

Result of print(result.unique()):

['1' '2' nan]
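
If you need this in several places, the cast can be wrapped in a small helper. This is only a minimal sketch: the name select_as_object is hypothetical (it is not a NumPy function), and it uses plain boolean arrays for the conditions and np.nan for the default to keep the example short.

```python
import numpy as np


def select_as_object(condlist, choicelist, default=np.nan):
    """Hypothetical helper (not part of NumPy): np.select with an object result."""
    # Wrapping the default in a 0-d object array makes the combined dtype
    # object, so neither the choices nor the default are turned into strings.
    default_obj = np.array(default, dtype=object)
    return np.select(condlist, choicelist, default=default_obj)


cond1 = np.array([True, False, False])
cond2 = np.array([True, True, False])

# Mixed choices: a string and an integer.
out = select_as_object([cond1, cond2], ['1', 2])
print(out.dtype)
print(out.tolist())
```

Boolean pandas Series can be passed as the conditions in the same way, as in the one-liner above, and the result can be wrapped in pd.Series afterwards.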

This is because the dtype of the result is computed with np.result_type from the dtypes of the choice list and the default: np.result_type('U1', float) yields '<U32', whereas np.result_type('U1', np.dtype('object')) yields 'O'.
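
The promotion can be checked directly. The sketch below assumes that np.select converts each choice and the default with np.asarray before combining their dtypes, which is my reading of the implementation rather than documented behaviour. It would also explain why the object array in the question did not help: iterating over it yields the plain Python items, so the container's object dtype is lost.

```python
import numpy as np

# Each choice and the default, converted on their own (assumed to mirror
# what np.select does internally).
str_choice = np.asarray('1')
nan_default = np.asarray(np.nan)
obj_default = np.array(np.nan, dtype=object)

# String + float promotes to a fixed-width unicode dtype; string + object
# promotes to object.
promoted_plain = np.result_type(str_choice, nan_default)
promoted_obj = np.result_type(str_choice, obj_default)
print(promoted_plain.kind)
print(promoted_obj)

# Iterating an object array yields the Python items themselves, so each
# item gets its own dtype again once it is converted separately.
values = np.array(['1', 2], dtype=object)
item_kinds = [np.asarray(v).dtype.kind for v in values]
print(item_kinds)
```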
