I have a list of boolean pandas Series (all of the same length) and a list of values with the same length as the list of Series. I'm trying to use numpy's select function to create a new Series from them, with np.NaN as the default value for rows that didn't fulfill any of the conditions. Here's what the line by itself looks like:
result = pd.Series(np.select(list_of_conditions, list_of_values, default=np.NaN))
The issue I'm having is that, at some point during the process, all the values, including np.NaN, seem to get cast to string if a single one of them is a string.
Here's a minimal reproducible example:
import numpy as np
import pandas as pd
series1 = pd.Series([True, False, False])
series2 = pd.Series([True, True, False])
list_of_series = [series1, series2]
list_of_values = ['1', '2']
result = pd.Series(np.select(list_of_series, list_of_values, default=np.NaN))
print(result.unique())
# Printed result --> ['1' '2' 'nan']
# Desired result --> ['1' '2' nan]
I tried (to no avail) replacing list_of_values with a numpy array of object dtype:
# [...]
list_of_values = np.array(['1', 2], dtype=object)
# here I'm putting 2 instead of '2' to show that
# even items from the array get casted to string, not just the `default` argument
print(list_of_values)
# Printed result --> ['1' 2], as expected
result = pd.Series(np.select(list_of_series, list_of_values, default=np.NaN))
print(result.unique())
# Printed result --> ['1' '2' 'nan'] nope
select has no dtype argument, so I don't know what to do, aside from botching a solution by replacing all 'nan' strings with actual NaNs, which irks me. Do I have to reimplement my own version of numpy's select, or did I miss something somewhere?
Edit: I'm using numpy version 1.21.3, and pandas version 1.3.4.
Answer:
You need to cast the default to object dtype:
result = pd.Series(np.select(list_of_series, list_of_values, default=np.array(np.NaN, dtype='object')))
Result of print(result.unique()):
['1' '2' nan]
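As a quick sanity check, here is a minimal sketch that re-uses the names from the question and confirms the last element is a real float NaN rather than the string 'nan'. The names object_nan and selected are mine, and I'm writing np.nan (the lowercase spelling of the same constant as np.NaN):

```python
import numpy as np
import pandas as pd

series1 = pd.Series([True, False, False])
series2 = pd.Series([True, True, False])
list_of_series = [series1, series2]
list_of_values = ['1', '2']

# A 0-d object array as the default keeps select from promoting
# everything to a string dtype.
object_nan = np.array(np.nan, dtype=object)
selected = np.select(list_of_series, list_of_values, default=object_nan)

print(selected.dtype)                  # object
print(pd.isna(selected).tolist())      # [False, False, True]
print(isinstance(selected[2], float))  # True: a float NaN, not the text 'nan'

result = pd.Series(selected)
```

The check is done on the numpy array before wrapping it in a Series, so it only depends on what select itself returns.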
The cast is needed because the resulting dtype is determined using np.result_type from the types of the choice list and the default: np.result_type('U1', float) yields '<U32', so NaN ends up stored as the string 'nan', whereas np.result_type('U1', np.dtype('object')) yields 'O'.
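The promotion can be reproduced without select at all. The sketch below only uses np.result_type and np.asarray. The last part is my reading of why the object-dtype choice array from the question didn't help: as far as I can tell, select converts each choice separately, so the array's object dtype is lost. Treat that as an assumption about the implementation. The names str_and_float, str_and_object and per_item_kinds are mine.

```python
import numpy as np

# String choices plus a float default promote to a string dtype,
# which is why NaN comes back as the text 'nan'.
str_and_float = np.result_type(np.dtype('U1'), np.dtype(float))
print(str_and_float.kind)   # 'U' (a unicode string dtype)

# String choices plus an object default promote to object instead.
str_and_object = np.result_type(np.dtype('U1'), np.dtype(object))
print(str_and_object)       # object

# Converting the items of an object array one by one gives each item
# its own inferred dtype again (string for '1', integer for 2).
values = np.array(['1', 2], dtype=object)
per_item_kinds = [np.asarray(item).dtype.kind for item in values]
print(per_item_kinds)       # ['U', 'i']
```

If that reading is right, it explains why only the dtype of the default (or of an individual choice) can force the result to object.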