Trying to save my new column values as np.nan, but the notnull filter fails to apply

My issue is that I am trying to add a new array to my df. The array is built by a list comprehension that says, in effect: if a value is in this list of unacceptable inputs, replace it with np.nan.

unacceptable_inputs = ['Finalizar','Oi','5','Encerrar','finalizar']

comments = np.array([x if x not in unacceptable_inputs else np.nan for x in df3['NewAction']], dtype='str')

The thing is, when I try to filter my column by non-null values, it still displays the null values even though they are filled with NaN. Could someone tell me why?

df3['Comentarios'] = comments
df3.loc[df3.Comentarios.notnull()]

Sample:

Column Comentarios:
'Potato'
 nan
'Heyo'
 nan

Wanted end result:

Column Comentarios
'Heyo'
'Potato'

It is worth noting that the NaN values in this specific column look different from the ones in my other columns. The notnull method does work on those other columns.

CodePudding user response:

The issue is that dtype='str' converts np.nan into the string 'nan', and pandas does not treat that string as null, so notnull() returns True for those rows.

import numpy as np
import pandas as pd

unacceptable_inputs = ['Finalizar','Oi','5','Encerrar','finalizar']

#fake df3 table
df3 = pd.DataFrame({
    'NewAction':['ok','Finalizar','Oi','also_ok'],
})

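#same comprehension as in the question (np.nan is used because the np.NaN alias was removed in NumPy 2.0)
comments = np.array([x if x not in unacceptable_inputs else np.nan for x in df3['NewAction']], dtype='str')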

#comments now looks like:
#array(['ok', 'nan', 'nan', 'also_ok'], dtype='<U7')
#
#note the single-quotes around nan! it's been turned into a string!

#type(comments[1]) is numpy.str_
#type(np.nan) is float

#an alternative way that could fix your problem
#(1) Find all 'NewAction' indices that aren't unacceptable
#(2) Create the 'Comentarios' column using these inds
#    implicitly, all other inds in the 'Comentarios' column are NaN
acceptable_inds = ~df3['NewAction'].isin(unacceptable_inputs)
df3.loc[acceptable_inds,'Comentarios'] = df3.loc[acceptable_inds,'NewAction']

df3.loc[df3.Comentarios.notnull()]
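
Two other ways to keep real NaN values instead of 'nan' strings, as a rough sketch rather than the only fix. It assumes the same made-up df3 as above; the names kept_mask and kept_list are just for illustration. The first uses Series.mask, the second keeps the original list comprehension but drops dtype='str' so nothing gets converted to a string.

```python
import numpy as np
import pandas as pd

unacceptable_inputs = ['Finalizar', 'Oi', '5', 'Encerrar', 'finalizar']

# made-up stand-in for df3, same as the fake table above
df3 = pd.DataFrame({'NewAction': ['ok', 'Finalizar', 'Oi', 'also_ok']})

# Option 1: mask() replaces every value where the condition is True with a real NaN
df3['Comentarios'] = df3['NewAction'].mask(df3['NewAction'].isin(unacceptable_inputs))
kept_mask = df3.loc[df3['Comentarios'].notnull()]

# Option 2: same comprehension as in the question, but assigned as a plain list
# (no dtype='str'), so np.nan stays a float and pandas still sees it as null
df3['Comentarios'] = [x if x not in unacceptable_inputs else np.nan
                      for x in df3['NewAction']]
kept_list = df3.loc[df3['Comentarios'].notnull()]

print(kept_mask)
print(kept_list)
```

Either way, notnull() only keeps the 'ok' and 'also_ok' rows, because the missing entries are actual NaN and not text.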