My issue is that I am trying to add a new array to my DataFrame. The array is built by a list comprehension that keeps each value unless it appears in a given list, in which case the value is replaced with np.nan.
unacceptable_inputs = ['Finalizar','Oi','5','Encerrar','finalizar']
comments=np.array([x if x not in unacceptable_inputs else np.NaN for x in df3['NewAction']],dtype='str')
The thing is, when I try to filter my column by non-null values, it still displays the null values even though they are filled with NaN. Could someone tell me why?
df3['Comentarios'] = comments
df3.loc[df3.Comentarios.notnull()]
Sample:
Column Comentarios:
'Potato'
nan
'Heyo'
nan
Wanted end result:
Column Comentarios
'Heyo'
'Potato'
It is worth noting that the np.nan values in this specific column differ from the ones in my other columns. Here is a sample for evidence. The notnull method actually works when they are filled like that.
CodePudding user response:
The issue is that np.NaN is being converted into a string, so pandas no longer recognizes it as null later on.
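As a quick check of that claim, here is a minimal sketch comparing an array built with dtype='str' against one built with dtype=object. The names as_text and as_object are made up for illustration, and np.nan is used instead of np.NaN because the np.NaN alias was removed in NumPy 2.0.

```python
import numpy as np
import pandas as pd

# Forcing dtype='str' turns the float NaN into the three-character text 'nan'
as_text = np.array(['ok', np.nan], dtype='str')

# With dtype=object the NaN stays a float, so pandas still sees it as missing
as_object = np.array(['ok', np.nan], dtype=object)

# notnull() treats the text 'nan' as an ordinary, non-null string
text_mask = pd.Series(as_text).notnull().tolist()
object_mask = pd.Series(as_object).notnull().tolist()

print(text_mask)
print(object_mask)
```

Only the object version should report a missing value in its second position, which is why the filter in the question appears to do nothing.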
import numpy as np
import pandas as pd

unacceptable_inputs = ['Finalizar', 'Oi', '5', 'Encerrar', 'finalizar']

# fake df3 table
df3 = pd.DataFrame({
    'NewAction': ['ok', 'Finalizar', 'Oi', 'also_ok'],
})
comments=np.array([x if x not in unacceptable_inputs else np.NaN for x in df3['NewAction']],dtype='str')
# comments now looks like:
# array(['ok', 'nan', 'nan', 'also_ok'], dtype='<U7')
#
# Note the single quotes around nan: it has been turned into a string!
# type(comments[1]) is numpy.str_
# type(np.NaN) is float

# An alternative approach that could fix your problem:
# (1) Find all 'NewAction' indices that aren't unacceptable
# (2) Create the 'Comentarios' column using these indices;
#     implicitly, all other rows in the 'Comentarios' column are NaN
acceptable_inds = ~df3['NewAction'].isin(unacceptable_inputs)
df3.loc[acceptable_inds,'Comentarios'] = df3.loc[acceptable_inds,'NewAction']
df3.loc[df3.Comentarios.notnull()]
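Another option, not part of the original answer, is to skip the NumPy array entirely and let pandas insert the missing values with Series.where. This is only a sketch under the same assumptions as above: df3 here is a stand-in for the real table, and result is a name introduced for illustration.

```python
import pandas as pd

unacceptable_inputs = ['Finalizar', 'Oi', '5', 'Encerrar', 'finalizar']

# Stand-in for the real df3 (assumed shape, same as the fake table above)
df3 = pd.DataFrame({'NewAction': ['ok', 'Finalizar', 'Oi', 'also_ok']})

# where() keeps values where the condition is True and
# fills the remaining rows with a real missing value
is_acceptable = ~df3['NewAction'].isin(unacceptable_inputs)
df3['Comentarios'] = df3['NewAction'].where(is_acceptable)

# The missing values are real nulls, so notnull() filters as expected
result = df3.loc[df3['Comentarios'].notnull()]
print(result)
```

Because no string dtype is forced anywhere, the missing entries stay real nulls and the notnull filter drops them.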