With the following df
import numpy as np
import pandas as pd
data = pd.DataFrame({'A':['Peter','Karl'], 'B':[np.nan,'Jackson']})
I want a column AddCols that combines the column A and B, sometimes they could be numbers, but even if so I want to handle with them as strings so I cast:
data['AddCols'] = data['A'].astype('string') ',' data['B'].astype('string')
But the result, whenever A or B is NAN returns a NAN:
'| | AddCols |
|---:|:-------------|
| 0 | <NA> |
| 1 | Karl,Jackson |'
Using astype(str) creates a literal string nan, that's why I am using astype('string').
Why is this happening and how to solve it? Ideally for the first row (idx 0) I would want simply:
Peter,
CodePudding user response:
Solution is replace NaN to emty strings, because if use
operator with strings NaN
is not excluded, so np.nan 'any string' = np.nan
:
data['AddCols'] = (data['A'].fillna('').astype('string') ','
data['B'].fillna('').astype('string'))
Or nae_rep
in Series.str.cat
:
data['AddCols'] = (data['A'].astype('string')
.str.cat(data['B'].fillna('').astype('string'),
na_rep='', sep=','))
print (data)
A B AddCols
0 Peter NaN Peter,
1 Karl Jackson Karl,Jackson
CodePudding user response:
From documentation, your code is behaving as expected. if you dont give a presentation of NaN alias na_rep
, and there is a NaN
, missing values in the Series/Index are omitted from the result. That is, the result of concatenating a string
to a NaN
is NaN
data['AddCols'] = data['A'].astype('string') ',' data['B'].fillna('').astype(str)