import pandas as pd
import numpy as np
text1 = ['22211', '1111', np.nan]
Int1 = ['5555', np.nan, '4444']
Float1 = [np.nan, '3333.0', '231.0']
Text2 = ['222115555', '11113333', '4444231']
df = pd.DataFrame({'Text1': text1, 'Int1': Int1, 'Float1': Float1})
df_result = pd.DataFrame({'Text1': text1, 'Int1': Int1, 'Float1': Float1, 'Result': Text2})
I need to concatenate columns that contain NaN and have different types into a single string column (the 'Result' column in df_result above).
I need help understanding how to do this properly. I run into it over and over again and keep having issues.
I have tried to set each column as 'str' when using pd.read_csv or pd.read_excel.
I have also tried this, from other questions I have seen on here:
df = df.fillna("")  # so astype('int') will work
df['Float1'] = df['Float1'].astype('int').astype('str')
I usually get this: ValueError: invalid literal for int() with base 10: ''
The df['Result'] values are usually keys in mapping dictionaries (more cols in the df above) for other datasets, so getting them in str format is usually best. I am open to other opinions, and I spend hours on this.
I'm just looking for how to do this, conceptually and concretely, since I continually run into it when trying to analyze/manipulate data from our legacy Excel/CSV sheets.
CodePudding user response:
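IIUC, you could cast to dtype str, join, then use str.replace to get rid of the 'nan' values. Then strip the trailing ".0" with an anchored regex. str.rstrip('.0') also works on this sample, but it removes any run of trailing '.' and '0' characters, so a value like '10.0' would become '1':
df['Result'] = df.astype(str).apply(''.join, axis=1).str.replace('nan', '').str.replace(r'\.0$', '', regex=True)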
Output:
Text1 Int1 Float1 Result
0 22211 5555 NaN 222115555
1 1111 NaN 3333.0 11113333
2 NaN 4444 231.0 4444231
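
If the float-like column is not always the last one, or a text value could legitimately contain the letters "nan", cleaning each column before joining may be safer than cleaning the joined string. Here is a minimal sketch of that idea, assuming the same three columns as in the question; to_key_part is a hypothetical helper name, not a pandas function:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Text1': ['22211', '1111', np.nan],
    'Int1': ['5555', np.nan, '4444'],
    'Float1': [np.nan, '3333.0', '231.0'],
})

def to_key_part(col: pd.Series) -> pd.Series:
    """Hypothetical helper: render one column as clean strings."""
    as_text = col.astype(str)
    # Blank out the positions that were missing in the original column,
    # instead of searching for the literal text 'nan' afterwards.
    as_text = as_text.where(col.notna(), '')
    # Remove a float-style ".0" suffix only when it ends the value.
    return as_text.str.replace(r'\.0$', '', regex=True)

# Clean column by column, then join each row into one key string.
parts = df[['Text1', 'Int1', 'Float1']].apply(to_key_part)
df['Result'] = parts.apply(''.join, axis=1)

print(df['Result'].tolist())
```

On the sample data this gives the same Result column as above. The difference is that the ".0" suffix is removed per column, so it would also be handled when the float-like column sits in the middle of the key.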