Pandas str, int, and float columns concatenation

import pandas as pd
import numpy as np

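# Sample columns: numeric-looking values stored as strings, with missing entries
text1 = ['22211', '1111', np.nan]
Int1 = ['5555', np.nan, '4444']
Float1 = [np.nan, '3333.0', '231.0']
# Desired result: each row's non-missing values joined, without the trailing ".0"
Text2 = ['222115555', '11113333', '4444231']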

df = pd.DataFrame({'Text1': text1, 'Int1': Int1, 'Float1': Float1})
df_result = pd.DataFrame({'Text1': text1, 'Int1': Int1, 'Float1': Float1, 'Result': Text2})

I need to concatenate columns that contain NaN values and mixed types (str, int, float) into a single string column.

I'd like help understanding how to do this properly. I run into it over and over again and keep having issues.

I have tried setting each column to 'str' when using pd.read_csv or pd.read_excel.

I also tried this, based on other questions I have seen here:

df = df.fillna("")  # so that astype('int') will work
df['Float1'] = df['Float1'].astype('int').astype('str')

I usually get this: ValueError: invalid literal for int() with base 10: ''

The df['Result'] values are usually keys in mapping dictionaries for other datasets (the real df has more columns than shown above), so getting them in str format is usually best. I'm open to other opinions; I have spent hours on this.

I'm just looking for how to do this, conceptually and concretely, since I continually run into it when trying to analyze or manipulate data from our legacy Excel/CSV sheets.

CodePudding user response:

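IIUC, you could cast to dtype str, join, then use str.replace to get rid of the 'nan' values. Then remove a trailing ".0" with an anchored regex. str.rstrip('.0') gives the same result on this sample, but it strips any run of trailing '.' and '0' characters, so a key ending in 0 (e.g. '5550') would be cut short:

df['Result'] = df.astype(str).apply(''.join, axis=1).str.replace('nan', '').str.replace(r'\.0$', '', regex=True)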

Output:

   Text1  Int1  Float1     Result
0  22211  5555     NaN  222115555
1   1111   NaN  3333.0   11113333
2    NaN  4444   231.0    4444231
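
On the conceptual side: the ValueError in the question comes from the empty strings that fillna("") leaves behind, which int() cannot parse. A possible alternative, sketched here under the assumption that you know which columns are meant to be numeric and that their values are whole numbers, is to normalize each column before joining rather than cleaning up the joined string afterwards. The helper name to_key_part is made up for this illustration; pd.to_numeric, the nullable 'Int64' dtype, and the 'string' dtype are standard pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Text1': ['22211', '1111', np.nan],
    'Int1': ['5555', np.nan, '4444'],
    'Float1': [np.nan, '3333.0', '231.0'],
})


def to_key_part(col: pd.Series, numeric: bool) -> pd.Series:
    """Hypothetical helper: return the column as strings, with missing values as ''."""
    if numeric:
        # Parse '3333.0' -> 3333.0, then cast to the nullable Int64 dtype,
        # which can hold missing values (plain int cannot).
        # Assumption: the numeric values are whole numbers; the cast raises otherwise.
        col = pd.to_numeric(col, errors='coerce').astype('Int64')
    # Convert to the pandas string dtype and turn missing values into empty strings.
    return col.astype('string').fillna('')


# Element-wise string concatenation of the normalized columns.
df['Result'] = (
    to_key_part(df['Text1'], numeric=False)
    + to_key_part(df['Int1'], numeric=True)
    + to_key_part(df['Float1'], numeric=True)
)

print(df['Result'].tolist())
# ['222115555', '11113333', '4444231']
```

Because the ".0" is removed per column by the integer cast, this also handles a float-like value in a middle column and keeps legitimate trailing zeros (e.g. '5550.0' becomes '5550'). Note that errors='coerce' silently turns unparseable values into missing ones, which may or may not be what you want for legacy sheets.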