Pandas str, int, and float columns concatenation

Time:02-10

import pandas as pd
import numpy as np

# Sample columns: values arrive as strings, with NaN marking missing cells
text1 = ['22211', '1111', np.nan]
int1 = ['5555', np.nan, '4444']
float1 = [np.nan, '3333.0', '231.0']
# Desired concatenation of Text1 + Int1 + Float1 for each row
expected = ['222115555', '11113333', '4444231']

df = pd.DataFrame({'Text1': text1, 'Int1': int1, 'Float1': float1})
df_result = pd.DataFrame({'Text1': text1, 'Int1': int1, 'Float1': float1, 'Result': expected})

I need to concatenate columns that contain NaN and hold different types.

I would like help understanding how to do this properly. I run into it over and over again and keep having issues.

I have tried reading every column as 'str' when using pd.read_csv or pd.read_excel.

I have also tried this, from other questions I have seen on here:

df = df.fillna("")  # so that astype('int') will work

df['Float1'] = df['Float1'].astype('int').astype('str')

This usually fails with: ValueError: invalid literal for int() with base 10: ''
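The error comes from the empty strings that fillna("") puts into the column: int('') is not a valid conversion. A minimal sketch of one way around it, assuming the column holds only numeric strings or NaN, is to parse with pd.to_numeric and use the nullable Int64 dtype, which can hold missing values. The variable names here are illustrative only:

```python
import numpy as np
import pandas as pd

# Same shape as the Float1 column above: numeric strings plus a missing value.
float1 = pd.Series([np.nan, '3333.0', '231.0'])

# to_numeric parses the strings into floats and keeps NaN as NaN;
# errors='coerce' also turns unparseable text (such as '') into NaN.
as_float = pd.to_numeric(float1, errors='coerce')

# The nullable Int64 dtype can represent missing values, unlike plain int.
as_int = as_float.astype('Int64')

# Cast to the pandas string dtype, then replace the missing entry with ''.
as_text = as_int.astype('string').fillna('')

print(as_text.tolist())
```

For the sample column this should give '', '3333' and '231', which can then be concatenated with the other columns. Note that casting to Int64 raises an error if any value has a fractional part, so this only suits columns that are whole numbers stored as floats.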

The df['Result'] values are usually keys in mapping dictionaries for other datasets (the real df has more columns than shown above), so getting them in str format is usually best. I am open to other opinions, and I have spent hours on this.

I am just looking for how to do this, conceptually and concretely, since I continually run into it when trying to analyze or manipulate data from our legacy Excel/CSV sheets.

Answer:

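IIUC, you could cast to dtype str, join the values row-wise, then use str.replace to get rid of the 'nan' strings and of the trailing ".0". Avoid str.rstrip('.0') for the last step: it strips any run of trailing '.' and '0' characters, so a key ending in 0 (e.g. '5550') would lose digits. An anchored regex removes only a literal ".0" at the end:

df['Result'] = df.astype(str).apply(''.join, axis=1).str.replace('nan', '', regex=False).str.replace(r'\.0$', '', regex=True)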

Output:

   Text1  Int1  Float1     Result
0  22211  5555     NaN  222115555
1   1111   NaN  3333.0   11113333
2    NaN  4444   231.0    4444231
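Replacing the text 'nan' after joining would also alter legitimate values that contain those letters, and only a ".0" at the very end of the joined string is removed. A minimal sketch of a per-cell alternative follows, assuming the keys are built from the three sample columns; cell_to_key is a hypothetical helper name, not a pandas function:

```python
import re

import numpy as np
import pandas as pd

# Matches whole-number floats written as text, e.g. '3333.0' or '-12.00'.
_WHOLE_FLOAT = re.compile(r'(-?\d+)\.0+')

def cell_to_key(value):
    """Return a cell as text: '' for missing, whole-number floats without '.0'."""
    if pd.isna(value):
        return ''
    text = str(value).strip()
    match = _WHOLE_FLOAT.fullmatch(text)
    # Everything else (plain text, '5550', '00123', '1.5') is kept unchanged.
    return match.group(1) if match else text

df = pd.DataFrame({
    'Text1': ['22211', '1111', np.nan],
    'Int1': ['5555', np.nan, '4444'],
    'Float1': [np.nan, '3333.0', '231.0'],
})

cols = ['Text1', 'Int1', 'Float1']
# Clean each cell first, then join the cleaned pieces for every row.
df['Result'] = df[cols].apply(
    lambda row: ''.join(cell_to_key(v) for v in row), axis=1
)
print(df['Result'].tolist())
```

Because each cell is cleaned before joining, this also handles real float values (such as 3333.0 read from Excel) in any column position, and it leaves leading zeros and trailing zeros in text keys untouched.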