I need to concatenate some columns in a pandas DataFrame with "_" as separator and store the result in a new column in the same DataFrame. The problem is that I don't know in advance which and how many columns to concatenate. The labels of the columns to be concatenated are determined at run time of the program and stored in a list.
Example:
import pandas as pd
df=pd.DataFrame(data={'col.a':['a','b','c'],'col.b':['d','e','f'], 'col.c':['g','h','i']})
  col.a col.b col.c
0     a     d     g
1     b     e     h
2     c     f     i
cols_to_concat = ['col.a','col.c']
Desired result:
  col.a col.b col.c cols.concat
0     a     d     g         a_g
1     b     e     h         b_h
2     c     f     i         c_i
I need a method for generating df['cols.concat'] that works for a df with any number of columns and where cols_to_concat is an arbitrary subset of df.columns.
CodePudding user response:
You can use the line below: just add the columns together with +, with the '_' string in the middle:
df['cols.concat'] = df['col.a'] + '_' + df['col.c']
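Since the question says the column labels are only known at run time, the same idea of adding columns can be folded over a list of labels. This is a minimal sketch, assuming every selected column already holds strings and the list is not empty; the helper name concat_columns is hypothetical and not part of pandas:

```python
from functools import reduce

import pandas as pd


def concat_columns(df, cols, sep="_"):
    """Join the given columns row-wise with `sep` using Series addition."""
    # Hypothetical helper: folds `left + sep + right` over the selected columns.
    # Assumes `cols` is non-empty and the columns contain strings.
    return reduce(lambda left, right: left + sep + right, (df[c] for c in cols))


df = pd.DataFrame(data={'col.a': ['a', 'b', 'c'],
                        'col.b': ['d', 'e', 'f'],
                        'col.c': ['g', 'h', 'i']})
cols_to_concat = ['col.a', 'col.c']  # determined at run time in the real program
df['cols.concat'] = concat_columns(df, cols_to_concat)
```

With the example data this should fill cols.concat with a_g, b_h and c_i.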
CodePudding user response:
Supposing you have a list with the names of the columns to concatenate, you could use apply with axis=1
and join the values of each row:
import pandas as pd
df=pd.DataFrame(data={'col.a':['a','b','c'],
'col.b':['d','e','f'],
'col.c':['g','h','i']})
# this is the list of columns to concatenate
cols_to_cat = ['col.a','col.b','col.c']
df['concat'] = df[cols_to_cat].apply(lambda x: '_'.join(x), axis=1)
This should do the trick.
EDIT
The same line works for any subset of the columns, for example:
cols_to_cat = ['col.a','col.c']
df['concat'] = df[cols_to_cat].apply(lambda x: '_'.join(x), axis=1)
You could even repeat columns:
cols_to_cat = ['col.a','col.c','col.a']
df['concat'] = df[cols_to_cat].apply(lambda x: '_'.join(x), axis=1)
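One caveat: '_'.join(x) only works when every value in the selected columns is a string. A minimal sketch for mixed dtypes, assuming it is acceptable to cast everything to str first (missing values would be turned into text as well); the 'num' column is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(data={'col.a': ['a', 'b', 'c'],
                        'num': [1, 2, 3]})  # 'num' is a hypothetical integer column

cols_to_cat = ['col.a', 'num']
# Cast the selected columns to str so that join() accepts every value
df['concat'] = df[cols_to_cat].astype(str).apply(lambda x: '_'.join(x), axis=1)
```

Without the astype(str) call, the join would raise a TypeError on the integer values.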