python: concatenate unknown number of columns in pandas DataFrame

Time:12-10

I need to concatenate some columns in a pandas DataFrame with "_" as separator and store the result in a new column in the same DataFrame. The problem is that I don't know in advance which and how many columns to concatenate. The labels of the columns to be concatenated are determined at run time of the program and stored in a list.

Example:

import pandas as pd

df=pd.DataFrame(data={'col.a':['a','b','c'],'col.b':['d','e','f'], 'col.c':['g','h','i']})

  col.a col.b col.c
0     a     d     g
1     b     e     h
2     c     f     i

cols_to_concat = ['col.a','col.c']

Desired result:

  col.a col.b col.c cols.concat
0     a     d     g         a_g
1     b     e     h         b_h
2     c     f     i         c_i

I need a method for generating df['cols.concat'] that works for a df with any number of columns and where cols_to_concat is an arbitrary subset of df.columns.

CodePudding user response:

You can add the columns together with the + operator, with the "_" string in between:

df['cols.concat'] = df['col.a'] + '_' + df['col.c']
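
The line above hardcodes two column labels, while the question asks for a list that is only known at run time. One way to keep the same "+" idea is to fold over the list with functools.reduce. This is only a sketch: the helper name concat_with_plus is made up for illustration, and it assumes that every listed column holds strings and that the list is not empty.

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame(data={'col.a': ['a', 'b', 'c'],
                        'col.b': ['d', 'e', 'f'],
                        'col.c': ['g', 'h', 'i']})

# labels determined at run time (taken from the question's example)
cols_to_concat = ['col.a', 'col.c']


def concat_with_plus(frame, cols, sep='_'):
    """Hypothetical helper: chain `+` over the listed string columns."""
    first, rest = cols[0], cols[1:]
    # start from the first column, then append sep and each further column
    return reduce(lambda acc, col: acc + sep + frame[col], rest, frame[first])


df['cols.concat'] = concat_with_plus(df, cols_to_concat)
print(df)
```

With a single label in the list, the reduce step has nothing to fold and the new column is simply a copy of that column.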

CodePudding user response:

Supposing you have a list of the column names to concatenate, you can use apply with axis=1 and join the values of each row:


import pandas as pd

df=pd.DataFrame(data={'col.a':['a','b','c'],
                      'col.b':['d','e','f'], 
                      'col.c':['g','h','i']})


# this is the list of columns to concatenate
cols_to_cat = ['col.a','col.b','col.c']


df['concat'] = df[cols_to_cat].apply(lambda x: '_'.join(x), axis=1)

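This should do the trick. Note that '_'.join only accepts strings, so this works as written because every column in the example holds strings.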
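
If some of the columns hold numbers, '_'.join raises a TypeError. A possible workaround, shown here as a sketch and not as part of the original answer, is to cast the selected columns to strings first. The column 'col.n' is invented for this illustration.

```python
import pandas as pd

# 'col.n' is a made-up numeric column, used only to show the cast
df = pd.DataFrame(data={'col.a': ['a', 'b', 'c'],
                        'col.n': [1, 2, 3]})

cols_to_cat = ['col.a', 'col.n']

# cast every selected column to str, then join the values row by row
df['concat'] = df[cols_to_cat].astype(str).apply('_'.join, axis=1)
print(df['concat'].tolist())
```

Be aware that astype(str) also turns missing values into their string form (for example 'nan'), so they end up in the joined text unless you handle them beforehand.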


EDIT

The same line works for any number of columns, because only the list changes:

cols_to_cat = ['col.a','col.c']

df['concat'] = df[cols_to_cat].apply(lambda x: '_'.join(x), axis=1)

You could even repeat columns:

cols_to_cat = ['col.a','col.c','col.a']

df['concat'] = df[cols_to_cat].apply(lambda x: '_'.join(x), axis=1)