Home > front end >  Concatenate all columns in a dataframe, including column name and delimiters
Concatenate all columns in a dataframe, including column name and delimiters

Time:05-28

Using Python I am importing a large dataset into a Pandas Dataframe. These datasets are 10k rows and 1k columns. I need to concatenate all of the column values along with the column name and other delimiters. I have used a For loop to solve this, and it works fine, but is VERY slow and too many rows causes out of memory errors.

col 1     col 2     col 3...
val1      val2      val3 ...
val4      val5      val6

into

ConcatValues
Col 1=val1|Col 2=val2|Col 3=val3...
Col 1=val4|Col 2=val5|Col 3=val6...

My current solution looks like this, that adds a new column, "ConcatValues":

cols = df.columns
for col in cols[1:]:
     df["ConcatValues"] = df["ConcatValues"].astype(str)   "|"   col   "="    df[col].astype(str)

I have tried to get something like this working, using Apply and lambda:

df["ConcatValues"] = df[cols].apply(lambda x : str(x.name) '=' x "|",axis=1)

But this isnt concatenating the columns, just the values.

I have read several posts on similar problems, but none of those involved concatenating the column names or other characters along with the values.

Any help would be appreciated.

CodePudding user response:

You can use:

df['ConcatValues'] = df.apply(lambda c: f'{c.name} = ' c).agg('|'.join, axis=1)

output:

  col 1 col 2 col 3                            ConcatValues
0  val1  val2  val3  col 1 = val1|col 2 = val2|col 3 = val3
1  val4  val5  val6  col 1 = val4|col 2 = val5|col 3 = val6

Or, as new DataFrame:

df2 = (df.apply(lambda c: f'{c.name} = ' c)
         .agg('|'.join, axis=1)
         .to_frame('ConcatValues')
       )

output:

                             ConcatValues
0  col 1 = val1|col 2 = val2|col 3 = val3
1  col 1 = val4|col 2 = val5|col 3 = val6

CodePudding user response:

IIUC this should get you what you are expecting

data = {
    'Column1' : ['1', '2', '3', '4'],
    'Column2' : ['1', '2', '3', '4'],
    'Column3' : ['1', '2', '3', '4']
}
df = pd.DataFrame(data)
for x in df.columns:
    df[x] = df[x].apply(lambda row : f"{x}={str(y)}")
df['merge_column'] = df.apply(lambda x : '|'.join(x), axis = 1)
df = df[['merge_column']]
df.to_csv('WheverEver you want it to go', index = False, header=False)
df
  • Related