Using Python I am importing a large dataset into a Pandas Dataframe. These datasets are 10k rows and 1k columns. I need to concatenate all of the column values along with the column name and other delimiters. I have used a For loop to solve this, and it works fine, but is VERY slow and too many rows causes out of memory errors.
col 1 col 2 col 3...
val1 val2 val3 ...
val4 val5 val6
into
ConcatValues
Col 1=val1|Col 2=val2|Col 3=val3...
Col 1=val4|Col 2=val5|Col 3=val6...
My current solution looks like this, that adds a new column, "ConcatValues":
cols = df.columns
for col in cols[1:]:
df["ConcatValues"] = df["ConcatValues"].astype(str) "|" col "=" df[col].astype(str)
I have tried to get something like this working, using Apply and lambda:
df["ConcatValues"] = df[cols].apply(lambda x : str(x.name) '=' x "|",axis=1)
But this isnt concatenating the columns, just the values.
I have read several posts on similar problems, but none of those involved concatenating the column names or other characters along with the values.
Any help would be appreciated.
CodePudding user response:
You can use:
df['ConcatValues'] = df.apply(lambda c: f'{c.name} = ' c).agg('|'.join, axis=1)
output:
col 1 col 2 col 3 ConcatValues
0 val1 val2 val3 col 1 = val1|col 2 = val2|col 3 = val3
1 val4 val5 val6 col 1 = val4|col 2 = val5|col 3 = val6
Or, as new DataFrame:
df2 = (df.apply(lambda c: f'{c.name} = ' c)
.agg('|'.join, axis=1)
.to_frame('ConcatValues')
)
output:
ConcatValues
0 col 1 = val1|col 2 = val2|col 3 = val3
1 col 1 = val4|col 2 = val5|col 3 = val6
CodePudding user response:
IIUC this should get you what you are expecting
data = {
'Column1' : ['1', '2', '3', '4'],
'Column2' : ['1', '2', '3', '4'],
'Column3' : ['1', '2', '3', '4']
}
df = pd.DataFrame(data)
for x in df.columns:
df[x] = df[x].apply(lambda row : f"{x}={str(y)}")
df['merge_column'] = df.apply(lambda x : '|'.join(x), axis = 1)
df = df[['merge_column']]
df.to_csv('WheverEver you want it to go', index = False, header=False)
df