How to combine multiple columns of a pandas DataFrame into one column in JSON format

I have a sample dataframe as follows:

Main Key	Second	Column A	Column B	Column C	Column D	Column E
First	A	Value 1	Value 2	Value 3	Value 4	Value 5
Second	B	Value 6	Value 7	Value 8	Value 9	Value 10
Third	C	Value 11	Value 12	Value 13	Value 14	Value 15
Fourth	D	Value 16	Value 17	Value 18	Value 19	Value 20

I want to create a new column called 'Aggregated Data' in which, for each row, the values in Column A to Column E become key-value pairs (column name -> value) combined into a single JSON object.

The expected output would look like this:

Main Key	Second	Aggregated Data
First	A	{"Column A":"Value 1","Column B":"Value 2","Column C":"Value 3","Column D":"Value 4","Column E":"Value 5"}
Second	B	{"Column A":"Value 6","Column B":"Value 7","Column C":"Value 8","Column D":"Value 9","Column E":"Value 10"}
Third	C	{"Column A":"Value 11","Column B":"Value 12","Column C":"Value 13","Column D":"Value 14","Column E":"Value 15"}
Fourth	D	{"Column A":"Value 16","Column B":"Value 17","Column C":"Value 18","Column D":"Value 19","Column E":"Value 20"}

Any idea how this can be achieved? Thanks

CodePudding user response：

Via an intermediate pandas.DataFrame.to_dict call (with orient='records', which returns a list like [{column -> value}, … , {column -> value}]):

df[['Main Key', 'Second']].assign(Aggregated_Data=df.set_index(['Main Key', 'Second']).to_dict(orient='records'))

  Main Key Second                                    Aggregated_Data
0   First      A   {'Column A': 'Value 1 ', 'Column B': 'Value 2 ...
1  Second      B   {'Column A': 'Value 6 ', 'Column B': 'Value 7 ...
2   Third      C   {'Column A': 'Value 11 ', 'Column B': 'Value 1...
3  Fourth      D   {'Column A': 'Value 16 ', 'Column B': 'Value 1...
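
Note that to_dict yields Python dict objects (hence the single quotes in the output above), not JSON strings. If JSON text is what's needed, each dict can be passed through json.dumps. A minimal sketch, assuming a two-row sample frame built here purely for illustration (the names sample, records and result are hypothetical, not from the answer above):

```python
import json

import pandas as pd

# Hypothetical two-row sample mirroring the layout in the question.
sample = pd.DataFrame({
    "Main Key": ["First", "Second"],
    "Second": ["A", "B"],
    "Column A": ["Value 1", "Value 6"],
    "Column B": ["Value 2", "Value 7"],
})

# One dict per row, built from every column except the two key columns.
records = sample.drop(columns=["Main Key", "Second"]).to_dict(orient="records")

# Serialize each dict to a compact JSON string (no spaces after separators).
result = sample[["Main Key", "Second"]].assign(
    Aggregated_Data=[json.dumps(r, separators=(",", ":")) for r in records]
)

print(result["Aggregated_Data"].iloc[0])
# {"Column A":"Value 1","Column B":"Value 2"}
```

The separators argument is only there to match the compact formatting shown in the expected output; leaving it out gives equally valid JSON with spaces after the commas and colons.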

CodePudding user response：

Skip the first two columns and call to_json on each row:

out = (df[["Main Key", "Second"]]
       .assign(Aggregated_Data=df.iloc[:, 2:]
                                 .apply(lambda x: x.to_json(), axis=1)))

Alternatively, build one dict per row in a list comprehension (this stores Python dicts in the column rather than JSON strings):

df["Aggregated_Data"] = [dict(zip(df.columns[2:], row))
                         for row in df.iloc[:, 2:].to_numpy()]

Output of the to_json version:

print(out)

  Main Key Second                                    Aggregated_Data
0    First      A  {"Column A":"Value 1","Column B":"Value 2","Co...
1   Second      B  {"Column A":"Value 6","Column B":"Value 7","Co...
2    Third      C  {"Column A":"Value 11","Column B":"Value 12","...
3   Fourth      D  {"Column A":"Value 16","Column B":"Value 17","...
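
Because the printed column is truncated, it can help to confirm that each cell really is a JSON string that parses back to the original values. The following is a rough, self-contained sketch of the to_json approach, assuming a cut-down sample frame; key_cols, value_cols and parsed are hypothetical names, and selecting the value columns by name instead of iloc[:, 2:] is a design choice made here so the result does not depend on column order:

```python
import json

import pandas as pd

# Hypothetical sample mirroring the first two rows of the question's data.
df = pd.DataFrame({
    "Main Key": ["First", "Second"],
    "Second": ["A", "B"],
    "Column A": ["Value 1", "Value 6"],
    "Column B": ["Value 2", "Value 7"],
})

key_cols = ["Main Key", "Second"]                          # kept as-is
value_cols = [c for c in df.columns if c not in key_cols]  # aggregated

# Serialize each row of the value columns to a JSON object string.
out = df[key_cols].assign(
    Aggregated_Data=df[value_cols].apply(lambda row: row.to_json(), axis=1)
)

# Round-trip check: every cell should parse back to column -> value pairs.
parsed = out["Aggregated_Data"].map(json.loads)
print(parsed.iloc[0])
# {'Column A': 'Value 1', 'Column B': 'Value 2'}
```

If the column should be named 'Aggregated Data' with a space, as in the question, out.rename(columns={"Aggregated_Data": "Aggregated Data"}) can be applied afterwards, since assign only accepts valid Python identifiers as keyword names.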