How to combine multiple columns of a pandas Dataframe into one column in JSON format

Time:02-02

I have a sample dataframe as follows:

Main Key  Second  Column A  Column B  Column C  Column D  Column E
First     A       Value 1   Value 2   Value 3   Value 4   Value 5
Second    B       Value 6   Value 7   Value 8   Value 9   Value 10
Third     C       Value 11  Value 12  Value 13  Value 14  Value 15
Fourth    D       Value 16  Value 17  Value 18  Value 19  Value 20

I want to create a new column called 'Aggregated Data' that holds, for each row, the values of Column A to Column E as key-value pairs (column name -> value) in JSON format.

The expected output would look like this:

Main Key  Second  Aggregated Data
First     A       {"Column A":"Value 1","Column B":"Value 2","Column C":"Value 3","Column D":"Value 4","Column E":"Value 5"}
Second    B       {"Column A":"Value 6","Column B":"Value 7","Column C":"Value 8","Column D":"Value 9","Column E":"Value 10"}
Third     C       {"Column A":"Value 11","Column B":"Value 12","Column C":"Value 13","Column D":"Value 14","Column E":"Value 15"}
Fourth    D       {"Column A":"Value 16","Column B":"Value 17","Column C":"Value 18","Column D":"Value 19","Column E":"Value 20"}

Any idea how this can be achieved? Thanks

CodePudding user response:

Use an intermediate pandas.DataFrame.to_dict call with orient='records', which returns a list of the form [{column -> value}, … , {column -> value}]. Note that each cell then holds a Python dict rather than a JSON string:

df[['Main Key', 'Second']].assign(Aggregated_Data=df.set_index(['Main Key', 'Second']).to_dict(orient='records'))

  Main Key Second                                    Aggregated_Data
0   First      A   {'Column A': 'Value 1 ', 'Column B': 'Value 2 ...
1  Second      B   {'Column A': 'Value 6 ', 'Column B': 'Value 7 ...
2   Third      C   {'Column A': 'Value 11 ', 'Column B': 'Value 1...
3  Fourth      D   {'Column A': 'Value 16 ', 'Column B': 'Value 1...
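
The to_dict approach above stores Python dicts (note the single quotes in the output). If actual JSON strings are needed, as in the expected output, one option is to pass each dict through json.dumps. This is a minimal sketch, assuming the sample data from the question with plain string values; the frame construction is a reconstruction, not the asker's code:

```python
import json
import pandas as pd

# Sample frame rebuilt from the question's table (assumed: all values are plain strings).
df = pd.DataFrame({
    "Main Key": ["First", "Second", "Third", "Fourth"],
    "Second": ["A", "B", "C", "D"],
    **{f"Column {c}": [f"Value {5 * r + i + 1}" for r in range(4)]
       for i, c in enumerate("ABCDE")},
})

# One dict per row, built from every column except the two key columns.
records = df.drop(columns=["Main Key", "Second"]).to_dict(orient="records")

# json.dumps turns each dict into a JSON string; the compact separators
# reproduce the space-free layout shown in the question.
out = df[["Main Key", "Second"]].assign(
    Aggregated_Data=[json.dumps(rec, separators=(",", ":")) for rec in records]
)
```

With default separators json.dumps would put a space after each comma and colon, which is still valid JSON but does not match the expected output character for character.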

CodePudding user response:

Just skip the first two columns and call to_json on each row:

out = (df[["Main Key", "Second"]]
       .assign(Aggregated_Data=df.iloc[:, 2:]
                                 .apply(lambda x: x.to_json(), axis=1)))

Alternatively, use a dict/list comprehension (this stores Python dicts rather than JSON strings):

df["Aggregated_Data"] = [dict(zip(df.columns[2:], row))
                         for row in df.iloc[:, 2:].to_numpy()]

Output:

print(out)

  Main Key Second                                    Aggregated_Data
0    First      A  {"Column A":"Value 1","Column B":"Value 2","Co...
1   Second      B  {"Column A":"Value 6","Column B":"Value 7","Co...
2    Third      C  {"Column A":"Value 11","Column B":"Value 12","...
3   Fourth      D  {"Column A":"Value 16","Column B":"Value 17","...
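
The question asks for the column to be named 'Aggregated Data' with a space, which cannot be written as a plain keyword argument to assign. One possible workaround is to unpack a dict into assign. This is a sketch under that assumption, with the sample frame rebuilt from the question's table (values assumed to be plain strings) and a json.loads round trip as a sanity check:

```python
import json
import pandas as pd

# Sample frame rebuilt from the question's table (assumed: all values are plain strings).
df = pd.DataFrame({
    "Main Key": ["First", "Second", "Third", "Fourth"],
    "Second": ["A", "B", "C", "D"],
    **{f"Column {c}": [f"Value {5 * r + i + 1}" for r in range(4)]
       for i, c in enumerate("ABCDE")},
})

key_cols = ["Main Key", "Second"]

# Dict unpacking lets assign() create a column whose name contains a space.
out = df[key_cols].assign(
    **{"Aggregated Data": df.drop(columns=key_cols)
                            .apply(lambda row: row.to_json(), axis=1)}
)

# Parsing a cell back into a dict confirms that it holds valid JSON.
first = json.loads(out.loc[0, "Aggregated Data"])
```

Selecting the value columns with drop(columns=key_cols) instead of iloc[:, 2:] is a design choice here: it keeps working if the key columns are not the first two.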