I have a sample dataframe as follows:
Main Key | Second | Column A | Column B | Column C | Column D | Column E |
---|---|---|---|---|---|---|
First | A | Value 1 | Value 2 | Value 3 | Value 4 | Value 5 |
Second | B | Value 6 | Value 7 | Value 8 | Value 9 | Value 10 |
Third | C | Value 11 | Value 12 | Value 13 | Value 14 | Value 15 |
Fourth | D | Value 16 | Value 17 | Value 18 | Value 19 | Value 20 |
I want to create a new column called 'Aggregated Data' that turns each value in Columns A to E into a key-value pair (column name as key) and combines them into a single value in JSON format.
The expected output would look like this:
Main Key | Second | Aggregated Data |
---|---|---|
First | A | {"Column A":"Value 1","Column B":"Value 2","Column C":"Value 3","Column D":"Value 4","Column E":"Value 5"} |
Second | B | {"Column A":"Value 6","Column B":"Value 7","Column C":"Value 8","Column D":"Value 9","Column E":"Value 10"} |
Third | C | {"Column A":"Value 11","Column B":"Value 12","Column C":"Value 13","Column D":"Value 14","Column E":"Value 15"} |
Fourth | D | {"Column A":"Value 16","Column B":"Value 17","Column C":"Value 18","Column D":"Value 19","Column E":"Value 20"} |
Any idea how this can be achieved? Thanks
CodePudding user response:
Via an intermediate pandas.DataFrame.to_dict call (with orient='records' to obtain a list like [{column -> value}, … , {column -> value}]):
df[['Main Key', 'Second']].assign(Aggregated_Data=df.set_index(['Main Key', 'Second']).to_dict(orient='records'))
Main Key Second Aggregated_Data
0 First A {'Column A': 'Value 1 ', 'Column B': 'Value 2 ...
1 Second B {'Column A': 'Value 6 ', 'Column B': 'Value 7 ...
2 Third C {'Column A': 'Value 11 ', 'Column B': 'Value 1...
3 Fourth D {'Column A': 'Value 16 ', 'Column B': 'Value 1...
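Note that to_dict yields Python dict objects (hence the single quotes in the output), not JSON strings. If actual JSON text is needed, one option is to pass each dict through json.dumps. A minimal sketch, assuming a cut-down version of the question's frame (two rows, two value columns) so it runs on its own:

```python
import json
import pandas as pd

# Cut-down copy of the sample frame from the question (assumption: string values).
df = pd.DataFrame({
    "Main Key": ["First", "Second"],
    "Second": ["A", "B"],
    "Column A": ["Value 1", "Value 6"],
    "Column B": ["Value 2", "Value 7"],
})

# One dict per row, built from every column except the two key columns.
records = df.drop(columns=["Main Key", "Second"]).to_dict(orient="records")

# Serialize each dict to a compact JSON string (no spaces after separators).
out = df[["Main Key", "Second"]].assign(
    Aggregated_Data=[json.dumps(r, separators=(",", ":")) for r in records]
)
```

The separators argument is only there to match the compact form shown in the question; plain json.dumps(r) would also produce valid JSON.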
CodePudding user response:
Just skip the first two columns and call to_json on each row:
out = (df[["Main Key", "Second"]]
       .assign(Aggregated_Data=df.iloc[:, 2:]
               .apply(lambda x: x.to_json(), axis=1)))
Alternatively, use a dict/list comprehension (this stores Python dicts rather than JSON strings):
df["Aggregated_Data"] = [dict(zip(df.columns[2:], row))
                         for row in df.iloc[:, 2:].to_numpy()]
Output:
print(out)
Main Key Second Aggregated_Data
0 First A {"Column A":"Value 1","Column B":"Value 2","Co...
1 Second B {"Column A":"Value 6","Column B":"Value 7","Co...
2 Third C {"Column A":"Value 11","Column B":"Value 12","...
3 Fourth D {"Column A":"Value 16","Column B":"Value 17","...
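For completeness, here is a self-contained sketch of the to_json approach that selects the value columns by name instead of by position and uses the column name with a space, 'Aggregated Data', as asked in the question (assign cannot take that name as a plain keyword). The cut-down sample frame and the names key_cols and value_cols are assumptions made for illustration:

```python
import json
import pandas as pd

# Cut-down copy of the sample frame from the question.
df = pd.DataFrame({
    "Main Key": ["First", "Second"],
    "Second": ["A", "B"],
    "Column A": ["Value 1", "Value 6"],
    "Column B": ["Value 2", "Value 7"],
    "Column C": ["Value 3", "Value 8"],
})

key_cols = ["Main Key", "Second"]                          # kept as-is
value_cols = [c for c in df.columns if c not in key_cols]  # folded into JSON

# Keep only the key columns, then add one JSON string per row.
out = df[key_cols].copy()
out["Aggregated Data"] = df[value_cols].apply(lambda row: row.to_json(), axis=1)

# Round-trip the first row to confirm the string is valid JSON.
parsed = json.loads(out.loc[0, "Aggregated Data"])
```

Selecting by name keeps the result stable if the column order of the frame changes, which positional slicing with iloc[:, 2:] does not.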