I have dataframe with the following columns:
ID A1 B1 C1 A2 B2 C2 A3 B3 C4
AA 1 3 6 4 0 6
BB 5 5 4 6 7 9
CC 5 5 5
I want to create a new column called Z that takes each row, groups them into a JSON list of records, and renames the column as their key. After the JSON column is constructed, I want to drop all the columns and keep Z and ID only.
Here is the output desired:
ID Z
AA [{"New Column Name A":1, "New Column Name B":3,"New Column Name C":6},{"New Column Name A":4, "New Column Name B":0,"New Column Name C":6}]
BB [{"New Column Name A":5, "New Column Name B":5,"New Column Name C":4},{"New Column Name A":6, "New Column Name B":7,"New Column Name C":9}]
CC [{"New Column Name A":5, "New Column Name B":5,"New Column Name C":5}]
I've tried to group the fields together, apply a lambda function, and use to_dic('records'), but I'm not getting the output I want. I'm not even sure if my approach is correct. Any help is greatly appreciated.
CodePudding user response:
Have you tried to go line by line?? I am not very good with pandas and python. But I have me this code. Hope it works for you.
toAdd = []
for row in dataset.values:
toAddLine = {}
i = 0
for data in row:
if data is not None:
toAddLine["New Column Name " dataset.columns[i]] = data
i = i 1
toAdd.append(toAddLine)
dataset['Z'] = toAdd
dataset['Z']
CodePudding user response:
# create a columns name map for chang related column
columns = dataset.columns
columns_map = {}
for i in columns:
columns_map[i] = f"new {i}"
def change_row_to_json(row):
new_dict = {}
for index, value in enumerate(row):
new_dict[columns_map[columns[index]]] = value
return json.dumps(new_dict, indent = 4)
dataset.loc[:,'Z'] = dataset.apply(change_row_to_json, axis=1)
dataset= dataset[["ID", "Z"]]