I am trying to add a new column at the end of my pandas DataFrame that contains the values of the other cells in each row as key:value pairs. I have tried the following:
import json
df["json_formatted"] = df.apply(
    lambda row: json.dumps(row.to_dict(), ensure_ascii=False), axis=1
)
It creates the column json_formatted successfully with all the required data, but the problem is that it also adds json_formatted itself as an extra key. I don't want that. I want the JSON data to contain only the information from the original df columns. How can I do that?
Note: I set ensure_ascii=False because the column names are in Japanese characters.
CodePudding user response:
Create a new variable holding the created column and add it afterwards:
json_formatted = df.apply(lambda row: json.dumps(row.to_dict(), ensure_ascii=False), axis=1)
df['json_formatted'] = json_formatted
CodePudding user response:
This behaviour shouldn't happen, but it might be caused by running this code more than once (you added the column, and then ran df.apply on the same dataframe again). You can avoid this by making your columns explicit:
df[['col1', 'col2']].apply(...)
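As a rough sketch of the explicit-column approach, the snippet below runs the same assignment twice and checks that json_formatted never ends up inside its own JSON. The sample data, the source_cols list and the add_json_column helper are made up for illustration and are not from the question.

```python
import json
import pandas as pd

# Hypothetical sample data; the real column names will differ.
df = pd.DataFrame({"名前": ["太郎", "花子"], "都市": ["東京", "大阪"]})

# Fix the list of source columns once, before the new column exists.
source_cols = ["名前", "都市"]

def add_json_column(frame):
    # Only the explicitly selected columns are passed to apply,
    # so an existing json_formatted column is never serialized.
    frame["json_formatted"] = frame[source_cols].apply(
        lambda row: json.dumps(row.to_dict(), ensure_ascii=False), axis=1
    )

add_json_column(df)
add_json_column(df)  # a second run overwrites the column without nesting it

first = json.loads(df.loc[0, "json_formatted"])
print(first)
```

Because the selection is fixed up front, re-running the cell (for example in a notebook) should be safe.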
apply is an expensive operation in pandas, and if performance matters it is better to avoid it. An alternative way to do this is:
df["json_formatted"] = [json.dumps(s, ensure_ascii=False) for s in df.T.to_dict().values()]
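A possible variation on the same idea, assuming the goal is one JSON string per row: to_dict(orient="records") returns a list with one dict per row directly, so the transpose is not needed and the result does not depend on the index labels being unique. The sample data here is hypothetical.

```python
import json
import pandas as pd

# Hypothetical sample data; the real column names will differ.
df = pd.DataFrame({"名前": ["太郎", "花子"], "都市": ["東京", "大阪"]})

# One dict per row, in row order, keyed by column name.
records = df.to_dict(orient="records")

# Serialize each row dict; ensure_ascii=False keeps the Japanese text readable.
df["json_formatted"] = [json.dumps(rec, ensure_ascii=False) for rec in records]

print(df["json_formatted"].tolist())
```

The records are built before the new column is assigned, so on a fresh dataframe only the original columns are serialized. If the column may already exist from an earlier run, select the source columns explicitly first.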