I have a pandas DataFrame like this:
import json
import pandas as pd
technologies = [
    ("Spark", 22000, '30days', 1000.0),
    ("PySpark", 25000, '50days', 2300.0),
    ("Hadoop", 23000, '55days', 1500.0),
]
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration', 'Discount'])
print(df)
Courses Fee Duration Discount
0 Spark 22000 30days 1000.0
1 PySpark 25000 50days 2300.0
2 Hadoop 23000 55days 1500.0
One of the columns also holds a JSON string for each row, built like this:
df['json'] = [json.dumps(x) for x in df.to_dict(orient='records')]
print(df)
Courses Fee Duration Discount json
0 Spark 22000 30days 1000.0 {"Courses": "Spark", "Fee": 22000, "Duration":...
1 PySpark 25000 50days 2300.0 {"Courses": "PySpark", "Fee": 25000, "Duration...
2 Hadoop 23000 55days 1500.0 {"Courses": "Hadoop", "Fee": 23000, "Duration"...
I want to add a new key to each JSON object in the last column, json. I tried something like this:
df.apply(lambda row: json.loads(row['json'])['madeby'] = 'Bae Systems',axis=1)
^
SyntaxError: expression cannot contain assignment, perhaps you meant "=="?
But that raises the SyntaxError shown, so I'm stuck. Any ideas, please?
CodePudding user response:
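A lambda body must be a single expression, and an assignment is a statement, which is why you get the SyntaxError. Here's a solution that moves the assignment into a named function: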
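def add_key(data: str) -> dict:
    # Parse the JSON string, add the new key, and return the resulting dict.
    # Note: the column will then hold dicts rather than JSON strings.
    data = json.loads(data)
    data["madeby"] = "Bae systems"
    return data

df["json"] = df.apply(lambda row: add_key(row["json"]), axis=1)
print(df)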
Courses Fee Duration Discount \
0 Spark 22000 30days 1000.0
1 PySpark 25000 50days 2300.0
2 Hadoop 23000 55days 1500.0
json
0 {'Courses': 'Spark', 'Fee': 22000, 'Duration': '30days', 'Discount': 1000.0, 'madeby': 'Bae systems'}
1 {'Courses': 'PySpark', 'Fee': 25000, 'Duration': '50days', 'Discount': 2300.0, 'madeby': 'Bae systems'}
2 {'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days', 'Discount': 1500.0, 'madeby': 'Bae systems'}
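If you want the column to stay a JSON string, as it was in the question, rather than become a dict, one possible variant is to serialize the result again with json.dumps. This is only a sketch: the helper name add_key_as_json and its key/value parameters are made up for illustration, and it uses Series.apply on the single column instead of a row-wise DataFrame.apply.

```python
import json
import pandas as pd

# Rebuild the example frame from the question.
technologies = [
    ("Spark", 22000, "30days", 1000.0),
    ("PySpark", 25000, "50days", 2300.0),
    ("Hadoop", 23000, "55days", 1500.0),
]
df = pd.DataFrame(technologies, columns=["Courses", "Fee", "Duration", "Discount"])
df["json"] = [json.dumps(x) for x in df.to_dict(orient="records")]


def add_key_as_json(data: str, key: str = "madeby", value: str = "Bae Systems") -> str:
    """Parse a JSON string, add one key, and serialize it back to a string.

    Hypothetical helper; the name and default arguments are assumptions.
    """
    parsed = json.loads(data)
    parsed[key] = value
    return json.dumps(parsed)


# Series.apply passes each cell of the column to the function.
df["json"] = df["json"].apply(add_key_as_json)
print(df["json"].iloc[0])
```

Each cell in the json column is then still a str, so it can be parsed later with json.loads or written out as-is.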