Add a key to a JSON value stored in a pandas DataFrame column

Time:07-18

I have a pandas DataFrame like this:

import json  # needed later for json.dumps / json.loads
import pandas as pd

technologies = [
    ("Spark", 22000, '30days', 1000.0),
    ("PySpark", 25000, '50days', 2300.0),
    ("Hadoop", 23000, '55days', 1500.0),
]
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration', 'Discount'])
print(df)



   Courses    Fee Duration  Discount
0    Spark  22000   30days    1000.0
1  PySpark  25000   50days    2300.0
2   Hadoop  23000   55days    1500.0

I also have a column, json, that holds each row serialized as a JSON string:

df['json'] = [json.dumps(x) for x in df.to_dict(orient='records')]

print(df)

   Courses    Fee Duration  Discount  json
0    Spark  22000   30days    1000.0  {"Courses": "Spark", "Fee": 22000, "Duration":...
1  PySpark  25000   50days    2300.0  {"Courses": "PySpark", "Fee": 25000, "Duration...
2   Hadoop  23000   55days    1500.0  {"Courses": "Hadoop", "Fee": 23000, "Duration"...
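Each cell in the json column is a plain str, not a dict. The sketch below (rebuilding the same df so it runs on its own) shows that json.loads returns a new, independent dict, so adding a key to that dict leaves the stored string untouched. Whatever approach you use, the edited dict has to be assigned back to the column.

```python
import json
import pandas as pd

technologies = [
    ("Spark", 22000, "30days", 1000.0),
    ("PySpark", 25000, "50days", 2300.0),
    ("Hadoop", 23000, "55days", 1500.0),
]
df = pd.DataFrame(technologies, columns=["Courses", "Fee", "Duration", "Discount"])
df["json"] = [json.dumps(x) for x in df.to_dict(orient="records")]

cell = df.loc[0, "json"]
print(type(cell).__name__)  # str: the column stores serialized text

parsed = json.loads(cell)   # a brand-new dict, not a view into df
parsed["madeby"] = "Bae Systems"

# The stored string is unchanged because only the temporary dict was edited
print("madeby" in json.loads(df.loc[0, "json"]))  # False
```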

I want to add a new key to the JSON in the last column, json. I tried something like this:

   df.apply(lambda row: json.loads(row['json'])['madeby'] = 'Bae Systems',axis=1)
             ^
SyntaxError: expression cannot contain assignment, perhaps you meant "=="?

But I seem to have run out of luck. Any ideas?

Answer:

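A lambda body must be a single expression, and item assignment is a statement, which is why you get the SyntaxError. Move the logic into a regular function instead. Note that this function returns a dict, so afterwards the json column holds Python dicts rather than JSON strings: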

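import json

def add_key(data: str) -> dict:
    # Parse the JSON string into a dict, then add the new key
    record = json.loads(data)
    record["madeby"] = "Bae systems"
    return record

# Series.apply calls add_key once per cell of the json column
df["json"] = df["json"].apply(add_key)
print(df)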
   Courses    Fee Duration  Discount  \
0    Spark  22000   30days    1000.0   
1  PySpark  25000   50days    2300.0   
2   Hadoop  23000   55days    1500.0   

                                                                                                      json  
0    {'Courses': 'Spark', 'Fee': 22000, 'Duration': '30days', 'Discount': 1000.0, 'madeby': 'Bae systems'}  
1  {'Courses': 'PySpark', 'Fee': 25000, 'Duration': '50days', 'Discount': 2300.0, 'madeby': 'Bae systems'}  
2   {'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days', 'Discount': 1500.0, 'madeby': 'Bae systems'}  
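If you would rather keep the column as JSON strings, one option is to build the updated dict in a single expression with {**old, "madeby": ...}, which is allowed inside a lambda, and re-serialize it with json.dumps. A minimal sketch, rebuilding the question's df so it runs standalone:

```python
import json
import pandas as pd

technologies = [
    ("Spark", 22000, "30days", 1000.0),
    ("PySpark", 25000, "50days", 2300.0),
    ("Hadoop", 23000, "55days", 1500.0),
]
df = pd.DataFrame(technologies, columns=["Courses", "Fee", "Duration", "Discount"])
df["json"] = [json.dumps(x) for x in df.to_dict(orient="records")]

# {**old, "key": value} creates a new dict in one expression, so no
# assignment statement is needed inside the lambda; json.dumps turns
# the result back into a string before it is stored.
df["json"] = df["json"].map(
    lambda s: json.dumps({**json.loads(s), "madeby": "Bae Systems"})
)

print(df.loc[0, "json"])
# {"Courses": "Spark", "Fee": 22000, "Duration": "30days", "Discount": 1000.0, "madeby": "Bae Systems"}
```

Series.map is enough here because each cell is transformed independently, so a row-wise df.apply(..., axis=1) is not needed. On Python 3.9+, json.loads(s) | {"madeby": "Bae Systems"} is an equivalent way to write the merge.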