pandas concat dataframe by converting all columns to json

Time:06-28

I have a pandas DataFrame like this:

import pandas as pd

technologies = [
    ("Spark", 22000, '30days', 1000.0),
    ("PySpark", 25000, '50days', 2300.0),
    ("Hadoop", 23000, '55days', 1500.0),
]
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration', 'Discount'])
print(df)

   Courses    Fee Duration  Discount
0    Spark  22000   30days    1000.0
1  PySpark  25000   50days    2300.0
2   Hadoop  23000   55days    1500.0

I want to add a column that holds each row serialized as JSON, so the output looks like this:

Courses    Fee Duration  Discount       json
  Spark  22000   30days    1000.0       {"Courses":"Spark","Fee":22000,"Duration":"30days","Discount":1000.0}
PySpark  25000   50days    2300.0       {"Courses":"PySpark","Fee":25000,"Duration":"50days","Discount":2300.0}
 Hadoop  23000   55days    1500.0       {"Courses":"Hadoop","Fee":23000,"Duration":"55days","Discount":1500.0}
                                                                              

CodePudding user response:

You can apply a lambda function row by row (axis=1) to create the new json column:

import json
import pandas as pd

df['json'] = df.apply(lambda row: json.dumps(dict(row)), axis=1)

Alternatively, if you want the entire DataFrame as a list of dicts (one dict per row), you can use:

df.to_dict(orient='records')

PS: whether you wrap each row in json.dumps() depends on what you need: with json.dumps() the column holds JSON strings, and without it the column holds Python dicts.
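By default json.dumps() puts a space after each comma and colon, so its output differs slightly from the compact strings shown in the question. A minimal sketch of one way to match that formatting, assuming the same sample data as in the question, is to pass the standard separators argument:

```python
import json
import pandas as pd

df = pd.DataFrame(
    [("Spark", 22000, "30days", 1000.0),
     ("PySpark", 25000, "50days", 2300.0),
     ("Hadoop", 23000, "55days", 1500.0)],
    columns=["Courses", "Fee", "Duration", "Discount"],
)

# separators=(",", ":") drops the default spaces after commas and colons.
df["json"] = df.apply(
    lambda row: json.dumps(row.to_dict(), separators=(",", ":")),
    axis=1,
)

print(df.loc[0, "json"])
# {"Courses":"Spark","Fee":22000,"Duration":"30days","Discount":1000.0}
```

The apply call runs before the json column exists, so each row contains only the four original columns.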

CodePudding user response:

Because you need JSON, convert the dictionaries returned by DataFrame.to_dict to JSON strings with json.dumps:

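import json

# Build the row dicts once, before adding any new column; otherwise the
# 'dict' column itself would be included in the JSON strings.
records = df.to_dict(orient='records')

df['dict'] = records
df['json'] = [json.dumps(x) for x in records]
print(df)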
   Courses    Fee Duration  Discount  \
0    Spark  22000   30days    1000.0   
1  PySpark  25000   50days    2300.0   
2   Hadoop  23000   55days    1500.0   

                                                dict  \
0  {'Courses': 'Spark', 'Fee': 22000, 'Duration':...   
1  {'Courses': 'PySpark', 'Fee': 25000, 'Duration...   
2  {'Courses': 'Hadoop', 'Fee': 23000, 'Duration'...   

                                                json  
0  {"Courses": "Spark", "Fee": 22000, "Duration":...  
1  {"Courses": "PySpark", "Fee": 25000, "Duration...  
2  {"Courses": "Hadoop", "Fee": 23000, "Duration"...  

print(type(df.loc[0, 'dict']))
<class 'dict'>

print(type(df.loc[0, 'json']))
<class 'str'>
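
As a possible alternative that skips the intermediate dicts, pandas can serialize the rows itself. The sketch below is not from the answers above: it assumes the same sample data and uses DataFrame.to_json with orient='records' and lines=True, which writes one JSON object per line, then splits the result into one string per row.

```python
import json
import pandas as pd

df = pd.DataFrame(
    [("Spark", 22000, "30days", 1000.0),
     ("PySpark", 25000, "50days", 2300.0),
     ("Hadoop", 23000, "55days", 1500.0)],
    columns=["Courses", "Fee", "Duration", "Discount"],
)

# lines=True emits one JSON object per row, separated by newlines.
# splitlines() also discards a trailing newline if one is present.
json_rows = df.to_json(orient="records", lines=True).splitlines()

# Assign only after serializing, so the new column is not part of the JSON.
df["json"] = json_rows

# Each entry parses back to the original row values.
print(json.loads(df.loc[0, "json"]))
```

Note that pandas uses its own JSON encoder here, so details such as escaping can differ from what json.dumps produces.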