pandas join two json columns and combine them but results in a key error-CodePudding

I am attempting to join two json columns using Pandas.

import pandas as pd

students_df = pd.DataFrame([['Jay',16,'Soccer'],['Jay',26,'Soccer'],
                   ['Jack',19, 'FootBall'],
                   ['Dorsey',19, 'Dining'],
                   ['Mark',18, 'Swimming']],
                  columns = ['Name','Age', 'Sport'])
                  
courses_df = pd.DataFrame([['Jay','MS'],
                   ['Jay','Music'],
                   ['Dorsey','Music'],
                   ['Dorsey', 'Piano'],
                   ['Mark','MS']],
                  columns = ['Name','Course'])
                  
students_df_json = students_df.groupby(['Name']).apply(lambda x: x.drop(columns='Name').to_json(orient='records')).reset_index(name="students_json")          
courses_df_json = courses_df.groupby(['Name']).apply(lambda x: x.to_json(orient='records')).reset_index(name="courses_json")

I want to join the two dataframes and then concatenate the two json columns students_json and courses_json

pd.merge(students_df_json , courses_df_json, on="Name")

My try

pd.merge(students_df_json, courses_df_json, on="Name").apply(        
     lambda x: x["students_json"]   x["courses_json"]                 
 )

But this results in a key error

KeyError: 'students_json'

Also I have another question what is the variable x referring to in the lambda ?

CodePudding user response：

For vectorized join dont use apply, only use with Series:

df = pd.merge(students_df_json, courses_df_json, on="Name", how='left')
s = df["students_json"]   df["courses_json"]

CodePudding user response：

Usually the lambda x refers to the function input variable x. In your case when you apply the function on the dataframe it is by default on the axis=0 which is row wise values.

For your requirement I believe you need to apply through axis=1 for column wise values as mentioned below.

pd.merge(students_df_json, courses_df_json, on="Name").apply(        
     lambda x: x["students_json"]   x["courses_json"], axis=1
 )