I am attempting to join two json columns using Pandas.
import pandas as pd
students_df = pd.DataFrame([['Jay',16,'Soccer'],['Jay',26,'Soccer'],
['Jack',19, 'FootBall'],
['Dorsey',19, 'Dining'],
['Mark',18, 'Swimming']],
columns = ['Name','Age', 'Sport'])
courses_df = pd.DataFrame([['Jay','MS'],
['Jay','Music'],
['Dorsey','Music'],
['Dorsey', 'Piano'],
['Mark','MS']],
columns = ['Name','Course'])
students_df_json = students_df.groupby(['Name']).apply(lambda x: x.drop(columns='Name').to_json(orient='records')).reset_index(name="students_json")
courses_df_json = courses_df.groupby(['Name']).apply(lambda x: x.to_json(orient='records')).reset_index(name="courses_json")
I want to join the two dataframes and then concatenate the two json columns students_json and courses_json
pd.merge(students_df_json , courses_df_json, on="Name")
My try
pd.merge(students_df_json, courses_df_json, on="Name").apply(
lambda x: x["students_json"] x["courses_json"]
)
But this results in a key error
KeyError: 'students_json'
Also I have another question what is the variable x referring to in the lambda ?
CodePudding user response:
For vectorized join dont use apply
, only use
with Series
:
df = pd.merge(students_df_json, courses_df_json, on="Name", how='left')
s = df["students_json"] df["courses_json"]
CodePudding user response:
Usually the lambda x
refers to the function input variable x
. In your case when you apply the function on the dataframe it is by default on the axis=0
which is row wise values.
For your requirement I believe you need to apply through axis=1
for column wise values as mentioned below.
pd.merge(students_df_json, courses_df_json, on="Name").apply(
lambda x: x["students_json"] x["courses_json"], axis=1
)