PySpark : Merge two dataframes


I have two DataFrames, DF1 and DF2, with the same column names.

Let's say DF1 has the following format:

Item Id  item    model  price
1        item 1  22     100
2        item 2  33     300
3        item 3  44     400
4        item 4  55     500

DF2 has the following format:

Item Id  item    model  price
1        item 1  222    1000
1        item 1  2222   10000
2        item 2  333    3000
3        item 3  444    4000
4        item 4  555    5000

I need to combine the two DataFrames so that the result looks like this:

Item Id  item    model  price
1        item 1  22     100
1        item 1  222    1000
1        item 1  2222   10000
2        item 2  33     300
2        item 2  333    3000
3        item 3  44     400
3        item 3  444    4000
4        item 4  55     500
4        item 4  555    5000

I need to use only PySpark, not pandas. Thanks for the help.

CodePudding user response:

You may use a union here:

df1.union(df2)
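Note that union behaves like SQL's UNION ALL: it appends the rows of df2 to df1, keeping duplicates, and matches columns by position rather than by name. A plain-Python sketch of that effect, where df1_rows and df2_rows are illustrative stand-ins for the two DataFrames' rows (not a real PySpark API):

```python
# Illustrative stand-ins for the rows of df1 and df2.
df1_rows = [(1, "item 1", 22, 100), (2, "item 2", 33, 300)]
df2_rows = [(1, "item 1", 222, 1000), (1, "item 1", 2222, 10000)]

# union is row-wise concatenation: every row from both sides is kept,
# including any duplicates, and columns are matched by position.
combined = df1_rows + df2_rows

print(len(combined))  # 4 rows: nothing is deduplicated
```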

or, more explicitly, select the columns first so that both sides have the same column order (on Spark 2.3+ you can also use df1.unionByName(df2), which matches columns by name instead of position):

df1.select("Item Id", "item", "model", "price").union(df2.select("Item Id", "item", "model", "price"))

Optionally, you can order the results:

df1.union(df2).orderBy("Item Id","item","model","price")

Let me know if this works for you.
