I have two DataFrames, DF1 and DF2, with the same column names.
Let's say DF1 has the following format:
Item Id | item | model | price |
---|---|---|---|
1 | item 1 | 22 | 100 |
2 | item 2 | 33 | 300 |
3 | item 3 | 44 | 400 |
4 | item 4 | 55 | 500 |
DF2 has the following format:
Item Id | item | model | price |
---|---|---|---|
1 | item 1 | 222 | 1000 |
1 | item 1 | 2222 | 10000 |
2 | item 2 | 333 | 3000 |
3 | item 3 | 444 | 4000 |
4 | item 4 | 555 | 5000 |
I need to combine the two DataFrames so that the result looks like this:
Item Id | item | model | price |
---|---|---|---|
1 | item 1 | 22 | 100 |
1 | item 1 | 222 | 1000 |
1 | item 1 | 2222 | 10000 |
2 | item 2 | 33 | 300 |
2 | item 2 | 333 | 3000 |
3 | item 3 | 44 | 400 |
3 | item 3 | 444 | 4000 |
4 | item 4 | 55 | 500 |
4 | item 4 | 555 | 5000 |
I need to use only PySpark, not pandas. Thanks for the help.
CodePudding user response:
You may use a union here:
df1.union(df2)
or, to be explicit about the column order (union matches columns by position, not by name):
df1.select("Item Id","item","model","price").union(df2.select("Item Id","item","model","price"))
Optionally, you can order the result:
df1.union(df2).orderBy("Item Id","item","model","price")
Let me know if this works for you.