I have two DataFrames, DF1 and DF2, with the same column names.
Let's say DF1 has the following format:
Item Id | item | model | price |
---|---|---|---|
1 | item 1 | 22 | 100 |
2 | item 2 | 33 | 300 |
3 | item 3 | 44 | 400 |
4 | item 4 | 55 | 500 |
DF2 has the following format:
Item Id | item | model | price |
---|---|---|---|
1 | item 1 | 222 | 1000 |
1 | item 1 | 2222 | 10000 |
2 | item 2 | 333 | 3000 |
3 | item 3 | 444 | 4000 |
4 | item 4 | 555 | 5000 |
I need to combine the two DataFrames so that the result looks like this:
Item Id | item | model | price |
---|---|---|---|
1 | item 1 | 22 | 100 |
1 | item 1 | 222 | 1000 |
1 | item 1 | 2222 | 10000 |
2 | item 2 | 33 | 300 |
2 | item 2 | 333 | 3000 |
3 | item 3 | 44 | 400 |
3 | item 3 | 444 | 4000 |
4 | item 4 | 55 | 500 |
4 | item 4 | 555 | 5000 |
I need to use only PySpark, not pandas. Thanks for the help.
CodePudding user response:
You may use a union here:
df1.union(df2)
or, to be explicit about the column order (union matches columns by position, not by name):
df1.select("Item Id","item","model","price").union(df2.select("Item Id","item","model","price"))
Optionally, you can order the result:
df1.union(df2).orderBy("Item Id","item","model","price")
Let me know if this works for you.