How to merge two local datasets in Spark with Scala 2.11.11

I have two local files that I am reading into Spark with Scala 2.11.11. The first file has five columns and the second has three. One id column, pId, appears in both files. I tried using a merge function, but it did not work.

Can someone help me merge these two files and display the top 100 records?

df1: pId, routeId, from, to, date

df2: pId, firstName, lastName

Desired output:

pId, firstName, lastName

CodePudding user response:

I'm not sure which merge function you were trying to use, but you can simply join the two DataFrames on the shared pId column:

df1
  .join(df2, Seq("pId")) // inner join on the shared id column
  .select("pId", "firstName", "lastName")
  .show(100) // print the first 100 rows

See the signature of join and its overloads in the Dataset Scaladoc; other overloads accept a join type or a join expression.
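For completeness, here is a minimal end-to-end sketch of the same join, from reading the files to printing the result. The question does not say what format the files are in, so this assumes CSV files with a header row. The paths data/file1.csv and data/file2.csv and the names JoinLocalFiles and joinOnPId are hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinLocalFiles {

  // Inner join on the shared pId column, keeping only the requested columns.
  def joinOnPId(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.join(df2, Seq("pId")).select("pId", "firstName", "lastName")

  def main(args: Array[String]): Unit = {
    // Local session for running on a single machine.
    val spark = SparkSession.builder()
      .appName("JoinLocalFiles")
      .master("local[*]")
      .getOrCreate()

    // Assumption: both files are CSV with a header row; paths are hypothetical.
    val df1 = spark.read.option("header", "true").csv("data/file1.csv")
    val df2 = spark.read.option("header", "true").csv("data/file2.csv")

    // Print up to 100 rows without truncating long values.
    joinOnPId(df1, df2).show(100, truncate = false)

    spark.stop()
  }
}
```

If df1 holds several rows per pId (for example, one per route), the same person will appear once per matching row; adding .distinct() before show is one way to collapse those repeats. To keep every row of df2 even when it has no match in df1, the three-argument overload can be used instead, for example df2.join(df1, Seq("pId"), "left").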