Student marks are stored in hdfs://Hmaster/training/dump/stdmarks1.txt
Input format: sno, name, m1, m2, m3, branch
Create an RDD and display the names of the students who belong to branch CSE. Display the names using println. Format of output: xxxx yyyy
I have a sample text file:
1,RAMESH,70,52,60,CSE
2,SOMESH,80,69,88,ECE
3,VANITA,90,73,92,CSE
4,KIRAN,74,96,68,IT
The output should be only the students' names:
RAMESH
VANITA
I have already uploaded the text file to HDFS at the path given, but I am not able to work out the further steps.
CodePudding user response:
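This is an example using the DataFrame API. The sample file has no header row, so the column names are assigned explicitly, and the filter is on the branch column:
import org.apache.spark.sql.functions.col

spark
  .read
  .csv(hdfsFilePath) // no header option: the first line is data
  .toDF("sno", "name", "m1", "m2", "m3", "branch")
  .where(col("branch") === "CSE")
  .select("name")
  .collect()
  .foreach(row => println(row.getString(0))) // print each name on its own line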
I recommend reading the Spark documentation.
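Since the question asks specifically for an RDD and println, here is a minimal sketch of that variant. It assumes a Spark application with a SparkSession available; the object name CseStudents and the helper nameIfBranch are made up for illustration, and the path is the one given in the question. Lines that do not have exactly six comma-separated fields are skipped, which is an assumption about how malformed records should be handled.

```scala
import org.apache.spark.sql.SparkSession

object CseStudents {
  // Hypothetical helper: returns the name field when the record's branch
  // matches (case-insensitively), otherwise None.
  // Expected record layout: sno,name,m1,m2,m3,branch
  def nameIfBranch(line: String, branch: String): Option[String] = {
    val fields = line.split(",").map(_.trim)
    if (fields.length == 6 && fields(5).equalsIgnoreCase(branch)) Some(fields(1))
    else None
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CseStudents").getOrCreate()

    // Read the file as an RDD of lines; path taken from the question.
    val lines = spark.sparkContext.textFile("hdfs://Hmaster/training/dump/stdmarks1.txt")

    // Keep only the names of CSE students, bring them to the driver, print them.
    lines
      .flatMap(line => nameIfBranch(line, "CSE"))
      .collect()
      .foreach(println)

    spark.stop()
  }
}
```

In spark-shell the same pipeline can be run directly on sc.textFile(...) without the surrounding object. Calling collect() before println matters on a cluster: printing inside a transformation would write to the executors' logs rather than the driver console.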