I have a DataFrame like this:
case class Person(id: Long, name: String, age: Integer, job: String, rn: Long)
val df = spark.createDataFrame(List(Person(1, "Jason", 34, null, 1),
  Person(1, "Jason1", null, "Dev", 2),
  Person(1, null, 28, "DBA", 3),
  Person(2, "Tom", 20, null, 1),
  Person(2, "Tom1", null, "Cooker", 2)))
Its current contents are:
+---+------+----+------+---+
|id |name  |age |job   |rn |
+---+------+----+------+---+
|1  |Jason |34  |null  |1  |
|1  |Jason1|null|Dev   |2  |
|1  |null  |28  |DBA   |3  |
|2  |Tom   |20  |null  |1  |
|2  |Tom1  |null|Cooker|2  |
+---+------+----+------+---+
The requirement is to deduplicate by id: group the rows on id, order them by rn, and merge them into one row. If a field is null in the later row, keep the value from the earlier row.
The expected output is:
+---+------+---+------+---+
|id |name  |age|job   |rn |
+---+------+---+------+---+
|1  |Jason1|28 |DBA   |3  |
|2  |Tom1  |20 |Cooker|2  |
+---+------+---+------+---+
How can I transform the DataFrame to get this? Thank you very much!
CodePudding user response:
Use a Dataset. You can also use window functions. The following uses Spark Dataset groupByKey + reduceGroups:
scala> spark.version
res18: String = 2.4.3
scala> val ds = Seq(Person(1, "Jason", 34, null, 1), Person(1, "Jason1", null, "Dev", 2), Person(1, null, 28, "DBA", 3), Person(2, "Tom", 20, null, 1), Person(2, "Tom1", null, "Cooker", 2)).toDS
ds: org.apache.spark.sql.Dataset[Person] = [id: bigint, name: string ... 3 more fields]
scala> ds.show(false)
+---+------+----+------+---+
|id |name  |age |job   |rn |
+---+------+----+------+---+
|1  |Jason |34  |null  |1  |
|1  |Jason1|null|Dev   |2  |
|1  |null  |28  |DBA   |3  |
|2  |Tom   |20  |null  |1  |
|2  |Tom1  |null|Cooker|2  |
+---+------+----+------+---+
scala> ds.groupByKey(p => p.id).reduceGroups((p1, p2) => if (p1.rn <= p2.rn) Person(id = p2.id, name = if (p2.name == null) p1.name else p2.name, age = if (p2.age == null) p1.age else p2.age, job = if (p2.job == null) p1.job else p2.job, p2.rn) else Person(id = p1.id, name = if (p1.name == null) p2.name else p1.name, age = if (p1.age == null) p2.age else p1.age, job = if (p1.job == null) p2.job else p1.job, p1.rn)).map(_._2).show(false)
+---+------+---+------+---+
|id |name  |age|job   |rn |
+---+------+---+------+---+
|1  |Jason1|28 |DBA   |3  |
|2  |Tom1  |20 |Cooker|2  |
+---+------+---+------+---+
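For the window-function route mentioned above, here is a minimal sketch. It was not part of the session above, so treat it as one possible approach. It assumes it is pasted into spark-shell, where `spark` already exists. The names `w` and `merged` are only illustrative. The idea is to take, for each column, the last non-null value across the whole id partition ordered by rn, then collapse the now-identical rows with distinct.

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, last, max}
import spark.implicits._ // `spark` is the session that spark-shell provides

case class Person(id: Long, name: String, age: Integer, job: String, rn: Long)

val ds = Seq(
  Person(1, "Jason", 34, null, 1),
  Person(1, "Jason1", null, "Dev", 2),
  Person(1, null, 28, "DBA", 3),
  Person(2, "Tom", 20, null, 1),
  Person(2, "Tom1", null, "Cooker", 2)
).toDS

// Frame covers the entire id partition, ordered by rn, so every row of a
// group sees the same "last non-null" value for each column.
val w = Window
  .partitionBy("id")
  .orderBy("rn")
  .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)

val merged = ds
  .select(
    col("id"),
    last("name", ignoreNulls = true).over(w).as("name"),
    last("age", ignoreNulls = true).over(w).as("age"),
    last("job", ignoreNulls = true).over(w).as("job"),
    max("rn").over(w).as("rn"))
  .distinct() // all rows within one id are identical at this point

merged.orderBy("id").show(false)
```

On the sample data this should give the same two rows as the reduceGroups version. One trade-off: the window version computes the merged values on every input row before distinct removes the duplicates, so for very large groups the reduceGroups version may be cheaper.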
CodePudding user response:
Thanks a lot!!!! You're a master!