Transform spark data frame

Time:12-16

I have a DataFrame in Spark with the following format:

+----------+---------+
|Column 1  |  Values |
+----------+---------+
|    A     | value1  |
|    B     | value2  |
|    C     | value2  |
|    A     | value1  |
|    B     | value3  |
|    C     | value1  |
|    A     | value1  |
|    B     | value1  |
|    C     | value2  |
+----------+---------+

I would like to transform it into the following, counting the number of occurrences of each value per key:

+----------+---------+----------+---------+
|Column 1  |  value1 | value2   |  value3 |
+----------+---------+----------+---------+
|    A     |      3  |    0     |   0     |
|    B     |      1  |    1     |   1     |
|    C     |      1  |    2     |   0     |
+----------+---------+----------+---------+

CodePudding user response:

You can group by the first column and use the pivot method on the second, as follows:

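from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # reuses the active session if one exists

# Sample data matching the question: (col1, col2) pairs
df = spark.createDataFrame([("a", "value1"), ("b", "value2"), ("c", "value2"), ("a", "value1"), ("b", "value3"), ("c", "value1"), ("a", "value1"), ("b", "value1"), ("c", "value2")], ["col1", "col2"])
df.show()

# One row per col1, one count column per distinct col2 value; missing combinations become 0
pivotDF = df.groupBy("col1").pivot("col2").count().na.fill(0)
pivotDF.show()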

Here is the output I get from pivotDF.show() with Spark 2.3 (row order may vary):

+----+------+------+------+
|col1|value1|value2|value3|
+----+------+------+------+
|   c|     1|     2|     0|
|   b|     1|     1|     1|
|   a|     3|     0|     0|
+----+------+------+------+
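
To make it clearer what the groupBy / pivot / count / na.fill(0) chain computes, here is a minimal plain-Python sketch of the same counting logic on the sample data. It does not use Spark at all, and the helper name pivot_count is hypothetical, introduced only for illustration; it is not how Spark implements pivot internally.

```python
from collections import Counter

# Same (col1, col2) pairs as in the Spark example
rows = [("a", "value1"), ("b", "value2"), ("c", "value2"),
        ("a", "value1"), ("b", "value3"), ("c", "value1"),
        ("a", "value1"), ("b", "value1"), ("c", "value2")]


def pivot_count(pairs):
    """Hypothetical helper: count (key, value) pairs, one row per key."""
    counts = Counter(pairs)                    # plays the role of groupBy + count
    keys = sorted({k for k, _ in pairs})       # one output row per distinct key
    values = sorted({v for _, v in pairs})     # one output column per distinct value
    # Counter returns 0 for unseen combinations, which mirrors na.fill(0)
    table = {k: [counts[(k, v)] for v in values] for k in keys}
    return values, table


columns, table = pivot_count(rows)
print(["col1"] + columns)
for key, row in table.items():
    print([key] + row)
```

Each printed row lists the key followed by its counts for value1, value2 and value3, which should match the Spark output above up to row order.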