Home > Software engineering >  How to convert Spark map keys to individual columns
How to convert Spark map keys to individual columns

Time:12-06

I'm using spark 2.3 and scala 2.11.8.

I have a Dataframe like below,

----------------------------------------------------
| ID  | Name | Desc_map                            |
----------------------------------------------------
|  1  | abcd | "Company" -> "aa" , "Salary" -> "1" |
|  2  | efgh | "Company" -> "bb" , "Salary" -> "2" |
|  3  | ijkl | "Company" -> "cc" , "Salary" -> "3" |
|  4  | mnop | "Company" -> "dd" , "Salary" -> "4" |
----------------------------------------------------

Expected Dataframe,

---------------------------------
| ID  | Name | Company | Salary |                            |
---------------------------------
|  1  | abcd |   aa    |   1    |
|  2  | efgh |   bb    |   2    |
|  3  | ijkl |   cc    |   3    |
|  4  | mnop |   dd    |   4    |
---------------------------------

Any help is appreciated.

CodePudding user response:

To convert the keys of a Spark map to individual columns, you can use the pyspark.sql.functions.explode() function. This function takes a column that contains a map as an input, and returns a new row for each key-value pair in the map, with the key and value as separate columns.

Here is an example of how to use this function to convert the keys of a Spark map to individual columns:

from pyspark.sql import functions as F

# Define a DataFrame with a column that contains a map
df = spark.createDataFrame([
    (1, {"a": 1, "b": 2, "c": 3}),
    (2, {"d": 4, "e": 5, "f": 6}),
    (3, {"g": 7, "h": 8, "i": 9})
])

# Explode the keys of the map as separate columns
df = df.withColumn("key", F.explode(F.keys("map_col")))
df = df.withColumn("value", F.explode(F.values("map_col")))

# Print the resulting DataFrame
df.show()

CodePudding user response:

If data is your dataset that contains:

 --- ---- ---------------------------- 
|ID |Name|Map                         |
 --- ---- ---------------------------- 
|1  |abcd|{Company -> aa, Salary -> 1}|
|2  |efgh|{Company -> bb, Salary -> 2}|
|3  |ijkl|{Company -> cc, Salary -> 3}|
|4  |mnop|{Company -> aa, Salary -> 4}|
 --- ---- ---------------------------- 

You can get your desired output through:

data = data.selectExpr(
  "ID",
  "Name",
  "Map.Company",
  "Map.Salary"
)

Final output:

 --- ---- ------- ------ 
|ID |Name|Company|Salary|
 --- ---- ------- ------ 
|1  |abcd|aa     |1     |
|2  |efgh|bb     |2     |
|3  |ijkl|cc     |3     |
|4  |mnop|aa     |4     |
 --- ---- ------- ------ 

Good luck!

  • Related