Replace Spark array values with values from python dictionary

Time:08-05

I have a Spark dataframe column having array values:

| data | arraydata |
| ---- | --------- |
| text | [0,1,2,3] |
| page | [0,1,4,3] |

I want to replace each value in the arrays using a Python dictionary, so that 0, 1, 2, 3, 4 become negative, positive, name, sequel, odd.

CodePudding user response:

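from pyspark.sql.functions import array, expr, lit, map_from_entries, struct

mapping = {0: "negative", 1: "positive", 2: "name", 3: "sequel", 4: "odd"}
# Build a literal map column from the dictionary entries
mapping_column = map_from_entries(array(*[struct(lit(k), lit(v)) for k, v in mapping.items()]))

# Look up each array element in the map, then drop the helper column
df = df.withColumn("mapping", mapping_column) \
       .withColumn("finalArray", expr("transform(arraydata, x -> element_at(mapping, x))")) \
       .drop("mapping")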

CodePudding user response:

I like the idea of creating a map. Here is a more streamlined version (it requires Spark 3.1 or later, where F.transform was added to the Python API).

Input:

from pyspark.sql import functions as F
df = spark.createDataFrame(
    [('text', [0,1,2,3]),
     ('page', [0,1,4,3])],
    ['data', 'arraydata'])

mapping = {0: "negative", 1: "positive", 2: "name", 3: "sequel", 4: "odd"}

Script:

map_col = F.create_map([F.lit(x) for i in mapping.items() for x in i])
df = df.withColumn('arraydata', F.transform('arraydata', lambda x: map_col[x]))

df.show(truncate=0)
# +----+----------------------------------+
# |data|arraydata                         |
# +----+----------------------------------+
# |text|[negative, positive, name, sequel]|
# |page|[negative, positive, odd, sequel] |
# +----+----------------------------------+
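
In case the nested comprehension passed to F.create_map looks cryptic: create_map expects its arguments as alternating keys and values (key1, value1, key2, value2, ...), so the comprehension simply flattens the dictionary items into that order. Below is a rough plain-Python sketch of that flattening, plus an emulation of what the transform does to each row. It runs without Spark; the names `flat`, `rows` and `replaced` are only for illustration and are not part of the script above.

```python
mapping = {0: "negative", 1: "positive", 2: "name", 3: "sequel", 4: "odd"}

# Same comprehension as in the script, without wrapping each item in F.lit:
# every (key, value) pair is unpacked into two consecutive list entries.
flat = [x for pair in mapping.items() for x in pair]
print(flat)
# [0, 'negative', 1, 'positive', 2, 'name', 3, 'sequel', 4, 'odd']

# Plain-list emulation of the per-row lookup done by F.transform
# (illustration only; Spark evaluates this on the column, not in Python).
rows = [[0, 1, 2, 3], [0, 1, 4, 3]]
replaced = [[mapping[x] for x in row] for row in rows]
print(replaced)
# [['negative', 'positive', 'name', 'sequel'], ['negative', 'positive', 'odd', 'sequel']]
```

Note that the plain-Python lookup raises a KeyError for a value missing from the dictionary, so it is only meant to show the shape of the data, not how Spark treats unmapped values.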