Home > Enterprise >  Extract key value from dataframe in PySpark
Extract key value from dataframe in PySpark

Time:10-18

I have the below dataframe which I have read from a JSON file.

1 2 3 4
{"todo":"[wakeup, shower]"} {"todo":"[brush, eat]"} {"todo":"[read, write]"} {"todo":"[sleep, snooze]"}

I need my output to be as below Key and Value. How do I do this? Do I need to create a schema?

ID todo
1 wakeup, shower
2 brush, eat
3 read, write
4 sleep, snooze

CodePudding user response:

The key-value which you refer to is a struct. "keys" are struct field names, while "values" are field values.

What you want to do is called unpivoting. One of the ways to do it in PySpark is using stack. The following is a dynamic approach, where you don't need to provide existent column names.

Input dataframe:

df = spark.createDataFrame(
    [((['wakeup', 'shower'],),(['brush', 'eat'],),(['read', 'write'],),(['sleep', 'snooze'],))],
    '`1` struct<todo:array<string>>, `2` struct<todo:array<string>>, `3` struct<todo:array<string>>, `4` struct<todo:array<string>>')

Script:

to_melt = [f"\'{c}\', `{c}`.todo" for c in df.columns]
df = df.selectExpr(f"stack({len(to_melt)}, {','.join(to_melt)}) (ID, todo)")

df.show()
#  --- ---------------- 
# | ID|            todo|
#  --- ---------------- 
# |  1|[wakeup, shower]|
# |  2|    [brush, eat]|
# |  3|   [read, write]|
# |  4| [sleep, snooze]|
#  --- ---------------- 

CodePudding user response:

Use from_json to convert string to array. Explode to cascade each unique element to row.

data

df = spark.createDataFrame(
    [(('{"todo":"[wakeup, shower]"}'),('{"todo":"[brush, eat]"}'),('{"todo":"[read, write]"}'),('{"todo":"[sleep, snooze]"}'))],
    ('value1','values2','value3','value4'))

code

new = (df.withColumn('todo', explode(flatten(array(*[map_values(from_json(x, "MAP<STRING,STRING>")) for x in df.columns])))) #From string to array to indivicual row
   .withColumn('todo', translate('todo',"[]",'')#Remove corner brackets
              ) ).show(truncate=False)

outcome

 --------------------------- ----------------------- ------------------------ -------------------------- -------------- 
|value1                     |values2                |value3                  |value4                    |todo          |
 --------------------------- ----------------------- ------------------------ -------------------------- -------------- 
|{"todo":"[wakeup, shower]"}|{"todo":"[brush, eat]"}|{"todo":"[read, write]"}|{"todo":"[sleep, snooze]"}|wakeup, shower|
|{"todo":"[wakeup, shower]"}|{"todo":"[brush, eat]"}|{"todo":"[read, write]"}|{"todo":"[sleep, snooze]"}|brush, eat    |
|{"todo":"[wakeup, shower]"}|{"todo":"[brush, eat]"}|{"todo":"[read, write]"}|{"todo":"[sleep, snooze]"}|read, write   |
|{"todo":"[wakeup, shower]"}|{"todo":"[brush, eat]"}|{"todo":"[read, write]"}|{"todo":"[sleep, snooze]"}|sleep, snooze |
 --------------------------- ----------------------- ------------------------ -------------------------- -------------- 
  • Related