I have the below dataframe which I have read from a JSON file.
1 | 2 | 3 | 4 |
---|---|---|---|
{"todo":"[wakeup, shower]"} | {"todo":"[brush, eat]"} | {"todo":"[read, write]"} | {"todo":"[sleep, snooze]"} |
I need my output to be as below Key and Value. How do I do this? Do I need to create a schema?
ID | todo |
---|---|
1 | wakeup, shower |
2 | brush, eat |
3 | read, write |
4 | sleep, snooze |
CodePudding user response:
The key-value which you refer to is a struct. "keys" are struct field names, while "values" are field values.
What you want to do is called unpivoting. One of the ways to do it in PySpark is using stack
. The following is a dynamic approach, where you don't need to provide existent column names.
Input dataframe:
df = spark.createDataFrame(
[((['wakeup', 'shower'],),(['brush', 'eat'],),(['read', 'write'],),(['sleep', 'snooze'],))],
'`1` struct<todo:array<string>>, `2` struct<todo:array<string>>, `3` struct<todo:array<string>>, `4` struct<todo:array<string>>')
Script:
to_melt = [f"\'{c}\', `{c}`.todo" for c in df.columns]
df = df.selectExpr(f"stack({len(to_melt)}, {','.join(to_melt)}) (ID, todo)")
df.show()
# --- ----------------
# | ID| todo|
# --- ----------------
# | 1|[wakeup, shower]|
# | 2| [brush, eat]|
# | 3| [read, write]|
# | 4| [sleep, snooze]|
# --- ----------------
CodePudding user response:
Use from_json to convert string to array. Explode to cascade each unique element to row.
data
df = spark.createDataFrame(
[(('{"todo":"[wakeup, shower]"}'),('{"todo":"[brush, eat]"}'),('{"todo":"[read, write]"}'),('{"todo":"[sleep, snooze]"}'))],
('value1','values2','value3','value4'))
code
new = (df.withColumn('todo', explode(flatten(array(*[map_values(from_json(x, "MAP<STRING,STRING>")) for x in df.columns])))) #From string to array to indivicual row
.withColumn('todo', translate('todo',"[]",'')#Remove corner brackets
) ).show(truncate=False)
outcome
--------------------------- ----------------------- ------------------------ -------------------------- --------------
|value1 |values2 |value3 |value4 |todo |
--------------------------- ----------------------- ------------------------ -------------------------- --------------
|{"todo":"[wakeup, shower]"}|{"todo":"[brush, eat]"}|{"todo":"[read, write]"}|{"todo":"[sleep, snooze]"}|wakeup, shower|
|{"todo":"[wakeup, shower]"}|{"todo":"[brush, eat]"}|{"todo":"[read, write]"}|{"todo":"[sleep, snooze]"}|brush, eat |
|{"todo":"[wakeup, shower]"}|{"todo":"[brush, eat]"}|{"todo":"[read, write]"}|{"todo":"[sleep, snooze]"}|read, write |
|{"todo":"[wakeup, shower]"}|{"todo":"[brush, eat]"}|{"todo":"[read, write]"}|{"todo":"[sleep, snooze]"}|sleep, snooze |
--------------------------- ----------------------- ------------------------ -------------------------- --------------