{
"schema": {
"type": "struct",
"fields": [
{
"type": "int32",
"optional": true,
"field": "c1"
},
{
"type": "string",
"optional": true,
"field": "c2"
},
{
"type": "int64",
"optional": false,
"name": "org.apache.kafka.connect.data.Timestamp",
"version": 1,
"field": "create_ts"
},
{
"type": "int64",
"optional": false,
"name": "org.apache.kafka.connect.data.Timestamp",
"version": 1,
"field": "update_ts"
}
],
"optional": false,
"name": "foobar"
},
"payload": {
"c1": 67,
"c2": "foo",
"create_ts": 1663920002000,
"update_ts": 1663920002000
}
}
My JSON string is in the format above. I don't want to load the whole document into the table, only the fields under payload, so the table should look like this:
| c1 | c2  | create_ts           | update_ts           |
|----|-----|---------------------|---------------------|
| 1  | foo | 2022-09-21 10:47:54 | 2022-09-21 10:47:54 |
| 28 | foo | 2022-09-21 13:16:45 | 2022-09-21 13:16:45 |
| 29 | foo | 2022-09-21 14:19:10 | 2022-09-21 14:19:10 |
| 30 | foo | 2022-09-21 14:19:20 | 2022-09-21 14:19:20 |
| 31 | foo | 2022-09-21 14:29:19 | 2022-09-21 14:29:19 |
CodePudding user response:
Select only the nested attribute you want in the output (here payload); the other top-level attributes, such as schema, are then skipped:
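# `spark` is assumed to be an existing SparkSession (e.g. the one the pyspark shell provides)
(
    spark
    .read
    .option("multiline", "true")  # each JSON document spans several lines
    .json("/path/json-path")
    .select("payload.*")          # expand only the fields nested under "payload"
    .show()
)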
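Note that payload.* returns create_ts and update_ts as raw epoch milliseconds, while the table in the question shows formatted timestamps. The following is a minimal plain-Python sketch of the same flattening plus that conversion. It can be useful for checking a single document without Spark. The function name flatten_envelope is hypothetical, and formatting in UTC is an assumption, since the question does not state a time zone. In Spark itself, the usual equivalent is to divide the column by 1000 and cast it to timestamp.

```python
import json
from datetime import datetime, timezone

# Logical type name that the schema in the question attaches to timestamp fields.
CONNECT_TIMESTAMP = "org.apache.kafka.connect.data.Timestamp"


def flatten_envelope(raw: str) -> dict:
    """Return the payload of one schema/payload document as a flat row.

    Fields whose schema entry carries the Connect Timestamp logical name are
    converted from epoch milliseconds to 'YYYY-MM-DD HH:MM:SS' strings.
    UTC is an assumption; the question does not say which zone to use.
    """
    doc = json.loads(raw)

    # Collect the names of the fields the schema marks as timestamps.
    ts_fields = {
        f["field"]
        for f in doc["schema"]["fields"]
        if f.get("name") == CONNECT_TIMESTAMP
    }

    row = {}
    for key, value in doc["payload"].items():
        if key in ts_fields and value is not None:
            # Epoch milliseconds -> seconds -> timezone-aware datetime.
            dt = datetime.fromtimestamp(value / 1000, tz=timezone.utc)
            value = dt.strftime("%Y-%m-%d %H:%M:%S")
        row[key] = value
    return row
```

Calling flatten_envelope on the JSON string from the question yields a dict with the keys c1, c2, create_ts and update_ts, which maps directly onto one row of the desired table.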