I have a dataframe parents_df with the following schema:
root
 |-- parent: string (nullable = true)
 |-- state: string (nullable = true)
 |-- children: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- child: string (nullable = true)
 |    |    |-- dob: string (nullable = true)
 |    |    |-- pet: string (nullable = true)
 |    |    |-- pet_demo: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- pet_name: string (nullable = true)
 |    |    |    |    |-- pet_age: string (nullable = true)
created by:
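from pyspark.sql.types import ArrayType, StringType, StructField, StructType

schema = StructType([
    StructField("parent", StringType()),
    StructField("state", StringType()),
    # Each parent has a list of children; each child is a struct.
    StructField("children", ArrayType(
        StructType([
            StructField("child", StringType()),
            StructField("dob", StringType()),
            StructField("pet", StringType()),
            # Each child may have a list of pet details.
            StructField("pet_demo", ArrayType(
                StructType([
                    StructField("pet_name", StringType()),
                    StructField("pet_age", StringType()),
                ])
            )),
        ])
    )),
])
# Assumes an existing SparkSession named spark and the parents list shown below.
parents_df = spark.createDataFrame(data=parents, schema=schema)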
When I try to enter data, I get a syntax error:
parents = [
    (
        "John",
        "NE",
        [
            {"child": "Jimmy", "dob": "2010-10-12", "pet": "dog", [{"pet_name": "Lucky", "pet_age": "10"}]},
            {"child": "Billy", "dob": "2012-09-07"}
        ]
    ),
    (
        "Jane",
        "IA",
        [
            {"child": "Sally", "dob": "2008-08-19"},
            {"child": "Tim", "dob": "2013-09-15"}
        ]
    ),
    (
        "Sue",
        "IA",
        [
            {"child": "Cameron", "dob": "2009-11-21", "pet": "cat", [{"pet_name": "Lori", "pet_age": "5"}]}
        ]
    ),
]
What is the problem?
The error message says:
An error was encountered:
invalid syntax (<stdin>, line 6)
File "<stdin>", line 6
{"child": "Jimmy", "dob": "2010-10-12", "pet": "dog", [{"pet_name": "Lucky", "pet_age": "10"}]},
^
SyntaxError: invalid syntax
CodePudding user response:
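parents is incorrectly defined: the pet_demo key is missing in front of the nested list, so the dictionary contains a bare list where a key/value pair is expected. Adding the key fixes the syntax error: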
parents = [
    (
        "John",
        "NE",
        [
            {"child": "Jimmy", "dob": "2010-10-12", "pet": "dog", "pet_demo": [{"pet_name": "Lucky", "pet_age": "10"}]},
            {"child": "Billy", "dob": "2012-09-07"}
        ]
    ),
    (
        "Jane",
        "IA",
        [
            {"child": "Sally", "dob": "2008-08-19"},
            {"child": "Tim", "dob": "2013-09-15"}
        ]
    ),
    (
        "Sue",
        "IA",
        [
            {"child": "Cameron", "dob": "2009-11-21", "pet": "cat", "pet_demo": [{"pet_name": "Lori", "pet_age": "5"}]}
        ]
    ),
]
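Note that Billy, Sally and Tim have no pet or pet_demo keys at all. As far as I know, createDataFrame treats struct fields that are absent from a dict as null, but if you would rather make that explicit, you can pad each child dict before building the DataFrame. This is only a sketch: normalize_child and CHILD_FIELDS are hypothetical names of my own, not part of PySpark, and the field names are copied from the schema in the question.

```python
# Field names of the child struct, copied from the schema in the question.
CHILD_FIELDS = ("child", "dob", "pet", "pet_demo")

def normalize_child(child):
    """Return a copy of child with every schema field present (None if absent)."""
    unknown = set(child) - set(CHILD_FIELDS)
    if unknown:
        # Catch typos such as "pet_dem" before they reach Spark.
        raise KeyError(f"unexpected keys: {sorted(unknown)}")
    return {name: child.get(name) for name in CHILD_FIELDS}

children = [
    {"child": "Jimmy", "dob": "2010-10-12", "pet": "dog",
     "pet_demo": [{"pet_name": "Lucky", "pet_age": "10"}]},
    {"child": "Billy", "dob": "2012-09-07"},
]
normalized = [normalize_child(c) for c in children]
print(normalized[1])
# {'child': 'Billy', 'dob': '2012-09-07', 'pet': None, 'pet_demo': None}
```

Raising on unknown keys is a design choice for this sketch; silently dropping them would also work, but it would hide misspelled field names.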
CodePudding user response:
{
    "child": "Jimmy",
    "dob": "2010-10-12",
    "pet": "dog",
    [
        {
            "pet_name": "Lucky",
            "pet_age": "10"
        }
    ]
},
This looks like a dictionary. Items in a dictionary must be key/value pairs. But after "pet": "dog" you have a plain list, which is not a key/value pair.
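If you want to check this yourself without Spark, here is a minimal sketch using Python's built-in compile(): the literal with the bare list is rejected, and the same literal parses once the list is given a key. The key name pet_demo comes from the schema in the question; broken_src, fixed_src and is_valid_expression are just illustrative names.

```python
# Source text of the two dictionary literals (shortened from the question).
broken_src = '{"pet": "dog", [{"pet_name": "Lucky", "pet_age": "10"}]}'
fixed_src = '{"pet": "dog", "pet_demo": [{"pet_name": "Lucky", "pet_age": "10"}]}'

def is_valid_expression(src):
    """Return True if src parses as a Python expression, False on SyntaxError."""
    try:
        compile(src, "<example>", "eval")
        return True
    except SyntaxError:
        return False

print(is_valid_expression(broken_src))  # False
print(is_valid_expression(fixed_src))   # True
```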