Invalid Syntax for Nested DataFrame

Time:03-30

I have a dataframe parents_df with the following schema:

root
 |-- parent: string (nullable = true)
 |-- state: string (nullable = true)
 |-- children: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- child: string (nullable = true)
 |    |    |-- dob: string (nullable = true)
 |    |    |-- pet: string (nullable = true)
 |    |    |-- pet_demo: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- pet_name: string (nullable = true)
 |    |    |    |    |-- pet_age: string (nullable = true)

created by:

from pyspark.sql.types import ArrayType, StringType, StructField, StructType

# Each parent has an array of children; each child may have an array of pets.
schema = StructType([
    StructField("parent", StringType()),
    StructField("state", StringType()),
    StructField("children", ArrayType(
        StructType([
            StructField("child", StringType()),
            StructField("dob", StringType()),
            StructField("pet", StringType()),
            StructField("pet_demo", ArrayType(
                StructType([
                    StructField("pet_name", StringType()),
                    StructField("pet_age", StringType()),
                ])
            )),
        ])
    )),
])

parents_df = spark.createDataFrame(data=parents, schema=schema)

When I try to enter data, I get a syntax error:

parents = [
    (
        "John",
        "NE",
        [
            {"child": "Jimmy", "dob": "2010-10-12", "pet": "dog", [{"pet_name": "Lucky", "pet_age": "10"}]},
            {"child": "Billy", "dob": "2012-09-07"}
        ]
    ),
    (
        "Jane",
        "IA",
        [
            {"child": "Sally", "dob": "2008-08-19"},
            {"child": "Tim", "dob": "2013-09-15"}
        ]
    ),
    (
        "Sue",
        "IA",
        [
            {"child": "Cameron", "dob": "2009-11-21", "pet": "cat", [{"pet_name": "Lori", "pet_age": "5"}]}
        ]
    ),
]

What is the problem?

The error message says:

An error was encountered:
invalid syntax (<stdin>, line 6)
  File "<stdin>", line 6
    {"child": "Jimmy", "dob": "2010-10-12", "pet": "dog", [{"pet_name": "Lucky", "pet_age": "10"}]},
                                                                                                  ^
SyntaxError: invalid syntax

CodePudding user response:

parents is defined incorrectly: the nested list of pets has no key in front of it. Add the missing pet_demo key:

parents = [
    (
        "John",
        "NE",
        [
            {"child": "Jimmy", "dob": "2010-10-12", "pet": "dog", "pet_demo": [{"pet_name": "Lucky", "pet_age": "10"}]},
            {"child": "Billy", "dob": "2012-09-07"}
        ]
    ),
    (
        "Jane",
        "IA",
        [
            {"child": "Sally", "dob": "2008-08-19"},
            {"child": "Tim", "dob": "2013-09-15"}
        ]
    ),
    (
        "Sue",
        "IA",
        [
            {"child": "Cameron", "dob": "2009-11-21", "pet": "cat", "pet_demo": [{"pet_name": "Lori", "pet_age": "5"}]}
        ]
    ),
]
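
Some of the child records above leave out pet and pet_demo. Both fields are nullable in the schema, so that may well be fine as it is. If you would rather make the missing values explicit before building the DataFrame, something like the sketch below could help. CHILD_FIELDS and normalize_child are hypothetical names made up for this illustration; they are not part of PySpark or of the original question.

```python
# Field names of the child struct, copied from the schema in the question.
CHILD_FIELDS = ("child", "dob", "pet", "pet_demo")


def normalize_child(record: dict) -> dict:
    """Return a copy of record with every schema field present.

    Fields absent from the input are set to None. Keys that are not in
    the schema raise a KeyError, so typos are caught early.
    """
    unknown = set(record) - set(CHILD_FIELDS)
    if unknown:
        raise KeyError(f"unexpected fields: {sorted(unknown)}")
    return {field: record.get(field) for field in CHILD_FIELDS}


print(normalize_child({"child": "Billy", "dob": "2012-09-07"}))
# {'child': 'Billy', 'dob': '2012-09-07', 'pet': None, 'pet_demo': None}
```

This runs in plain Python, so you can check the records before handing them to spark.createDataFrame.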

CodePudding user response:

{
    "child": "Jimmy",
    "dob": "2010-10-12",
    "pet": "dog",
    [
        {
            "pet_name": "Lucky",
            "pet_age": "10"
        }
    ]
},

This looks like a dictionary. Items in a dictionary must be key/value pairs.

But after "pet": "dog" you have a plain list, which is not a key/value pair.
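
You can confirm this without Spark at all. The sketch below uses the standard library's ast module to check whether each form parses as a Python expression. The helper name parses and the shortened sample strings are only for illustration.

```python
import ast

# The question's form: a bare list where a key/value pair is expected.
broken = '{"child": "Jimmy", "pet": "dog", [{"pet_name": "Lucky", "pet_age": "10"}]}'

# The same data with the list attached to a "pet_demo" key.
fixed = '{"child": "Jimmy", "pet": "dog", "pet_demo": [{"pet_name": "Lucky", "pet_age": "10"}]}'


def parses(source: str) -> bool:
    """Return True if source is a syntactically valid Python expression."""
    try:
        ast.parse(source, mode="eval")
        return True
    except SyntaxError:
        return False


print(parses(broken))  # False
print(parses(fixed))   # True
```

The first string fails at parse time, before any of the data is evaluated, which is why the error shows up as a SyntaxError rather than as a Spark schema error.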
