I am trying to evaluate each field in the if statement below.
However, I am running into the following error: Method col([class java.util.ArrayList]) does not exist.
What I am trying to achieve: I am trying to evaluate two fields in my dataframe - Name and Surname, in a Python function. In these fields, I have NULL values. For each field, I would like to identify if NULL values exist.
I am loading various datasets with fields that should be evaluated from each set. I would like to pass these fields into the function to check if NULL values exist.
def identifyNull(Field):
    # Field = ['Name', 'Surname'] - this is an example of what I would like to pass to my function.
    for x in Field:
        if df.select().filter(col(Field).isNull()).count() > 0:
            print(Field)
        else:
            print('False')
df = the dataframe name for the data I am reading.
df structure:
Name | Surname |
---|---|
John | Doe |
NULL | James |
Lisa | NULL |
Please note: I am completely new to Python and Spark.
CodePudding user response:
You're calling col(Field), where Field is a list. Since you're looping through the fields, try col(x) instead.
So it'd be something like this:
from pyspark.sql import functions as F

for x in Field:
    # Use the loop variable x (one column name), not the whole list
    if df.where(F.col(x).isNull()).count() > 0:
        print(x)
    else:
        print('False')
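Since the question asks for a function that takes the field list as an argument, here is a minimal sketch of how the loop could be wrapped. The name fields_with_nulls is hypothetical, and passing df as a parameter (rather than relying on a global) is my own choice. It assumes df is a PySpark DataFrame and uses a SQL string condition, which where() accepts, so the snippet needs no extra imports. The limit(1) is an optional tweak so Spark can stop after the first matching row.

```python
def fields_with_nulls(df, fields):
    """Return the names in `fields` whose column holds at least one NULL."""
    found = []
    for name in fields:
        # Backticks quote the column name inside the Spark SQL condition.
        # limit(1) keeps at most one matching row before counting.
        if df.where(f"`{name}` IS NULL").limit(1).count() > 0:
            found.append(name)
    return found
```

With the sample data from the question, fields_with_nulls(df, ['Name', 'Surname']) should give back both names, since each column has one NULL.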
CodePudding user response:
Assuming
data = [["John", "Doe"],
[None, "James"],
["Lisa", None]]
Field = ["Name", "Surname"]
df = spark.createDataFrame(data, Field)
df.show()
returns:
+----+-------+
|Name|Surname|
+----+-------+
|John|    Doe|
|null|  James|
|Lisa|   null|
+----+-------+
Then
for x in Field:
    # Build the SQL condition string, e.g. "Name is null"
    if df.select(x).where(x + " is null").count() > 0:
        print(x)
    else:
        print(False)
returns:
Name
Surname
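If there are many fields, running one count per field means one Spark job per field. A possible alternative, sketched here under the assumption that df is a PySpark DataFrame, is to count the NULLs of every field in a single aggregation with selectExpr. The function name null_counts is hypothetical and not part of the original answer.

```python
def null_counts(df, fields):
    """Count NULLs per field with one aggregation instead of one job per field."""
    exprs = [
        # count() skips NULLs, so only rows where the CASE yields 1 are counted
        f"count(CASE WHEN `{name}` IS NULL THEN 1 END) AS `{name}`"
        for name in fields
    ]
    # The aggregation produces a single row; return it as a plain dict
    return df.selectExpr(*exprs).first().asDict()
```

For the DataFrame above, null_counts(df, Field) should give {'Name': 1, 'Surname': 1}, and any field with a count above zero contains NULL values.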