Evaluate each field in Python function

Time:04-09

I am trying to evaluate each field in the if statement below.

However, I am running into the following error: Method col([class java.util.ArrayList]) does not exist.

What I am trying to achieve: I am trying to evaluate two fields in my dataframe - Name and Surname, in a Python function. In these fields, I have NULL values. For each field, I would like to identify if NULL values exist.

I am loading various datasets with fields that should be evaluated from each set. I would like to pass these fields into the function to check if NULL values exist.

def identifyNull(Field):

Field = ['Name', 'Surname'] - this is an example of what I would like to pass to my function. 

for x in Field:
  if df.select().filter(col(Field).isNull()).count() > 0:
    print(Field)
  else:
    print('False')

df = the dataframe name for the data I am reading.

df structure:

Name Surname
John Doe
NULL James
Lisa NULL

Please note: I am completely new to Python and Spark.

CodePudding user response:

You're calling col(Field), but Field is a list. Since you're looping through the fields, use col(x) instead.

So it'd be something like this:

from pyspark.sql import functions as F

for x in Field:
    # Use the loop variable x so each field is checked in turn
    if df.where(F.col(x).isNull()).count() > 0:
        print(x)
    else:
        print('False')

CodePudding user response:

Assuming

data = [["John", "Doe"], 
        [None, "James"],
        ["Lisa", None]]
Field = ["Name", "Surname"]
df = spark.createDataFrame(data, Field)
df.show()

returns:

+----+-------+
|Name|Surname|
+----+-------+
|John|    Doe|
|null|  James|
|Lisa|   null|
+----+-------+

Then

for x in Field:
    if df.select(x).where(x + " is null").count() > 0:
        print(x)
    else:
        print(False)

returns:

Name
Surname