Spark - scan DataFrame based on value


I'm trying to find a column based on a value it contains. For example, in the dataframe below, I'd like to know which column contains yellow in rows where Category = A. The thing is, I don't know the column name (colour) in advance, so I can't just do select * where Category = 'A' and colour = 'yellow'. How can I scan the columns to achieve this? Many thanks for your help.

+--------+------+----+
|Category|colour|name|
+--------+------+----+
|A       |blue  |Elmo|
|A       |yellow|Alex|
|B       |desc  |Erin|
+--------+------+----+
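
For reference, a DataFrame like the one above can be built as follows (a minimal sketch; it assumes an existing SparkSession named spark):

# Minimal sketch: recreate the example DataFrame (assumes a SparkSession named `spark`)
df = spark.createDataFrame(
    [("A", "blue", "Elmo"),
     ("A", "yellow", "Alex"),
     ("B", "desc", "Erin")],
    ["Category", "colour", "name"],
)
df.show()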

CodePudding user response:

You can loop the check over the list of column names. You can also wrap the loop in a function for readability. Please note that the check for each column runs sequentially.

from pyspark.sql import functions as F

cols = df.columns

# For each column, count rows where Category is 'A' and that column holds 'yellow'
for c in cols:
    cnt = df.where((F.col('Category') == 'A') & (F.col(c) == 'yellow')).count()
    if cnt > 0:
        print(c)  # column that contains 'yellow' for Category 'A'
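
As noted above, the loop can be wrapped in a function for reuse. A minimal sketch (the function name and parameters are illustrative, not part of the original answer):

from pyspark.sql import functions as F

def find_columns_with_value(df, filter_col, filter_val, search_val):
    # Return the names of columns that contain `search_val`
    # in rows where `filter_col` equals `filter_val`.
    matches = []
    for c in df.columns:
        cnt = df.where((F.col(filter_col) == filter_val) & (F.col(c) == search_val)).count()
        if cnt > 0:
            matches.append(c)
    return matches

# Hypothetical usage with the example data:
# find_columns_with_value(df, 'Category', 'A', 'yellow')  ->  ['colour']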