How to check whether an array contains a string using PySpark with this structure

Time:12-20

The curly brackets are odd. I tried several approaches, but none of them worked.

# root
#  |-- L: array (nullable = true)
#  |    |-- element: struct (containsNull = true)
#  |    |    |-- S: string (nullable = true)

# +-----------+
# |          L|
# +-----------+
# |[{string1}]|
# |[{string2}]|
# +-----------+

CodePudding user response:

Use filter() to get the array elements that match a given criterion.

Since the elements of the array are of type struct, use getField() to read the string field, and then use contains() to check whether that string contains the search term.
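To make the per-row logic concrete, here is a plain-Python sketch of the same idea with no Spark involved. The helper name array_contains_substring is hypothetical, and each struct is modeled as a dict with a single key "S"; it is only meant to mirror the filter / getField / contains chain, not to reproduce Spark's null handling exactly.

```python
# Plain-Python model of the check done for one row; no Spark involved.
# array_contains_substring is a hypothetical helper used only for illustration.
def array_contains_substring(structs, term, field="S"):
    """Return True if any struct's `field` value contains `term`."""
    # Counterpart of filter(): keep structs whose field contains the term.
    matches = [s for s in structs if s[field] is not None and term in s[field]]
    # Counterpart of size(...) > 0: at least one element survived the filter.
    return len(matches) > 0


# Each row is an array of structs, modeled here as a list of dicts.
rows = [[{"S": "hello world"}], [{"S": "foo bar"}]]
flags = [array_contains_substring(row, "hello") for row in rows]
print(flags)  # [True, False]
```

An empty array yields False in this model, because nothing survives the filter.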

The following example searches for the term "hello":

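from pyspark.sql import SparkSession
import pyspark.sql.functions as F

# Reuse the active session, or create one if none exists.
spark = SparkSession.builder.getOrCreate()

# Two rows, each holding an array with a single struct<S string>.
df = spark.createDataFrame(data=[[[("hello world",)]], [[("foo bar",)]]], schema="L array<struct<S string>>")

string_to_search = "hello"

# Keep the structs whose S field contains the term; a non-empty result means a match.
# Passing a Python lambda to F.filter requires Spark 3.1 or later.
df = df.withColumn(
    "arr_contains_str",
    F.size(F.filter("L", lambda e: e.getField("S").contains(string_to_search))) > 0,
)

df.show(truncate=False)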

Output:

+---------------+----------------+
|L              |arr_contains_str|
+---------------+----------------+
|[{hello world}]|true            |
|[{foo bar}]    |false           |
+---------------+----------------+