How to check if a value in a column is found in a list in a column, with Spark SQL?

Time:11-03

I have a delta table A as shown below.

point   cluster   points_in_cluster
37      1         [37, 32]
45      2         [45, 67, 84]
67      2         [45, 67, 84]
84      2         [45, 67, 84]
32      1         [37, 32]

Also I have a table B as shown below.

id    point
101   37
102   67
103   84

I want a query like the following. Obviously, IN doesn't work here because points_in_cluster is an array column, not a subquery. So, what would be the right syntax?

select b.id, a.point
from A a, B b
where b.point in a.points_in_cluster

As a result, I should get a table like the following:

id    point
101   37
101   32
102   45
102   67
102   84
103   45
103   67
103   84

CodePudding user response:

Based on your data sample, I'd do an equi-join on the point column and then an explode on points_in_cluster:

from pyspark.sql import functions as F

# assuming A is df_A and B is df_B

df_A.join(
    df_B,
    on="point"
).select(
    "id",
    F.explode("points_in_cluster").alias("point")
)
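To see why the join-then-explode approach yields the expected rows, here is the same logic sketched in plain Python on the sample data (the tables are modeled as lists of dicts; this only illustrates the semantics, it is not Spark code):

```python
# Sample data mirroring tables A and B from the question
table_a = [
    {"point": 37, "cluster": 1, "points_in_cluster": [37, 32]},
    {"point": 45, "cluster": 2, "points_in_cluster": [45, 67, 84]},
    {"point": 67, "cluster": 2, "points_in_cluster": [45, 67, 84]},
    {"point": 84, "cluster": 2, "points_in_cluster": [45, 67, 84]},
    {"point": 32, "cluster": 1, "points_in_cluster": [37, 32]},
]
table_b = [
    {"id": 101, "point": 37},
    {"id": 102, "point": 67},
    {"id": 103, "point": 84},
]

# Equi-join on "point", then explode the array column into one row per element
result = [
    {"id": b["id"], "point": p}
    for b in table_b
    for a in table_a
    if a["point"] == b["point"]
    for p in a["points_in_cluster"]
]
```

Each row of B matches exactly one row of A (its own point), and exploding that row's points_in_cluster produces one output row per cluster member, which is exactly the table in the question.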

Otherwise, in Spark SQL you can use array_contains as the join condition:

select b.id, a.point
from A a, B b
where array_contains(a.points_in_cluster, b.point)
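The array_contains query is a cross join of A and B filtered on array membership. In plain Python terms (again just an illustration of the semantics, not Spark code), it looks like this:

```python
# Sample data mirroring tables A and B from the question
table_a = [
    {"point": 37, "points_in_cluster": [37, 32]},
    {"point": 45, "points_in_cluster": [45, 67, 84]},
    {"point": 67, "points_in_cluster": [45, 67, 84]},
    {"point": 84, "points_in_cluster": [45, 67, 84]},
    {"point": 32, "points_in_cluster": [37, 32]},
]
table_b = [
    {"id": 101, "point": 37},
    {"id": 102, "point": 67},
    {"id": 103, "point": 84},
]

# Cross join A x B, keep pairs where b.point is a member of a.points_in_cluster,
# then project b.id and a.point (mirrors the array_contains query)
result = sorted(
    (b["id"], a["point"])
    for a in table_a
    for b in table_b
    if b["point"] in a["points_in_cluster"]
)
```

Note that a plain cross join can be expensive on large tables, since Spark cannot use an equality condition to shuffle-partition the data; the equi-join-plus-explode version above is usually the better-performing choice when each point appears in exactly one cluster.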