I'm experiencing an issue with the following function. I'm trying to translate this to a SQL statement so I can have a better idea of exactly what's happening, so I can more effectively work on my actual issue.
I know that this contains a join between valid_data to ri_data, a filter, and a select statement. I'm primarily having issue understanding how to write the join piece.
result = (
valid_data.join(
ri_data,
F.col(table_name "." column_name) == ri_data.ri_column,
"left_outer",
)
.filter(f"ri_column IS NULL")
.selectExpr(
"etl_row_id AS row_id", f"{table_name}.{column_name} AS error_value"
)
.distinct()
Any help is appreciated.
CodePudding user response:
You have some substitutions to do, like the column_name for the join keys, etc. But the general structure looks like this in SQL:
SELECT DISTINCT A.*, B.*,
etl_row_id AS row_id,
A.column_name AS error_value
FROM valid_data A
LEFT OUTER JOIN ri_data B
ON A.column_name = B.ri_column
WHERE B.ri_column IS NULL