I have two tables first one contains the car model, and second one contains specific mode of that model
for example :
This is a part of first table :
Car Brand:
Abarth
Alfa Romeo
Aston Martin
Audi
Bentley
BMW
and second table :
Make/Model:
Chevrolet Pickup (Full Size)
Ford Pickup (Full Size)
Toyota Camry
Nissan Altima
Chevrolet Impala
Honda Accord
GMC Pickup (Full Size)
and I need to join these two tables, I need to use regex on it so i can take the first part of the second table and join the two tables
for example :
> Honda Accord join with Honda in first table
I did something like this :
Dataset<Row> updatedCars = carsTable.join(carsTheftsTable, expr("Car Brand rlike Make/Model")).cache();
but it's not working in java spark expr
not found
any help ?
CodePudding user response:
You could try to cross join and check each model with each car brand if they fit.
For example in pyspark
:
import pyspark.sql.functions as F # noqa
from pyspark.sql.types import StructType, StructField, StringType
df1 = spark.createDataFrame(data=[
["Honda"],
["Alfa Romeo"],
["Aston Martin"],
["Toyota"],
["Nissan"],
["BMW"]
], schema=StructType([StructField("car_brand", StringType())])).cache()
df2 = spark.createDataFrame(data=[
["Chevrolet Pickup (Full Size)"],
["Ford Pickup (Full Size)"],
["Toyota Camry"],
["Nissan Altima"],
["Chevrolet Impala"],
["Honda Accord"],
["GMC Pickup (Full Size)"]
], schema=StructType([StructField("car_model", StringType())])).cache()
df1.crossJoin(df2).filter(F.col("car_model").contains(F.col("car_brand"))).show(100, False)
--------- -------------
|car_brand|car_model |
--------- -------------
|Honda |Honda Accord |
|Toyota |Toyota Camry |
|Nissan |Nissan Altima|
--------- -------------