Merge two different dataframes in pyspark

Time:07-22

I have two different dataframes, one with date combinations and one with city pairs:

df_date_combinations:

+-------------------+-------------------+
|            fs_date|            ss_date|
+-------------------+-------------------+
|2022-06-01T00:00:00|2022-06-02T00:00:00|
|2022-06-01T00:00:00|2022-06-03T00:00:00|
|2022-06-01T00:00:00|2022-06-04T00:00:00|
+-------------------+-------------------+

city pairs:

+---------+--------------+---------+--------------+
|fs_origin|fs_destination|ss_origin|ss_destination|
+---------+--------------+---------+--------------+
|      TLV|           NYC|      NYC|           TLV|
|      TLV|           ROM|      ROM|           TLV|
|      TLV|           BER|      BER|           TLV|
+---------+--------------+---------+--------------+

I want to combine them so I will have the following dataframe:

+----------+----------+---------+--------------+---------+--------------+
|   fs_date|   ss_date|fs_origin|fs_destination|ss_origin|ss_destination|
+----------+----------+---------+--------------+---------+--------------+
|2022-06-01|2022-06-02|      TLV|           NYC|      NYC|           TLV|
|2022-06-01|2022-06-03|      TLV|           NYC|      NYC|           TLV|
|2022-06-01|2022-06-04|      TLV|           NYC|      NYC|           TLV|
|2022-06-01|2022-06-02|      TLV|           ROM|      ROM|           TLV|
|2022-06-01|2022-06-03|      TLV|           ROM|      ROM|           TLV|
|2022-06-01|2022-06-04|      TLV|           ROM|      ROM|           TLV|
|2022-06-01|2022-06-02|      TLV|           BER|      BER|           TLV|
|2022-06-01|2022-06-03|      TLV|           BER|      BER|           TLV|
|2022-06-01|2022-06-04|      TLV|           BER|      BER|           TLV|
+----------+----------+---------+--------------+---------+--------------+

Thanks!

CodePudding user response:

Sounds like a cross join, which pairs every row of the first dataframe with every row of the second:

df1.crossJoin(df2)

CodePudding user response:

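If the data is in pandas rather than PySpark, there are built-in methods for combining dataframes. concat places them side by side, aligned on their index, so it does not build every combination of rows. You can read how to do this here: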

The part that is pertinent to you would be:

pd.concat([df_date_combinations, city_pairs], axis = 1)
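
One caveat, sketched below with the question's data rebuilt as pandas dataframes (an assumption, since the question is about PySpark): concat with axis=1 lines rows up by index and gives 3 rows, not the 9 combinations asked for. In pandas 1.2 or later, merge with how="cross" is the call that produces every combination.

```python
import pandas as pd

# Sample data copied from the question, rebuilt as pandas dataframes.
df_date_combinations = pd.DataFrame(
    {
        "fs_date": ["2022-06-01", "2022-06-01", "2022-06-01"],
        "ss_date": ["2022-06-02", "2022-06-03", "2022-06-04"],
    }
)
city_pairs = pd.DataFrame(
    {
        "fs_origin": ["TLV", "TLV", "TLV"],
        "fs_destination": ["NYC", "ROM", "BER"],
        "ss_origin": ["NYC", "ROM", "BER"],
        "ss_destination": ["TLV", "TLV", "TLV"],
    }
)

# concat along axis=1 aligns on the index: row 0 next to row 0, and so on.
side_by_side = pd.concat([df_date_combinations, city_pairs], axis=1)

# A cross merge pairs every date row with every city row (pandas >= 1.2).
crossed = df_date_combinations.merge(city_pairs, how="cross")

print(side_by_side.shape)  # (3, 6)
print(crossed.shape)       # (9, 6)
```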

Hope this helps!
