Converting SCALA === (triple equal) to Python for SPARK column


I have the following Scala code that I need to convert to Python:

import org.apache.spark.sql.functions.{col, lit}
import org.apache.spark.sql.{Column, DataFrame, Dataset}

object SearchTermReader {

  def read(
    searchTermsInputTable: DataFrame,
    brand: String,
    posa: String,
    startDate: String,
    endDate: String
  ): Dataset[SearchTerm] = {

    import searchTermsInputTable.sparkSession.implicits._

    val conditionsNoEndDate = getConditions(brand, posa, startDate)
    val searchTermsNoEndDate = searchTermsInputTable


    val columnNames = SparkExtensions.getColumns[SearchTerm]

      .select(columnNames: _*)

  def getConditions(
    brand: String,
    posa: String,
    startDate: String
  ): Column = {

    val filterByBrandCondition: Column = {
      if (brand.equals("")) {
      } else {
        col("brand") === brand
    val filterByPosaCondition: Column = {
      if (posa.equals("")) {
      } else {

    filterByBrandCondition &&
      filterByPosaCondition &&
      col("search_engine") === "GOOGLE" &&
      col("impressions") > 0 &&

  def getAccountPattern(countryCodes: String): String = {
    countryCodes.split(",").map(cc => s":G:$cc:").mkString("|")

There seem to be two issues with a straight conversion:

  1. Dataset is used, which PySpark does not support.
  2. === is used on a Column, which PySpark also does not support.

How can I overcome these and convert the code to Python?

CodePudding user response:

If you are comparing a DataFrame column, you can use == as shown below.

df.filter((col("brand") == "BRAND") & (...))
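The parentheses around each comparison matter: in Python, & binds more tightly than ==, so without them the expression is parsed as a chained comparison rather than as two conditions joined by &. The sketch below uses only the standard ast module (no Spark required) to show how Python parses both forms; the names brand and posa are placeholders.

```python
import ast


def top_level_node(expr: str) -> str:
    """Return the type name of the outermost AST node of an expression."""
    return type(ast.parse(expr, mode="eval").body).__name__


# Without parentheses: parsed as the chained comparison
#   brand == ("A" & posa) == "B"
unparenthesized = top_level_node('brand == "A" & posa == "B"')

# With parentheses: parsed as (brand == "A") & (posa == "B")
parenthesized = top_level_node('(brand == "A") & (posa == "B")')

print(unparenthesized)  # Compare
print(parenthesized)    # BinOp
```

With PySpark columns, the unparenthesized form typically raises an error or builds a condition other than the one intended, so wrapping every comparison in parentheses is the safe habit.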

CodePudding user response:

PySpark does not support === the way Scala does.

In Scala, == calls the equals method and returns a Boolean, so it cannot produce a column expression. The meaning of === depends on the type that defines it; for a Spark Column, === builds an equality expression and is equivalent to the equalTo method.

In PySpark, use = inside a SQL expression string or == on a Column object. Any of the following gives the same result as the Scala comparison:

df.filter("Brand = 'BRAND'")


df.filter(df.Brand == 'BRAND')


df.filter(df["Brand"] == 'BRAND')


from pyspark.sql.functions import col
df.filter(col("Brand") == 'BRAND')
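
The getAccountPattern helper from the question does not touch Spark at all, so it can be ported as plain Python. A minimal sketch, assuming the same comma-separated input; the snake_case name get_account_pattern and the sample country codes are illustrative choices, not taken from the original code.

```python
def get_account_pattern(country_codes: str) -> str:
    """Build an alternation such as ':G:US:|:G:GB:' from 'US,GB'."""
    # Mirrors the Scala one-liner:
    #   countryCodes.split(",").map(cc => s":G:$cc:").mkString("|")
    return "|".join(f":G:{cc}:" for cc in country_codes.split(","))


pattern = get_account_pattern("US,GB")
print(pattern)  # :G:US:|:G:GB:
```

One difference to be aware of: Scala's split drops trailing empty strings, while Python's str.split keeps them, so input with a trailing comma behaves differently in the two versions.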