What do lit(0) and lit(1) do in Scala/Spark aggregate functions?

Time:09-17

I have this piece of code:

import org.apache.spark.sql.functions.{col, count, explode, lit, sum, when}

val df = resultsDf
  .withColumn("data_exploded", explode(col("data_item")))
  .groupBy("data_id", "data_count")
  .agg(
    count(lit(1)).as("aggDataCount"),
    sum(when(col("data_exploded.amount") === "A", col("data_exploded.positive_amount")).otherwise(lit(0))).as("aggAmount")
  )

Is lit(0) referring to an index at position 0, or to the literal number 0? The definition at https://mungingdata.com/apache-spark/spark-sql-functions/ says that "The lit() function creates a Column object out of a literal value." That makes me think it refers to a literal value, such as a number or a String, rather than an index position. However, the usage in count(lit(1)).as("aggDataCount") looks to me like it refers to the index position of a column. Thank you.

CodePudding user response:

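lit(1) means the literal value 1.

count(lit(1)).as("aggDataCount") is a way to count the number of rows: lit(1) produces a column whose value is 1 on every row, and count counts the non-null values in that column. Since the constant 1 is never null, every row in the group is counted.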
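To make the row-counting behaviour concrete, here is a minimal sketch that compares count(lit(1)) with count on a nullable column. The data, the column names (id, amount) and the object name CountLitSketch are made up for illustration, and a local SparkSession is assumed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count, lit}

object CountLitSketch {
  // Returns id -> (rows in group, non-null amounts in group).
  def counts(spark: SparkSession): Map[String, (Long, Long)] = {
    import spark.implicits._

    // Hypothetical data: the second row has a null amount.
    val df = Seq(
      ("a", Some(10)),
      ("a", None),
      ("b", Some(5))
    ).toDF("id", "amount")

    df.groupBy("id")
      .agg(
        count(lit(1)).as("rows"),                  // the constant 1 is never null, so every row is counted
        count(col("amount")).as("nonNullAmounts")  // null amounts are skipped
      )
      .collect()
      .map(r => r.getString(0) -> (r.getLong(1), r.getLong(2)))
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("count-lit-sketch").getOrCreate()
    try println(counts(spark)) finally spark.stop()
  }
}
```

With this data, group a has 2 rows but only 1 non-null amount, so the two counts differ; for group b both are 1. Nothing here is treated as a column index: lit(1) is just a constant column.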

CodePudding user response:

In Spark, lit represents a literal value.

lit(0) puts 0 as the value in a column; lit(1) puts 1 as the value in a column.

The code you have shown applies two aggregations to each group: count(lit(1)) counts the rows in the group, and sum(...) adds up a value that is chosen by a condition.

The lit(0) is in the otherwise clause, which acts like an else branch: rows that do not satisfy the when condition contribute the literal value 0 to the sum.
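The effect of otherwise(lit(0)) can be seen by running the same conditional sum with and without it. This is only a sketch: it uses flat, made-up columns (id, kind, amount) rather than the exploded struct from the question, and the object name OtherwiseLitSketch is hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, sum, when}

object OtherwiseLitSketch {
  // Returns id -> (sum with otherwise(lit(0)), sum without an otherwise clause).
  def sums(spark: SparkSession): Map[String, (Long, Option[Long])] = {
    import spark.implicits._

    // Hypothetical data: group "y" has no rows of kind "A".
    val df = Seq(
      ("x", "A", 10),
      ("x", "B", 7),
      ("y", "B", 3)
    ).toDF("id", "kind", "amount")

    df.groupBy("id")
      .agg(
        // Non-matching rows contribute the literal 0.
        sum(when(col("kind") === "A", col("amount")).otherwise(lit(0))).as("withZero"),
        // Non-matching rows yield null, which sum ignores.
        sum(when(col("kind") === "A", col("amount"))).as("withoutOtherwise")
      )
      .collect()
      .map { r =>
        val withoutOtherwise = if (r.isNullAt(2)) None else Some(r.getLong(2))
        r.getString(0) -> (r.getLong(1), withoutOtherwise)
      }
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("otherwise-lit-sketch").getOrCreate()
    try println(sums(spark)) finally spark.stop()
  }
}
```

For group x both sums are 10. For group y, where no row matches, the version with otherwise(lit(0)) gives 0 while the version without it gives null, which is the practical reason for the lit(0).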
