I have this piece of code:
val df = resultsDf
.withColumn("data_exploded", explode(col("data_item")))
.groupBy("data_id","data_count")
.agg(
count(lit(1)).as("aggDataCount"),
sum(when(col("data_exploded.amount")==="A",col("data_exploded.positive_amount")).otherwise(lit(0))).as("aggAmount")
)
Is lit(0)
referring to an index at position 0, or to the literal value of number 0
? The definition I see at https://mungingdata.com/apache-spark/spark-sql-functions/#:~:text=The lit() function creates,spanish_hi column to the DataFrame.&text=The lit() function is especially useful when making boolean comparisons says that "The lit()
function creates a Column object out of a literal value." That definition makes me think that it is not referring to an index position but to a literal value such as a number or String. However, the usage in count(lit(1)).as("aggDataCount")
looks like referring to an index position of a column to me. Thank you.
CodePudding user response:
lit(1)
means the literal value 1
count(lit(1)).as("aggDataCount")
is a way to count the number of rows (each row having a column with a value of 1
and summing this column)
CodePudding user response:
In spark lit represents literal value.
lit(0)--> put 0 as a value in column , lit(1) --> means put 1 as a value in column.
In the code you shown above they are applying an aggregation over 2 columns and keeping a count of how many rows from count(lit(1)), over a condition.
The next lit(0) is in otherwise clause which is like an else condition. lit(0) will add 0 as a literal value in the column.