import org.apache.spark.sql.functions.col
import spark.implicits._ // needed for toDF(); already in scope in spark-shell

var data = Seq[(String, Int)]()
for (i <- 1 until 10000) {
  val str = s"value: $i"
  data = data :+ ((str, i)) // append the (label, id) tuple
}
val df = spark.sparkContext.parallelize(data).toDF()
df.createOrReplaceTempView("v_logs")
// query
val a = spark.sql(
  """
  SELECT * FROM v_logs limit 20
  """
)
a.show() // 1
a.show() // 2
a.show() // 3
a.select(col("_2")).show() // 4
a.select(col("_2")).show() // 5
a.select(col("_2")).show() // 6
This is Spark code written in Scala. I expected outputs 1, 2, and 3 to be identical, and outputs 4, 5, and 6 to be identical, but they were not. Adding "order by _2" to the query does give the expected result. I suspect this comes from Spark's inner workings, but I'm not sure. Could you please elaborate on this?
CodePudding user response:
a.select(col("_2")) only projects the column; it doesn't order it.
I tried your code and got the expected results. Outputs 1, 2, and 3 all list:
+---------+---+
|       _1| _2|
+---------+---+
| value: 1|  1|
| value: 2|  2|
| value: 3|  3|
| value: 4|  4|
| value: 5|  5|
| value: 6|  6|
| value: 7|  7|
| value: 8|  8|
| value: 9|  9|
|value: 10| 10|
|value: 11| 11|
|value: 12| 12|
|value: 13| 13|
|value: 14| 14|
|value: 15| 15|
|value: 16| 16|
|value: 17| 17|
|value: 18| 18|
|value: 19| 19|
|value: 20| 20|
+---------+---+
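That said, nothing guarantees this on every run. A query with LIMIT but no ORDER BY is free to return any 20 rows, and a DataFrame is lazy, so each show() re-runs the query. Which rows come back can therefore differ between actions, depending on how the data is partitioned and scheduled. If you need the same rows every time, make the order explicit. Here is a minimal sketch of that, assuming a local session (the local[4] master, the app name, and the names ordered, first, second and ids are my own choices, not from your code):

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session with 4 threads, only for illustration.
val spark = SparkSession.builder()
  .master("local[4]")
  .appName("deterministic-limit")
  .getOrCreate()
import spark.implicits._

// Same data as in the question: ("value: 1", 1) up to ("value: 9999", 9999).
val data = (1 until 10000).map(i => (s"value: $i", i))
spark.sparkContext.parallelize(data).toDF().createOrReplaceTempView("v_logs")

// ORDER BY on a unique column fixes which 20 rows the LIMIT keeps.
val ordered = spark.sql("SELECT * FROM v_logs ORDER BY _2 LIMIT 20")

// Each collect() re-executes the plan, yet both runs yield the same rows.
val first = ordered.collect().toSeq
val second = ordered.collect().toSeq

// The ids of the 20 rows that were kept.
val ids = ordered.select("_2").as[Int].collect().toSeq
```

Because _2 is unique here, the ordering has no ties, so the 20 kept rows are fully determined. If the sort column had duplicates, you would need a tie-breaking column in the ORDER BY as well.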