I'm working with Scala and trying to save my calendar data from Spark to Cassandra.
I started by creating the matching schema in Cassandra:
session.execute("CREATE TABLE calendar (DateNum int, Date text, YearMonthNum int, ..., PRIMARY KEY (datenum,date))")
and then wrote my data from Spark to Cassandra:
df.write  // df is my calendar DataFrame
.format("org.apache.spark.sql.cassandra")
.options(Map("table" -> "calendar", "keyspace" -> "ks"))
.mode(SaveMode.Append)
.save()
But when I read the data back from Cassandra, the rows come out in a mixed-up order, while I want to keep the same order my calendar has.
An example of a row I have:
20090111 | 1/11/2009 | 200901 |...
SELECT with ORDER BY doesn't seem to fix the problem either.
CodePudding user response:
Data in Cassandra is ordered only inside a partition (by the clustering columns); the partitions themselves aren't sorted by value, but are distributed by the hash (token) of the partition key. So when you read, consecutive rows may come from adjacent tokens yet belong to completely different dates.
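You can see the effect with a tiny plain-Scala sketch. Here `MurmurHash3` from the Scala standard library stands in for Cassandra's Murmur3 token function (it is not the exact same hash, just an illustration of the idea): sorting your `datenum` keys by their hash scrambles the natural calendar order.

```scala
import scala.util.hashing.MurmurHash3

object HashOrderDemo {
  // Sort keys by a hash of their value -- roughly how partitions
  // are laid out on the token ring, ignoring calendar order.
  def hashOrder(keys: Seq[Int]): Seq[Int] =
    keys.sortBy(d => MurmurHash3.stringHash(d.toString))

  def main(args: Array[String]): Unit = {
    val dateNums = Seq(20090111, 20090112, 20090113, 20090114)
    println(s"by value: ${dateNums.sorted}")   // the order you expect
    println(s"by hash:  ${hashOrder(dateNums)}") // the order you observe
  }
}
```

The hash order is deterministic but has no relation to the numeric order of the keys, which is why your rows "appear mixed up" even though nothing is wrong with the data.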
So if you need the data sorted in Spark, you have to sort it explicitly with .orderBy after reading it back.
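A minimal sketch of the read side, assuming an existing `SparkSession` named `spark` and the same keyspace/table as in your write (`ks`/`calendar`):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Read the table back through the Spark Cassandra connector,
// then impose the calendar order explicitly on the Spark side.
val spark = SparkSession.builder().getOrCreate()

val calendar = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "calendar", "keyspace" -> "ks"))
  .load()
  .orderBy(col("datenum"))  // sort by value, not by Cassandra's token order

calendar.show()
```

Note that this sorting happens in Spark, on every read; Cassandra itself will keep returning partitions in token order. If you need ordered reads from Cassandra directly, the usual approach is to redesign the partition key (e.g. bucket by month) so that ORDER BY on a clustering column works within each partition.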