Note: pandas.read_excel() is not an option in my case. We can only use the spark-excel jar installed on our cluster.
My main point: we have to skip a few lines at the top of the Excel sheet while reading the file, using some logic or a parameter such as ("skipFirstRows", "[int value]").
df = spark.read.format("com.crealytics.spark.excel")\
.option("header", "true")\
.option("inferSchema", "true")\
.option("skipFirstRows","1")\
.option("treatEmptyValuesAsNulls", "true")\
.load("dbfs:/FileStore/filename.xlsx")
df
Even with .option("skipFirstRows","1") the first line is not skipped while reading. It raises an error on the first line itself.
ERROR: java.lang.IllegalStateException: Cannot get a STRING value from a NUMERIC formula cell
My Excel file has one numeric value in the first row, in the 6th or 7th cell, and the actual header starts on the second row.
So I have to skip that first row.
Excel sample :
Please help me achieve this.
Thank you
CodePudding user response:
skipFirstRows was deprecated in favor of the more generic dataAddress option. To skip rows in your example, try:
df = spark.read.format("com.crealytics.spark.excel")\
.option("dataAddress","A2")\
.load("dbfs:/FileStore/filename.xlsx")
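If the number of rows to skip varies between files, the dataAddress value can be built rather than hard-coded. The following is only a sketch: data_address and read_excel_skipping_rows are hypothetical helper names, not part of spark-excel, and the 'Sheet'!Cell form for naming a sheet is my understanding of the address syntax in the spark-excel README, so check it against the version installed on your cluster.

```python
def data_address(rows_to_skip, sheet=None, column="A"):
    """Build a dataAddress string whose start cell sits just below the skipped rows.

    Hypothetical helper, not part of spark-excel.
    """
    if rows_to_skip < 0:
        raise ValueError("rows_to_skip must be >= 0")
    # Excel rows are 1-based, so skipping 1 row means starting at row 2.
    cell = f"{column}{rows_to_skip + 1}"
    # Assumption: a sheet is selected with the 'Sheet name'!Cell form.
    return f"'{sheet}'!{cell}" if sheet else cell


def read_excel_skipping_rows(spark, path, rows_to_skip, sheet=None):
    """Hypothetical wrapper around the reader call from the answer.

    Needs a SparkSession and the spark-excel jar on the cluster.
    """
    return (
        spark.read.format("com.crealytics.spark.excel")
        .option("header", "true")        # the first row read is the header
        .option("inferSchema", "true")
        .option("dataAddress", data_address(rows_to_skip, sheet))
        .load(path)
    )
```

For the file in the question, read_excel_skipping_rows(spark, "dbfs:/FileStore/filename.xlsx", 1) passes "A2" as the dataAddress, so reading starts at the real header row.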