We tried reading Excel files in the following ways:
- spark.read.format("com.databricks.spark.excel")
- spark.read.format("com.crealytics.spark.excel")
- spark.read.format("excel")
Received an error stating: Failure to initialize configuration: Invalid configuration value detected for fs.azure.account.key
Note:
- The Azure credentials were working fine while reading a CSV file from the same ADLS account.
- We have installed the required libraries on the Databricks cluster: com.crealytics:spark-excel_2.12:3.1.2_0.16.5-pre0 and openpyxl.
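For context, a minimal sketch of this kind of read (the storage account, container, path, and option values below are placeholders, and the key-based auth line is an assumption based on the error message):
# placeholders: storage_account_name, container_name, storage_account_key
spark.conf.set(
    f"fs.azure.account.key.{storage_account_name}.dfs.core.windows.net",
    storage_account_key)

df = (spark.read.format("com.crealytics.spark.excel")
      .option("header", "true")
      .option("dataAddress", "'Sheet1'!A1")  # sheet and start cell to read
      .load(f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/path/to/file.xlsx"))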
CodePudding user response:
You can use pandas to read the .xlsx file and then convert it to a Spark DataFrame.
from pyspark.sql import SparkSession
import pandas

spark = SparkSession.builder.appName("Test").getOrCreate()

# pandas.read_excel has no inferSchema option; openpyxl is the engine used for .xlsx files
pdf = pandas.read_excel('excelfile.xlsx', sheet_name='sheetname', engine='openpyxl')

# createDataFrame infers the Spark schema from the pandas dtypes
df = spark.createDataFrame(pdf)
df.show()
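Note that pandas cannot open an abfss:// URL on its own. If the file sits in ADLS, one common approach, assuming the container is mounted as in the other answer and that the mount path below is a placeholder, is to read it through the /dbfs FUSE path:
# hypothetical mount path; adjust to your own mount point and file name
local_path = "/dbfs/mnt/mystorageaccount/mycontainer/excelfile.xlsx"
pdf = pandas.read_excel(local_path, sheet_name='sheetname', engine='openpyxl')
df = spark.createDataFrame(pdf)
df.show()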
CodePudding user response:
Try this to access your ADLS:
configs = {"fs.azure.account.auth.type": "OAuth",
"fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
"fs.azure.account.oauth2.client.id": f"{client_id}",
"fs.azure.account.oauth2.client.secret": f"{client_secret}",
"fs.azure.account.oauth2.client.endpoint": f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"}
def mount_adls(container_name2):
dbutils.fs.mount(
source = f"abfss://{container_name2}@{storage_account_name}.dfs.core.windows.net/",
mount_point = f"/mnt/{storage_account_name}/{container_name2}",
extra_configs = configs)
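A usage sketch, assuming client_id, client_secret, tenant_id, and storage_account_name are already defined and the container name below is a placeholder; once mounted, spark-excel can load the file through the mount point instead of the abfss:// URL:
# dbutils.fs.mount raises an error if the mount point already exists, so unmount first if needed
mount_adls("mycontainer")

df = (spark.read.format("com.crealytics.spark.excel")
      .option("header", "true")
      .load(f"/mnt/{storage_account_name}/mycontainer/path/to/excelfile.xlsx"))
df.show()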