How to change data type in csv column


Using Databricks with SQL, I have to import my CSV dataset into a table and analyse the data with it. My problem is that after I imported the CSV dataset, all columns are of String type, but some of them need to be numeric. How can I solve this?

How can I define the column types of a CSV file? I tried converting the file to xlsx and setting the numeric type there, but then it isn't possible to convert it back to CSV (or I don't know how).

Thanks for your help.

PS: Databricks accepts only CSV files, not xlsx or similar formats.

CodePudding user response:

  1. If you are using Databricks on Azure, when you select "Create table with UI" there are options to choose a data type for each column, as in Screenshot A below.

  2. If you are importing the table with PySpark code, there is an option, inferSchema, that you can set. If it is set to "true", columns that contain only numeric values will be given appropriate numeric data types, as in the snippet below.

    # Location and type of the uploaded file
    file_location = "/FileStore/shared_uploads/xxx/dbo_project.csv"
    file_type = "csv"

    # CSV options: let Spark infer each column's data type from the values
    infer_schema = "true"
    first_row_is_header = "false"
    delimiter = ","

    # Read the CSV file into a DataFrame with the options applied
    df = spark.read.format(file_type) \
      .option("inferSchema", infer_schema) \
      .option("header", first_row_is_header) \
      .option("sep", delimiter) \
      .load(file_location)
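
If inferSchema does not give you the types you need, you can also cast columns explicitly after loading, either with the DataFrame API or, since the question mentions SQL, by registering a temporary view and using CAST. A minimal sketch (the column name "amount" and the view name "dbo_project" are assumptions for illustration):

    from pyspark.sql.functions import col
    from pyspark.sql.types import DoubleType

    # Cast a String column to a numeric type with the DataFrame API
    # ("amount" is a hypothetical column name; replace with your own)
    df = df.withColumn("amount", col("amount").cast(DoubleType()))

    # Or do the same in SQL via a temporary view
    df.createOrReplaceTempView("dbo_project")
    numeric_df = spark.sql("SELECT CAST(amount AS DOUBLE) AS amount FROM dbo_project")
    numeric_df.printSchema()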

Screenshot A: "Create table with UI" in Databricks
