How to check column data type in Spark


I have an imputation method that performs mean, median, and mode operations, but it fails if the column's data type is not Double/Float.
My Java code:

import org.apache.spark.ml.feature.Imputer;
import org.apache.spark.ml.feature.ImputerModel;

Imputer imputer = new Imputer().setInputCol("amount").setOutputCol("amount");
imputer.setStrategy("mean");
ImputerModel model = imputer.fit(dataset);
model.transform(dataset);

Is there any way to handle this? I am using Java.

CodePudding user response:

I can suggest one way, though I'm not sure it's the best approach.
Step-1: Get the field details with dataset.schema().fields(), which returns a StructField[].
Step-2: Iterate through the returned array and check the data type of each column.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;

private boolean isValidColumnTypes(String[] columnArray, Dataset<?> dataset) {
    StructField[] fieldArray = dataset.schema().fields();
    for (String column : columnArray) {
        for (StructField field : fieldArray) {
            // Compare against Spark's DataTypes singletons rather than toString() names.
            boolean doubleType = field.dataType().equals(DataTypes.DoubleType);
            boolean floatType = field.dataType().equals(DataTypes.FloatType);
            if (column.equals(field.name()) && !(doubleType || floatType)) {
                return false;
            }
        }
    }
    return true;
}

In the above method, I am passing the column names as a String array (String[] columnArray).
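A minimal usage sketch, assuming a Dataset<Row> named dataset and the single-column Imputer API available in Spark 3.x; the amount_imputed output name is just an example. If the type check fails, the offending column can be cast to double before fitting:

import org.apache.spark.ml.feature.Imputer;
import org.apache.spark.ml.feature.ImputerModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.col;

String[] columnArray = {"amount"};

// If any input column is not Double/Float, cast it to double first.
// Values that cannot be cast become null, which the Imputer then fills.
if (!isValidColumnTypes(columnArray, dataset)) {
    for (String name : columnArray) {
        dataset = dataset.withColumn(name, col(name).cast("double"));
    }
}

Imputer imputer = new Imputer()
        .setInputCol("amount")
        .setOutputCol("amount_imputed") // a distinct name, since Spark ML typically rejects an output column that already exists
        .setStrategy("mean");
ImputerModel model = imputer.fit(dataset);
Dataset<Row> result = model.transform(dataset);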
