I need to filter a Dataset for special characters and remove every row in which one is found. I also tried simply replacing the special characters with "", but that didn't work either.
Dataset<Row> dataset;
dataset.withColumn("nameColumn", functions.regexp_replace(dataset.col("nameColumn"), "[^\\p{ASCII}]", ""));
CodePudding user response:
You can just filter them:
Dataset<Row> filteredDs = dataset.where(functions.not(functions.col("nameColumn").rlike("[^\\p{ASCII}]")));
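If it is unclear which values the pattern will catch, it can be checked in plain Java without a Spark session, since `rlike` uses Java regular expressions. The following is only a minimal sketch: the class name `AsciiFilterSketch`, its helper methods, and the sample names are made up for illustration and are not part of Spark.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class AsciiFilterSketch {
    // Same pattern as in the filter; in Java source the backslash must be doubled.
    static final Pattern NON_ASCII = Pattern.compile("[^\\p{ASCII}]");

    // True if the value contains at least one non-ASCII character.
    // find() matches anywhere in the string, which mirrors how rlike behaves.
    public static boolean hasNonAscii(String value) {
        return NON_ASCII.matcher(value).find();
    }

    // Keeps only the values that consist entirely of ASCII characters.
    public static List<String> keepAsciiOnly(List<String> values) {
        return values.stream()
                .filter(v -> !hasNonAscii(v))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample data; \u00e9 is "é" and \u00eb is "ë".
        List<String> names = List.of("John", "Jos\u00e9", "Anna", "Zo\u00eb");
        System.out.println(keepAsciiOnly(names)); // prints [John, Anna]
    }
}
```

Note that `\p{ASCII}` covers the whole ASCII range, so punctuation such as `@` or `#` is not treated as special by this pattern. If those should be removed too, the character class would need to be narrowed, for example to letters and digits only.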