I want to perform some validation on column Phone number mentioned below and then update the "Correct" column with "Y" (when the number looks valid) or "N" (when invalid).
CodePudding user response:
val df = Seq[(String)]("", " ", null, "123456789a", "1111111111", "1.3-4567 80", " 1.23-4567 890 ", "1234567890").toDF("PhoneNumber")
val trimmed = regexp_replace(trim($"PhoneNumber"), "[ .-]", "")
val correct = trimmed.rlike(raw"\d{10,}") &&
!(trimmed.rlike(raw"^(\d)\1*$$"))
val df2 = df.withColumn("Correct", when(correct, "Y").otherwise("N"))
df2.show()
// --------------- -------
// | PhoneNumber|Correct|
// --------------- -------
// | | N|
// | | N|
// | null| N|
// | 123456789a| N|
// | 1111111111| N|
// | 1.3-4567 80| N|
// | 1.23-4567 890 | Y|
// | 1234567890| Y|
// --------------- -------
trim($"PhoneNumber")
removes leading and trailing spaces
regexp_replace(..., "[ .-]", "")
removes spaces, dots and commas
.rlike(raw"\d{10,}")
checks for 10 or more digits
!(....rlike(raw"^(\d)\1*$$"))
checks for all the same digits