I created 3 expressions with the idea of extracting some values between forward slash (/) with substring_index, but I know that I can create that simpler with regexp_extract function. The scenario is:
- Expression: 'AC: 41231/REGULAR/041293'
And I want to extract the following values:
- the first value: 41231 where I created this:
.withColumn("ACode", substring_index(substring_index(col('column1'), 'AC: ', -1), '/', 1))
- the second value: REGULAR where I created this:
.withColumn("Bcode", when(col('column1').like('%/%'), substring_index(substring_index(substring_index(col('column1'), 'AC: ', -1), '/', -2), '/', 1)) .otherwise(lit("")))
- the third value: 041293 where I created this:
.withColumn("Ccode", when(col('column1').like('%/%'), substring_index(substring_index(col('column1'), 'AC: ', -1), '/', -1)) .otherwise(lit("")))
Do you know how to make it simpler?
column1 | Acode | BCode | Ccode |
---|---|---|---|
AC: 41231/REGULAR/041293 | 41231 | REGULAR | 041293 |
Thanks and kind regards
CodePudding user response:
How about split
?
you can:
.withColumn("Acode", split("column1", "/")[0])
.withColumn("Bcode", split("column1", "/")[1])
.withColumn("Ccode", split("column1", "/")[2])