Home > front end >  Pyspark - Regex_Extract value between forward slash (/)
Pyspark - Regex_Extract value between forward slash (/)

Time:06-13

I created 3 expressions with the idea of extracting some values between forward slash (/) with substring_index, but I know that I can create that simpler with regexp_extract function. The scenario is:

  • Expression: 'AC: 41231/REGULAR/041293'

And I want to extract the following values:

  1. the first value: 41231 where I created this: .withColumn("ACode", substring_index(substring_index(col('column1'), 'AC: ', -1), '/', 1))
  2. the second value: REGULAR where I created this: .withColumn("Bcode", when(col('column1').like('%/%'), substring_index(substring_index(substring_index(col('column1'), 'AC: ', -1), '/', -2), '/', 1)) .otherwise(lit("")))
  3. the third value: 041293 where I created this: .withColumn("Ccode", when(col('column1').like('%/%'), substring_index(substring_index(col('column1'), 'AC: ', -1), '/', -1)) .otherwise(lit("")))

Do you know how to make it simpler?

column1 Acode BCode Ccode
AC: 41231/REGULAR/041293 41231 REGULAR 041293

Thanks and kind regards

CodePudding user response:

How about split?

you can:

.withColumn("Acode", split("column1", "/")[0])
.withColumn("Bcode", split("column1", "/")[1])
.withColumn("Ccode", split("column1", "/")[2])
  • Related