Regex in Scala to extract value from a column

Time:09-28

I have a column "content" that contains the data below:

<div >
<div >
<div >
</div>
<span >$20 </span>
</div> </div>
Get FREE baskets $15.01 items.

I need to extract 15.01 in Scala; the value changes with every request.

I wrote the code below. It does not raise an error, but the value is not captured:

.withColumn("AB", regexp_extract($"content","Get\\s\\w*([0-9]\\d*) .{3}",0)) 

Any help would be great.

CodePudding user response:

You could match Get and then match up to the first occurrence of a digit:

\bGet\s\D*(\d+(?:\.\d+)?)\b

The pattern matches:

  • \bGet\s Match the word Get and a whitespace char
  • \D* Match optional non digits
  • ( Capture group 1
    • \d+(?:\.\d+)? Match 1+ digits with an optional decimal part
  • ) Close group 1
  • \b A word boundary to prevent a partial word match
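
The breakdown above can be checked outside Spark with Scala's built-in regex support. This is a minimal sketch, assuming the pattern is meant to read \d+(?:\.\d+)? (the + quantifiers appear to have been lost when the page was copied); the names GetAmountDemo, amountAfterGet and extractAmount are hypothetical.

```scala
import scala.util.matching.Regex

object GetAmountDemo {
  // Triple-quoted string, so the backslashes do not need doubling.
  // Assumption: "+" quantifiers restored after each \d.
  val amountAfterGet: Regex = """\bGet\s\D*(\d+(?:\.\d+)?)\b""".r

  // Returns capture group 1 of the first match, or None if nothing matches.
  def extractAmount(text: String): Option[String] =
    amountAfterGet.findFirstMatchIn(text).map(_.group(1))

  def main(args: Array[String]): Unit = {
    // Sample modelled on the question's data: "$20" comes before "Get" and is skipped.
    val sample = "<span >$20 </span> Get FREE baskets $15.01 items."
    println(extractAmount(sample)) // Some(15.01)
  }
}
```

The number is read from group 1 rather than from the whole match, which would also include "Get FREE baskets $".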

With the doubled backslashes:

"\\bGet\\s\\D*(\\d+(?:\\.\\d+)?)\\b"
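
In the question's call, the last argument of regexp_extract is 0, which returns the entire match rather than the captured number; passing 1 selects capture group 1. A rough sketch of how the escaped pattern could be used in Spark follows. It assumes the + quantifiers belong after each \d, a local SparkSession, and a one-row DataFrame standing in for the real data; ExtractAmountJob and withAmount are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_extract}

object ExtractAmountJob {
  // Ordinary string literal, so every backslash is doubled.
  val pattern: String = "\\bGet\\s\\D*(\\d+(?:\\.\\d+)?)\\b"

  // Adds column "AB" holding capture group 1; rows without a match get an empty string.
  def withAmount(df: DataFrame): DataFrame =
    df.withColumn("AB", regexp_extract(col("content"), pattern, 1))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder().master("local[*]").appName("extract-amount").getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the real "content" column.
    val df = Seq("<span >$20 </span> Get FREE baskets $15.01 items.").toDF("content")

    withAmount(df).show(false)
    spark.stop()
  }
}
```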


Alternatively, you can prevent the match from crossing < or >:

\bGet\s[^\d<>]*(\d+(?:\.\d+)?)\b
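
To illustrate the difference between the two variants, the sketch below runs both against a plain sentence and against one where a tag sits between Get and the number. It assumes the + quantifiers belong after each \d; the object name, the helper first, and the tagged sample string are made up for illustration.

```scala
import scala.util.matching.Regex

object BoundedAmountDemo {
  // \D* may run across markup; [^\d<>]* stops at the first < or >.
  val loose: Regex = """\bGet\s\D*(\d+(?:\.\d+)?)\b""".r
  val bounded: Regex = """\bGet\s[^\d<>]*(\d+(?:\.\d+)?)\b""".r

  // Capture group 1 of the first match, if any.
  def first(re: Regex, text: String): Option[String] =
    re.findFirstMatchIn(text).map(_.group(1))

  def main(args: Array[String]): Unit = {
    val plain = "Get FREE baskets $15.01 items."
    val tagged = "Get FREE baskets <span>$15.01</span> items." // hypothetical input

    println(first(loose, plain))    // Some(15.01)
    println(first(bounded, plain))  // Some(15.01)
    println(first(loose, tagged))   // Some(15.01)
    println(first(bounded, tagged)) // None
  }
}
```

Which variant fits depends on whether a number inside a following tag should count as belonging to the Get sentence.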

