I have a column "content" that contains the data below:
<div >
<div >
<div >
</div>
<span >$20 </span>
</div> </div>
Get FREE baskets $15.01 items.
I need to extract 15.01 in Scala; the value changes with every request.
I wrote the code below. It does not throw an error, but the value is not captured:
.withColumn("AB", regexp_extract($"content","Get\\s\\w*([0-9]\\d*) .{3}",0))
Any help would be great.
CodePudding user response:
You could match the word Get and then match any non-digits up to the first occurrence of a digit:
\bGet\s\D*(\d+(?:\.\d+)?)\b
The pattern matches:
\bGet\s : match the word Get followed by a whitespace char
\D* : match optional non-digits
( : open capture group 1
\d+(?:\.\d+)? : match 1+ digits with an optional decimal part
) : close group 1
\b : a word boundary to prevent a partial word match
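To check the pattern outside Spark, here is a minimal sketch in plain Scala. It assumes the pattern is meant to read \bGet\s\D*(\d+(?:\.\d+)?)\b, with the + quantifiers in place. The object name GetAmountCheck and the helper firstMatch are made up for illustration. It prints both the whole match (group 0) and capture group 1, since only group 1 holds the bare number.

```scala
import scala.util.matching.Regex

object GetAmountCheck {
  // Raw triple-quoted string, so the backslashes are not doubled here.
  val AmountPattern: Regex = """\bGet\s\D*(\d+(?:\.\d+)?)\b""".r

  // Returns (whole match, capture group 1) for the first match, if any.
  def firstMatch(content: String): Option[(String, String)] =
    AmountPattern.findFirstMatchIn(content).map(m => (m.group(0), m.group(1)))

  def main(args: Array[String]): Unit = {
    // Last line of the sample data from the question.
    val line = "Get FREE baskets $15.01 items."
    firstMatch(line).foreach { case (whole, amount) =>
      println(s"whole match: $whole")   // the text from "Get" through the number
      println(s"group 1:     $amount")  // only the number
    }
  }
}
```

Group 0 is the text from Get through the number, so asking for group 0 will never give just 15.01.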
With the backslashes doubled for a Scala string literal:
"\\bGet\\s\\D*(\\d+(?:\\.\\d+)?)\\b"
Or you can also prevent the match from crossing < and >:
\bGet\s[^\d<>]*(\d+(?:\.\d+)?)\b
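Here is a sketch of how the two patterns might be used with regexp_extract. It assumes Spark SQL is on the classpath and that the patterns carry their + quantifiers. The object name, the column names and the second sample row are hypothetical and not taken from the question. Note the group index 1 as the last argument: index 0 returns the whole match instead of the captured number.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.regexp_extract

object ExtractAmount {
  // Capture group 1 holds the number in both patterns.
  val UntilDigit = "\\bGet\\s\\D*(\\d+(?:\\.\\d+)?)\\b"
  val NoTagCrossing = "\\bGet\\s[^\\d<>]*(\\d+(?:\\.\\d+)?)\\b"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run, for illustration only
      .appName("extract-amount")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      // Data from the question, joined with newlines.
      "<div >\n<div >\n<div >\n</div>\n<span >$20 </span>\n</div> </div>\nGet FREE baskets $15.01 items.",
      // Hypothetical row: an earlier "Get" followed by a tag that contains a number.
      "Get started <span >$20 </span> Get FREE baskets $15.01 items."
    ).toDF("content")

    df.withColumn("AB", regexp_extract($"content", UntilDigit, 1))
      .withColumn("AB_no_tags", regexp_extract($"content", NoTagCrossing, 1))
      .show(false)

    spark.stop()
  }
}
```

For the data in the question, both columns should contain 15.01. For the hypothetical second row, the first pattern should pick up the 20 inside the span, while the second pattern skips it and captures 15.01. That difference is the reason to exclude < and >.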