I am working in postgresql. I have an int column "DOC_ID" and a text column "CONTENT". Each row of the table contains information about the document.
There are sections of text that match the regular expression
'#[A-Za-z] :[A-Za-z] #
. They are different!
For example: #Lot : version# , #Participant : name# and others.
I want for each value that satisfies the regular expression to get a column with the ID of the document in which it occurs. Considering that in one document it can be several times.
Example: My table:
DOC_ID CONTENT
1 '#Participant : name# cat bear dog #Participant : name# mouse'
2 'cat #Participant : name# bear'
3 'cat #Lot : version# dog'
10 'mouse #Lot : version# cow'
Result table
TAG DOC_ID
#Participant : name# 1
#Participant : name# 1
#Participant : name# 2
#Lot : version# 3
#Lot : version# 10
Please tell me how can I do this.
CodePudding user response:
Use the function regexp_matches()
:
select
(regexp_matches(content, '#[A-Za-z0-9 №%--,. ] :[A-Za-z0-9 №%--,. ] #', 'g'))[1] as tag,
doc_id
from my_table
Read about the function and pattern matching in the documentation.
Test the query in db<>fiddle.