Home > Back-end >  Count the number of different sections of text that match a regular expression
Count the number of different sections of text that match a regular expression

Time:04-19

I am working in postgresql. I have an int column "DOC_ID" and a text column "CONTENT". Each row of the table contains information about the document.

There are sections of text that match the regular expression

'#[A-Za-z] :[A-Za-z] #. They are different!

For example: #Lot : version# , #Participant : name# and others.

I want for each value that satisfies the regular expression to get a column with the ID of the document in which it occurs. Considering that in one document it can be several times.

Example: My table:

DOC_ID       CONTENT
1            '#Participant : name# cat bear dog #Participant : name# mouse'
2            'cat #Participant : name# bear'
3            'cat #Lot : version# dog'
10           'mouse #Lot : version# cow'

Result table

TAG                       DOC_ID
#Participant : name#       1
#Participant : name#       1
#Participant : name#       2
#Lot : version#            3
#Lot : version#            10

Please tell me how can I do this.

CodePudding user response:

Use the function regexp_matches():

select 
    (regexp_matches(content, '#[A-Za-z0-9 №%--,. ] :[A-Za-z0-9 №%--,. ] #', 'g'))[1] as tag,
    doc_id
from my_table

Read about the function and pattern matching in the documentation.

Test the query in db<>fiddle.

  • Related