How to match 2 key value pairs in unstructured string that looks like column

Time:09-21

I have a document with some lines like:

  Example requested: 24:00:00            Example Used: 01:14:11        

What I want to have is:

{
"example_requested": "24:00:00",
"example_used": "01:14:11"
}

What I tried (this is ruby block):

if x =~ /(Example requested)/
        example_requested = x.scan(/^(?:.*?: ){1}(.*)$/)[0]
        example_requested = (example_requested * "").gsub(/\s+/, "")
end

I get

"example_requested" : "24:00:00ExampleUsed:01:14:11"

I tried some combinations at https://rubular.com/r/WOmAqx4YkOnDER, but the code does not behave the same way when I put it in a ruby code block in Logstash. The only thing that actually works there is stripping the whitespace with gsub. Any help is appreciated.

UPDATE:

if x =~ /(Example requested)/
        example_used = x.scan(/^(?:.*?: ){2}(.*)$/)[0]
        example_used = (example_used * "").gsub(/\s+/, "")
end

actually results in:

"example_used": 01:14:11

but I guess there's a better way to do it

CodePudding user response:

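I haven't tested this myself, but I'm pretty sure the kv filter could be the way to go here. It accepts field_split_pattern, which can be a regex matching a run of multiple whitespace characters, and value_split_pattern, which can be a colon followed by whitespace (for example ":\s+"), so that the colons inside values such as 24:00:00 are not treated as separators.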

Documentation of kv filter.
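
The splitting idea behind that suggestion can be tried out in plain Ruby before touching the Logstash config. This is only a sketch of the logic, not the kv filter itself: the method name split_pairs is made up for illustration, and it assumes fields are separated by two or more whitespace characters while keys and values are separated by a colon plus whitespace.

```ruby
# Sketch of the "split on wide gaps, then split on colon + whitespace" idea.
# split_pairs is a hypothetical helper, not part of Logstash or the kv filter.
def split_pairs(line)
  # Fields are assumed to be separated by runs of 2+ whitespace characters,
  # so the single space inside "Example requested" stays part of the key.
  line.strip.split(/\s{2,}/).each_with_object({}) do |field, out|
    # Limit of 2 keeps the colons inside "24:00:00" in the value.
    key, value = field.split(/:\s+/, 2)
    next if value.nil? # skip fragments that are not key/value pairs

    out[key.downcase.tr(" ", "_")] = value
  end
end

line = "  Example requested: 24:00:00            Example Used: 01:14:11        "
p split_pairs(line)
```

For the sample line this yields a hash with the keys example_requested and example_used. If a real log line ever separates two pairs with a single space, this approach cannot tell where one pair ends and the next begins.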

CodePudding user response:

str = "Example requested: 24:00:00            Example Used: 01:14:11"
r = /\p{Alpha}[\p{Alpha} ]*|\d{2}:\d{2}:\d{2}/    
Hash[*str.scan(r)]
  #=> {"Example requested"=>"24:00:00", "Example Used"=>"01:14:11"}

Note:

str.scan(r)
  #=> ["Example requested", "24:00:00", "Example Used", "01:14:11"]

The regular expression reads, "match a Unicode letter (\p{Alpha}) followed by zero or more (*) Unicode letters or spaces, or three pairs of digits (\d{2}) separated by colons". See Hash::[].
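
To get the lowercase, underscore-separated keys the question asks for, the same scan idea can be used with capture groups. This is a sketch rather than part of the answer above: the constant PAIR and the method normalized_pairs are illustrative names, and the values are assumed to always be in HH:MM:SS form.

```ruby
# Illustrative names; the pattern assumes values always look like HH:MM:SS.
PAIR = /(\p{Alpha}[\p{Alpha} ]*):\s*(\d{2}:\d{2}:\d{2})/

def normalized_pairs(str)
  # With two capture groups, scan returns an array of [key, value] pairs.
  str.scan(PAIR)
     .map { |key, value| [key.downcase.tr(" ", "_"), value] }
     .to_h
end

str = "Example requested: 24:00:00            Example Used: 01:14:11"
p normalized_pairs(str)
```

Using map followed by to_h, rather than to_h with a block, keeps the sketch usable on older Ruby versions.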

CodePudding user response:

Actually, this worked fine:

ruby {
    code => '
        message = event.get("message")
        matches = message.scan(/\s*([a-zA-Z ]+):\s+(([\d:.BKMGT]+))/)
        matches.each { |x|
            event.set(x[0].downcase.gsub(/ /, "_"), x[1])
        }
    '
}
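
Since the question mentions that patterns behaved differently on Rubular and inside Logstash, it may help to run the same logic in plain Ruby first. The sketch below assumes the quantifiers in the pattern are `+`. FakeEvent is a made-up stand-in that only imitates the get and set calls used above; it is not the real Logstash event class.

```ruby
# Hypothetical stand-in for a Logstash event; only get/set are imitated.
class FakeEvent
  def initialize(fields)
    @fields = fields
  end

  def get(name)
    @fields[name]
  end

  def set(name, value)
    @fields[name] = value
  end
end

# Same scan logic as the filter, wrapped in an illustrative method.
def extract_pairs(event)
  message = event.get("message")
  # x[0] is the key; x[1] is the value (time, or a number with a unit suffix).
  message.scan(/\s*([a-zA-Z ]+):\s+(([\d:.BKMGT]+))/).each do |x|
    event.set(x[0].downcase.gsub(/ /, "_"), x[1])
  end
  event
end

event = FakeEvent.new(
  "message" => "  Example requested: 24:00:00            Example Used: 01:14:11        "
)
extract_pairs(event)
p event.get("example_requested")
p event.get("example_used")
```

If this behaves as expected locally but not in Logstash, the difference is more likely in how the pattern is quoted inside the config string than in the regex itself.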