xml filter on nested object using ruby

Time:12-01

I have a log file in the following XML format:

<QuerySiteInformation>
    xmlns="http://www.example.com"
    <Site>
        <id>abc-cde-fvvvv</id>
        <Item>
            <id>e5753ead-d202-451e-92cc-ea49d0a6bdf5</id>
            <code>67448833344443</code>
            <objectMessage>Internal> message shown here in multiple lines</objectMessage>
            <reference>/</reference>
        </Item>
    </Site>
    <SiteInteraction>
        <InteractionItem>
            <Location>
                <id>8496940--2842047577555</id>
                <objectMessage>Internal> message shown here in multiple lines</objectMessage>
            </Location>
        </InteractionItem>
    </SiteInteraction>
</QuerySiteInformation>

I want to rewrite the XML tag <objectMessage>message in multiple lines</objectMessage> as <objectMessage>MESSAGE HAS BEEN REMOVED</objectMessage>, but ONLY when the <objectMessage> tag is inside an <Item> tag.

I have the part of the config below, which can look through the XML and rewrite a tag like the following into the message that I want:

<objectMessage>Internal> message shown here in multiple lines</objectMessage>

config

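filter {
 mutate {
  # gsub takes triples: field name, regex pattern, replacement.
  # "message" is assumed to be the field that holds the raw XML.
  gsub => [
    "message", "some regex pattern can do the xml tag filtering", "MESSAGE HAS BEEN REMOVED"
   ]
 }
}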

However, this changes every <objectMessage>message shown here in multiple lines</objectMessage> tag, including the ones outside of the <Item> tag.

I know that the ruby plugin can do a better job and that I shouldn't be using regex for XML parsing at all, but this is the closest I have got so far.
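For illustration, the scoping described above (touch <objectMessage> only inside <Item>) can be sketched in plain Ruby with two nested gsub passes: the outer one isolates each <Item> block and the inner one rewrites the message within it. This is only a sketch under assumptions: the helper name redact_item_messages and the sample string are made up for the example, and it assumes <Item> elements are never nested and carry no attributes. It is still regex on XML, so it stays fragile.

```ruby
# Hypothetical helper, not part of Logstash.
# Assumption: <Item> elements are never nested and have no attributes.
REDACTED = "<objectMessage>MESSAGE HAS BEEN REMOVED</objectMessage>"

def redact_item_messages(xml)
  # Outer pass: match each <Item>...</Item> block (non-greedy, spans newlines).
  xml.gsub(%r{<Item>.*?</Item>}m) do |item|
    # Inner pass: rewrite objectMessage only inside the matched block.
    item.gsub(%r{<objectMessage>.*?</objectMessage>}m, REDACTED)
  end
end

# Shortened stand-in for the log shown in the question.
sample = <<~XML
  <Site>
    <Item>
      <objectMessage>Internal> message
      shown here in multiple lines</objectMessage>
    </Item>
  </Site>
  <Location>
    <objectMessage>Internal> message shown here in multiple lines</objectMessage>
  </Location>
XML

puts redact_item_messages(sample)
```

Inside a ruby filter the same function could presumably be applied to the raw field, reading it with event.get("message") and writing the result back with event.set("message", ...); that wiring is not tested here.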

CodePudding user response:

Ideally you want to use the built-in xml filter plugin; it is far more reliable and maintainable:

https://www.elastic.co/guide/en/logstash/current/plugins-filters-xml.html

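The following conf file parses the XML and then overwrites every <Item> entry under the first <Site>. Note that it replaces each whole <Item> with the placeholder string, as the output shows, rather than only its <objectMessage>: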

input {
    generator {
        lines => [
        '<QuerySiteInformation>
            xmlns="http://www.example.com"
            <Site>
            <id>abc-cde-fvvvv</id>
            <Item>
            <id>e5753ead-d202-451e-92cc-ea49d0a6bdf5</id>
            <code>67448833344443</code>
            <objectMessage>Internal> message shown here in multiple lines</objectMessage>
            <reference>/</reference>
            </Item>
            <Item>
            <id>e5753ead-d202-451e-92cc-ea49d0a6bdf5</id>
            <code>67448833344443</code>
            <objectMessage>Internal> message shown here in multiple lines</objectMessage>
            <reference>/</reference>
            </Item>
            </Site>
            <SiteInteraction>
            <InteractionItem>
            <Location>
                <id>8496940--2842047577555</id>
                <objectMessage>Internal> message shown here in multiple lines</objectMessage>
            </Location>
            </InteractionItem>
            </SiteInteraction>
        </QuerySiteInformation>'
        ]
        count => 1
    }
}

filter {
    xml {
        source => "message"
        target => "xml"
        store_xml => true
        remove_field => ["message"]
    }
}

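filter {
  ruby {
    # Overwrites every <Item> entry under the first <Site> with a placeholder string.
    code => '
      # Fall back to an empty list so events without Site/Item do not raise.
      items = event.get("[xml][Site][0][Item]") || []
      items.each_with_index do |item, index|
        event.set("[xml][Site][0][Item][#{index}]", "REMOVED MESSAGE")
      end
    '
  }
}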

output {
    stdout {
        codec => rubydebug
    }
}
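If only the <objectMessage> inside each <Item> should change, while id, code and reference are kept, the loop can target that key instead of the whole entry. The plain-Ruby sketch below runs outside Logstash on a hand-written hash that mimics the structure the xml filter stores in the "xml" field (every value wrapped in an array). The hash, the PLACEHOLDER constant and the helper name redact_parsed_items! are assumptions made for the example.

```ruby
PLACEHOLDER = "MESSAGE HAS BEEN REMOVED"

# Hypothetical helper: replaces objectMessage inside every Site -> Item entry.
# Sibling keys and anything outside Site/Item are left untouched.
def redact_parsed_items!(xml)
  Array(xml["Site"]).each do |site|
    Array(site["Item"]).each do |item|
      # Skip entries that are not hashes or carry no objectMessage.
      next unless item.is_a?(Hash) && item.key?("objectMessage")
      item["objectMessage"] = [PLACEHOLDER]
    end
  end
  xml
end

# Hand-written stand-in for the parsed event field (assumed shape).
parsed = {
  "Site" => [
    {
      "id" => ["abc-cde-fvvvv"],
      "Item" => [
        {
          "id" => ["e5753ead-d202-451e-92cc-ea49d0a6bdf5"],
          "code" => ["67448833344443"],
          "objectMessage" => ["Internal> message shown here in multiple lines"],
          "reference" => ["/"]
        }
      ]
    }
  ],
  "SiteInteraction" => [
    {
      "InteractionItem" => [
        {
          "Location" => [
            {
              "id" => ["8496940--2842047577555"],
              "objectMessage" => ["Internal> message shown here in multiple lines"]
            }
          ]
        }
      ]
    }
  ]
}

redact_parsed_items!(parsed)
pp parsed
```

Inside the ruby filter this would presumably become: read the field with event.get("xml"), apply the same loop, and write the result back with event.set("xml", ...), since changes made to the value returned by event.get may not be reflected in the event on their own.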

Output:

{
          "host" => {
        "name" => "Mac-Studio.local"
    },
      "@version" => "1",
    "@timestamp" => 2022-11-28T13:47:31.352282Z,
         "event" => {
        "original" => "<QuerySiteInformation>\n            xmlns=\"http://www.example.com\"\n            <Site>\n            <id>abc-cde-fvvvv</id>\n            <Item>\n            <id>e5753ead-d202-451e-92cc-ea49d0a6bdf5</id>\n            <code>67448833344443</code>\n            <objectMessage>Internal> message shown here in multiple lines</objectMessage>\n            <reference>/</reference>\n            </Item>\n            <Item>\n            <id>e5753ead-d202-451e-92cc-ea49d0a6bdf5</id>\n            <code>67448833344443</code>\n            <objectMessage>Internal> message shown here in multiple lines</objectMessage>\n            <reference>/</reference>\n            </Item>\n            </Site>\n            <SiteInteraction>\n            <InteractionItem>\n            <Location>\n                <id>8496940--2842047577555</id>\n                <objectMessage>Internal> message shown here in multiple lines</objectMessage>\n            </Location>\n            </InteractionItem>\n            </SiteInteraction>\n        </QuerySiteInformation>",
        "sequence" => 0
    },
           "xml" => {
                "content" => [
            [0] "\n            xmlns=\"http://www.example.com\"\n            ",
            [1] "\n            ",
            [2] "\n        "
        ],
                   "Site" => [
            [0] {
                  "id" => [
                    [0] "abc-cde-fvvvv"
                ],
                "Item" => [
                    [0] "REMOVED MESSAGE",
                    [1] "REMOVED MESSAGE"
                ]
            }
        ],
        "SiteInteraction" => [
            [0] {
                "InteractionItem" => [
                    [0] {
                        "Location" => [
                            [0] {
                                           "id" => [
                                    [0] "8496940--2842047577555"
                                ],
                                "objectMessage" => [
                                    [0] "Internal> message shown here in multiple lines"
                                ]
                            }
                        ]
                    }
                ]
            }
        ]
    }
}