How do I make this regular expression work?

Time:11-25

I'm creating my own ELK dashboard to monitor my finances.

I got completely wiped out this year, a combination of many things, but most likely just poor fiscal responsibility.

Anyway;

I'm a regex newb, and I'm having a hard time with this.

Is there a way to quickly match strings with many trailing and leading whitespaces?

Here is my input text:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Account:                                                                                                                                                                                                            ************0000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         Purchase Amount:                                                                                                                                                                                                            $10.00                                                                                                                                                                                                                                                                                                                                                                                 
                      Transaction Date:                                                                                                                                                                                                            November 10, 2022                                                                                                                                                                                                                                                                                                                                                                                                       Transaction Description:                                                                                                                                                                                                            UBER *TRIP HELP.UBER.C                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               

Here's what I am trying in regexr.com

(?<account>(?<=Account:)(.*)(?=\s*Pur))

And my results contain a lot of whitespaces:

                                                                                                                                                                                                            ************0000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         

I'd like to have all the transaction $KEY:$VALUE pairs as named captures for grok filtering my bank transactions.

The results should be:

(?<account>($StackOverFlowSuperChargedRegex))

**************0000

Here is my regexr.com workspace link: regexr.com/736tg
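
A note on why the capture above keeps the padding: lookarounds only assert, they do not consume, so the greedy `(.*)` takes every space between `Account:` and `Pur`. The sketch below reproduces that in Python and shows one possible alternative, which matches the padding outside the capture group. It is only an illustration: the sample string is a shortened stand-in for the real input, and Python spells named groups `(?P<name>...)` where grok uses `(?<name>...)`.

```python
import re

# Shortened stand-in for the padded text in the question (assumed sample).
text = "   Account:      ************0000        Purchase Amount:     $10.00   "

# Original idea: the lookarounds do not consume the padding, so the greedy
# (.*) keeps the whitespace on both sides of the value.
loose = re.search(r"(?P<account>(?<=Account:)(.*)(?=\s*Pur))", text)

# Alternative: match the label and its padding outside the group and
# capture only the run of non-whitespace characters.
tight = re.search(r"Account:\s*(?P<account>\S+)", text)

print(repr(loose.group("account")))
print(repr(tight.group("account")))
```

`\S+` only suits single-token values such as the account number or the amount. A value with spaces in it, like the transaction date, would need something like a lazy match that stops at the next label.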

EDIT: I am applying this grok pattern in an Elasticsearch ingest pipeline, but I am not opposed to using it in a Logstash pipeline instead.

EDIT 2: @Paulo

Here is the content field after applying trim and gsub (without the dissect processor applied):

"content": "View Online Hello, As requested, we’re letting you know that a purchase of $10.00 was made on your RBC Royal Bank credit card account ************0000 on November 12, 2022 towards UBER *TRIP HELP.UBER.C. If you don’t recognize this transaction, please call us at 1‑800‑769‑2512 (available 24/7) and we’ll be happy to help. Account: ************0000 Purchase Amount: $10.00 Transaction Date: November 12, 2022 Transaction Description: UBER *TRIP HELP.UBER.C Thank you!     - Privacy & Security | Legal -   RBC Royal Bank | Royal Bank of Canada RBC WaterPark Place, 88 Queens Quay West, 12th Floor, Toronto, ON, M5J 0B8, Canada www.rbcroyalbank.com. ®/TM Trademark(s) of Royal Bank of Canada. RBC and Royal Bank are registered trademarks of Royal Bank of Canada. © Royal Bank of Canada 2022   -   Communicating Safely Online   Regular, unencrypted email is not secure. You should never include personal or confidential information in a regular email. Be careful when opening messages, links or attachments received through digital channels, including regular emails, text messages and social media messages. If you receive a message that appears to be from RBC that is suspicious please report it to us and then delete it. Do not provide personal information like passwords.   Need Help? To discuss your personal information with us safely, visit our customer service page. Please note this email was sent from an unmonitored inbox. Do not reply.   For current scam alerts and tips to protect yourself visit: RBC Cyber Security | Active Scam Alerts    "
        },
        "_ingest": {
          "timestamp": "2022-11-25T11:18:28.621402003Z"
        }

CodePudding user response:

TL;DR

This doesn't use grok, but I think it may help.

Solution

POST /_ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "_description",
    "processors": [
      {
        "trim": {
          "field": "message"
        }
      },
      {
        "gsub": {
          "field": "message",
          "pattern": """\s+""",
          "replacement": " "
        }
      },
      {
        "dissect": {
          "field": "message",
          "pattern": "Account: %{account} Purchase Amount: %{amount} Transaction Date: %{date} Transaction Description: %{description}"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Account:                                                                                                                                                                                                            ************0000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         Purchase Amount:                                                                                                                                                                                                            $10.00                                                                                                                                                                                                                                                                                                                                                             
                                          Transaction Date:                                                                                                                                                                                                            November 10, 2022                                                                                                                                                                                                                                                                                                                                                                                                       Transaction Description:                                                                                                                                                                                                            UBER *TRIP HELP.UBER.C                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
"
      }
    }
  ]
}
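
To make the three steps easier to follow outside Elasticsearch, here is a rough Python sketch of what the pipeline does: trim, collapse whitespace runs, then split on the fixed labels. It assumes the gsub pattern is meant to collapse each whitespace run to a single space. The sample message is a shortened stand-in, and the regex only approximates the dissect pattern; it is not how dissect is implemented.

```python
import re

# Shortened stand-in for the padded message (assumed sample, not the full email).
message = (
    "      Account:        ************0000      Purchase Amount:     $10.00   \n"
    "    Transaction Date:     November 10, 2022     "
    "Transaction Description:     UBER *TRIP HELP.UBER.C      \n"
)

# Step 1, like the trim processor: drop leading and trailing whitespace.
trimmed = message.strip()

# Step 2, like gsub with \s+: collapse each whitespace run to one space.
collapsed = re.sub(r"\s+", " ", trimmed)

# Step 3, a regex stand-in for the dissect pattern: fixed labels, with each
# value running lazily up to the next label.
layout = re.compile(
    r"Account: (?P<account>.*?) "
    r"Purchase Amount: (?P<amount>.*?) "
    r"Transaction Date: (?P<date>.*?) "
    r"Transaction Description: (?P<description>.*)"
)
fields = layout.fullmatch(collapsed).groupdict()
print(fields)
```

Once the whitespace is normalized, the labels sit at predictable positions, which is why a delimiter-based split like dissect is enough here and no lookarounds are needed.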

CodePudding user response:

Fixed thanks to @Paulo's answer, plus updated grok patterns. It's not the most efficient or elegant solution, but it works well enough.

PUT _ingest/pipeline/fscrawler
{
  "version": 1,
  "processors": [
    {
      "trim": {
        "field": "content",
        "ignore_missing": true
      }
    },
    {
      "gsub": {
        "field": "content",
        "pattern": "\\s+",
        "replacement": " "
      }
    },
    {
      "grok": {
        "field": "content",
        "patterns": [
          "(?<account>(?<=Account:\\s)(.*)(?=\\sPurchase))"
        ],
        "trace_match": true,
        "ignore_missing": true,
        "ignore_failure": true
      }
    },
    {
      "grok": {
        "field": "content",
        "patterns": [
          "(?<amount>(?<=Amount:\\s)(.*)(?=\\sTransaction\\sDate))"
        ],
        "trace_match": true,
        "ignore_missing": true,
        "ignore_failure": true
      }
    },
    {
      "grok": {
        "field": "content",
        "patterns": [
          "(?<transaction_date>(?<=Transaction\\sDate:\\s)(.*)(?=\\sTransaction\\sDescription))"
        ],
        "trace_match": true,
        "ignore_missing": true,
        "ignore_failure": true
      }
    },
    {
      "grok": {
        "field": "content",
        "patterns": [
          "(?<transaction_description>(?<=Transaction\\sDescription:\\s)(.*)(?=\\sThank))"
        ],
        "trace_match": true,
        "ignore_missing": true,
        "ignore_failure": true
      }
    }
  ]
}
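
For anyone who wants to try the extraction logic outside an ingest pipeline, here is a minimal Python sketch that pulls all four fields with one pattern instead of four. The `content` string is an abbreviated excerpt of the cleaned field from the question. Python writes named groups as `(?P<name>...)` where grok uses `(?<name>...)`, and I have not checked that a combined pattern behaves the same way inside grok, so treat that part as an assumption.

```python
import re

# Abbreviated excerpt of the cleaned "content" field from the question.
content = (
    "we'll be happy to help. Account: ************0000 Purchase Amount: $10.00 "
    "Transaction Date: November 12, 2022 Transaction Description: "
    "UBER *TRIP HELP.UBER.C Thank you! - Privacy & Security | Legal -"
)

# One pattern with four named groups; each value is matched lazily and
# ends where the next label (or the closing "Thank") begins.
pattern = re.compile(
    r"Account:\s(?P<account>.*?)"
    r"\sPurchase Amount:\s(?P<amount>.*?)"
    r"\sTransaction Date:\s(?P<transaction_date>.*?)"
    r"\sTransaction Description:\s(?P<transaction_description>.*?)\sThank"
)

match = pattern.search(content)
fields = match.groupdict() if match else {}
print(fields)
```

There is a trade-off: a combined pattern fails as a whole if any one label is missing, whereas the four separate grok processors with `ignore_failure` still extract whatever fields they can.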