How to generate a regex from a selected substring

Time:03-15

I want to generate a regex automatically.

Input parameters: the input string, the index range of each selected substring, and the regex group name to assign to it.

Example 1:

Parameters: select one substring and provide a group name.

{
    "logInfo": "<86>Feb 19 2022 14:03:19 idss-03 sshd[767719]: Accepted password for root from 10.11.39.16 port 40658 ssh2",
    "subLogs": [
        {
            "begin": 4,
            "end": 24,
            "name": "time"
        }
    ]
}

return:

{
    "status": "success",
    "content": {
        "reg": "(?<time>[\\w]+\\s+[\\d]+\\s+[\\d]+\\s+[\\d]+:[\\d]+:[\\d]+)"
    },
    "message": "success",
    "errorCode": ""
}

Example 2:

Parameters: select multiple substrings and provide a group name for each.

{
    "logInfo": "<86>Feb 19 2022 14:03:19 idss-03 sshd[767719]: Accepted password for root from 10.11.39.16 port 40658 ssh2",
    "subLogs": [
        {
            "begin": 4,
            "end": 24,
            "name": "time"
        },
        {
            "begin": 96,
            "end": 101,
            "name": "port"
        }
    ]
}

return:

{
    "status": "success",
    "content": {
        "reg": "(?<time>[\\w]+\\s+[\\d]+\\s+[\\d]+\\s+[\\d]+:[\\d]+:[\\d]+)\\s+[\\w]+-[\\d]+\\s+[\\w]+\\[[\\d]+\\]:\\s+[\\w]+\\s+[\\w]+\\s+[\\w]+\\s+[\\w]+\\s+[\\w]+\\s+[\\d]+\\.[\\d]+\\.[\\d]+\\.[\\d]+\\s+[\\w]+\\s+(?<port>[\\d]+)"
    },
    "message": "success",
    "errorCode": ""
}

How can I do this in Java? I have no idea where to start. Can you give me some advice?

CodePudding user response:

This is obviously completely impossible.

Let's say your input string is "2022" as you have in your examples, and you 'want a regex that matches that'.

Okay. Here are some:

  • 2022
  • 20+22
  • 21*[0-5]+2*
  • \d{4}
  • \d*
  • \d+
  • \d{2,}
  • [0-9]+
  • [02]{4}
  • (?i)[0-9]+(?:\\s+AD|BC)

The list is literally endless.
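
To make the ambiguity concrete, here is a small Java sketch that checks a handful of the patterns from the list against a few inputs. The class name AmbiguityDemo and the helper countMatching are made up for illustration. All five candidates accept "2022", yet they disagree as soon as the input changes, which is why a single sample cannot pin down one regex.

```java
import java.util.List;
import java.util.regex.Pattern;

class AmbiguityDemo {
    // A few of the candidate patterns from the list, written as Java string literals.
    static final List<String> CANDIDATES =
            List.of("2022", "\\d{4}", "\\d*", "\\d{2,}", "[02]{4}");

    // Counts how many candidates match the entire input string.
    static long countMatching(String input) {
        return CANDIDATES.stream()
                .filter(p -> Pattern.matches(p, input))
                .count();
    }

    public static void main(String[] args) {
        // Every candidate accepts "2022"; fewer and fewer accept the other inputs.
        for (String input : List.of("2022", "0202", "1999", "20221")) {
            System.out.println(input + " -> " + countMatching(input)
                    + " of " + CANDIDATES.size() + " candidates match");
        }
    }
}
```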

Hence, without some additional input (such as human eyeballs or from a small 'selection'), what you want is impossible.

We're entering hard Artificial Intelligence territory. As a human you can look at that lot and go: I know! That's a timestamp! But you're just making assumptions. It could be a build year or a serial number that unfortunately looks exactly like a timestamp. As AI solutions tend to be, a tool for this would be:

  • incredibly complicated, as in years of work.
  • heuristic: there will always be sequences where the tool either has no idea or takes a wild stab in the dark and, as a consequence, sometimes guesses wrong.

Whatever led you to decide "I will write a program that figures out the right regex to parse a log line, given an arbitrary log line", you went a step too far, because that's not doable. For example, whichever code emits the log line generally knows exactly what it looks like, so perhaps its authors can contribute the regular expression somehow.
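
If the format is known in advance, the pattern can be written by hand once, by whoever owns the emitting code. Below is a minimal sketch of that idea. It assumes the sshd line from the question always has this shape. The class name KnownFormatDemo, the pattern, and the field helper are illustrative only and are not taken from sshd or any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KnownFormatDemo {
    // Hand-written for one known line shape; the exact field layout is an assumption.
    static final Pattern SSHD_ACCEPTED = Pattern.compile(
            "<\\d+>(?<time>\\w{3} \\d{1,2} \\d{4} \\d{2}:\\d{2}:\\d{2}) \\S+ sshd\\[\\d+\\]: "
            + "Accepted password for \\S+ from \\S+ port (?<port>\\d+) .*");

    // Returns the named field, or null when the line does not have the expected shape.
    static String field(String line, String group) {
        Matcher m = SSHD_ACCEPTED.matcher(line);
        return m.matches() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String line = "<86>Feb 19 2022 14:03:19 idss-03 sshd[767719]: "
                + "Accepted password for root from 10.11.39.16 port 40658 ssh2";
        System.out.println(field(line, "time"));
        System.out.println(field(line, "port"));
    }
}
```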

CodePudding user response:

I agree with rzwitserloot; however, there is an online tool:

https://regex-generator.olafneumann.org/

The source code can be found in the following repository:

https://github.com/noxone/regex-generator

This may give you the insight to build your own. If you add enough constraints, you can achieve your goal.
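
As a rough illustration of what "enough constraints" could mean, here is a minimal Java sketch. It assumes one fixed rule set inferred from the example output in the question: a run of ASCII letters becomes [\w]+, a run of digits becomes [\d]+, whitespace becomes \s+, and any other character is kept as a literal, escaped where needed. The names RegexSketch, Selection, generalize, and generate are hypothetical and not part of the regex-generator project. The sketch also assumes the selections are sorted, do not overlap, use end-exclusive indices, and carry valid Java group names.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSketch {
    // Hypothetical input type: an end-exclusive range [begin, end) plus a group name.
    static final class Selection {
        final int begin;
        final int end;
        final String name;

        Selection(int begin, int end, String name) {
            this.begin = begin;
            this.end = end;
            this.name = name;
        }
    }

    // Characters that need a backslash when used as literals in a regex.
    private static final String META = "\\.[]{}()*+?^$|";
    // The characters that \s matches by default in java.util.regex.
    private static final String SPACES = " \t\n\u000B\f\r";

    // Classifies a character: 'd' digit, 'w' letter or underscore, 's' whitespace, 'o' other.
    private static char kindOf(char c) {
        if (c >= '0' && c <= '9') return 'd';
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_') return 'w';
        if (SPACES.indexOf(c) >= 0) return 's';
        return 'o';
    }

    // Replaces each run of same-kind characters with a character class;
    // other characters are emitted one by one as (escaped) literals.
    static String generalize(String text) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            char kind = kindOf(c);
            if (kind == 'o') {
                if (META.indexOf(c) >= 0) sb.append('\\');
                sb.append(c);
                i++;
                continue;
            }
            int j = i;
            while (j < text.length() && kindOf(text.charAt(j)) == kind) j++;
            sb.append(kind == 'd' ? "[\\d]+" : kind == 's' ? "\\s+" : "[\\w]+");
            i = j;
        }
        return sb.toString();
    }

    // Builds a regex spanning from the first selection to the last one,
    // wrapping each selected range in a named group.
    static String generate(String log, List<Selection> selections) {
        StringBuilder sb = new StringBuilder();
        int pos = selections.get(0).begin;
        for (Selection s : selections) {
            sb.append(generalize(log.substring(pos, s.begin)));
            sb.append("(?<").append(s.name).append('>')
              .append(generalize(log.substring(s.begin, s.end)))
              .append(')');
            pos = s.end;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String log = "<86>Feb 19 2022 14:03:19 idss-03 sshd[767719]: "
                + "Accepted password for root from 10.11.39.16 port 40658 ssh2";
        String regex = generate(log, List.of(
                new Selection(4, 24, "time"),
                new Selection(96, 101, "port")));
        System.out.println(regex);

        Matcher m = Pattern.compile(regex).matcher(log);
        if (m.find()) {
            System.out.println(m.group("time") + " / " + m.group("port"));
        }
    }
}
```

For the log line in the question, the generated pattern extracts Feb 19 2022 14:03:19 and 40658 as the time and port groups. The result is only as good as the chosen rules: a different log line with the same token shapes will match too, which is the limitation the first answer describes.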
