Regex expression to exclude a string between two capture groups

Time:04-08

I am trying to capture the username and the channel id of each user from an API response string using regex.

Unfortunately I cannot use a JSON parser on this JSON, so I (a beginner) am stuck with regex.

My pattern finds the username key and captures its value, then finds the channel id key and captures that value as well. Because the quantifiers are non-greedy, it takes the shortest possible match and produces one match (username plus channel id) per connected user.

A problem arises when several users of the server are online but some of them are not connected to a channel. The regex then finds the first username and lets the in-between part run on until it reaches the channel id of the next user. The channel id it returns is valid, but it is paired with the wrong user.

At some point I excluded the symbol "{" from the in-between part, because it separates different users, and that worked. Unfortunately "{" can sometimes also occur inside a user's parameters, so some users were no longer captured.

Now I have tried to ban the string ""id"" from the part between the two capture groups instead.

But I can't get it to work. Do you have any suggestions?

This example captures User 1 and User 3 correctly but pairs the username User 2 with the channel id of Bot 1. I don't know much about flavors, but the test site said PCRE (PHP), and so far that has worked for my program. I shortened the avatar links to their beginning followed by "...".

Regular Expression:

username": "((?!Bot 1).*?)".*?channel_id": "([0-9]*?)"
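For reference, the mis-pairing can be reproduced with a short Python sketch. Python's `re` module is only an assumption for illustration (the question was tested as PCRE), and the trimmed member list and variable names are hypothetical stand-ins for the full snippet.

```python
import re

# Shortened copy of the member list: User 2 is online but has no channel_id.
data = (
    '"members": [{"id": "0", "username": "User 1", "status": "online", '
    '"mute": false, "channel_id": "0123456789", "avatar_url": "https://..."}, '
    '{"id": "1", "username": "User 2", "status": "online", '
    '"game": {"name": "pls help"}, "avatar_url": "https://..."}, '
    '{"id": "2", "username": "Bot 1", "status": "online", '
    '"game": {"name": "music | ;;help"}, "channel_id": "1234567890", '
    '"avatar_url": "https://..."}, '
    '{"id": "3", "username": "User 3", "status": "online", '
    '"channel_id": "2345678901", "avatar_url": "https://..."}], '
    '"presence_count": 4}'
)

# The pattern from the question, unchanged.
original = re.compile(r'username": "((?!Bot 1).*?)".*?channel_id": "([0-9]*?)"')

# findall returns one (username, channel_id) tuple per match.
pairs = original.findall(data)
for name, channel in pairs:
    print(name, channel)
```

The second pair combines User 2 with the channel id of Bot 1, because the lazy `.*?` between the two groups is free to run across the boundary between two member objects until the first `channel_id` appears.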

String snippet:

"members": [{"id": "0", "username": "User 1", "discriminator": "0000", "avatar": null, "status": "online", "deaf": false, "mute": false, "self_deaf": false, "self_mute": false, "suppress": false, "channel_id": "0123456789", "avatar_url": "https://..."}, {"id": "1", "username": "User 2", "discriminator": "0000", "avatar": null, "status": "online", "game": {"name": "pls help"}, "avatar_url": "https://..."}, {"id": "2", "username": "Bot 1", "discriminator": "0000", "avatar": null, "status": "online", "game": {"name": "music | ;;help"}, "deaf": false, "mute": false, "self_deaf": false, "self_mute": false, "suppress": false, "channel_id": "1234567890", "avatar_url": "https://..."}, {"id": "3", "username": "User 3", "discriminator": "0000", "avatar": null, "status": "online", "deaf": false, "mute": false, "self_deaf": false, "self_mute": false, "suppress": false, "channel_id": "2345678901", "avatar_url": "https://..."}], "presence_count": 4}

CodePudding user response:

As others suggested, plan A should be to parse the object. For plan B, your regex for the username might look like this:

"username": "([^"]+)"

It gets a bit trickier if you allow escapes, for example if a username is "User says \"hi\" always". In that case you could use a pattern built from a normal case and a special case.

Here the normal case would be [^"\\] (neither a double quote nor the escape character), and the special case would be \\" (an escaped double quote).
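As a rough Python sketch of that normal/special idea: the exact pattern the answer pointed to is not shown here, so the unrolled form normal*(special normal*)* used below is an assumption, as are the sample string and the variable names.

```python
import re

# Hypothetical member entry whose username contains escaped double quotes.
text = r'{"id": "4", "username": "User says \"hi\" always", "status": "online"}'

# normal  = [^"\\]  (any character except a double quote or a backslash)
# special = \\"     (a backslash followed by a double quote)
# Assumed combination: normal* (special normal*)*
pattern = re.compile(r'"username": "([^"\\]*(?:\\"[^"\\]*)*)"')

match = pattern.search(text)
raw_name = match.group(1)            # still contains the backslashes
name = raw_name.replace('\\"', '"')  # undo the escaping of quotes only
print(name)
```

This sketch only handles escaped double quotes. Other JSON escapes, such as an escaped backslash, would need the special case widened.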

CodePudding user response:

Don't allow {"id" between username and channel:

username": "((?!Bot 1)[^"]*)"(?:(?!\{"id").)*channel_id": "(\d+)"

See live demo.

Username and channel ID are captured in groups 1 and 2.
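A minimal Python sketch of this approach, assuming the channel id is matched with `\d+` and that Python's `re` behaves like PCRE for these constructs (it supports the same lookahead syntax). The trimmed member list and variable names are hypothetical.

```python
import re

# Shortened copy of the member list: User 2 is online but has no channel_id.
data = (
    '"members": [{"id": "0", "username": "User 1", "status": "online", '
    '"mute": false, "channel_id": "0123456789", "avatar_url": "https://..."}, '
    '{"id": "1", "username": "User 2", "status": "online", '
    '"game": {"name": "pls help"}, "avatar_url": "https://..."}, '
    '{"id": "2", "username": "Bot 1", "status": "online", '
    '"game": {"name": "music | ;;help"}, "channel_id": "1234567890", '
    '"avatar_url": "https://..."}, '
    '{"id": "3", "username": "User 3", "status": "online", '
    '"channel_id": "2345678901", "avatar_url": "https://..."}], '
    '"presence_count": 4}'
)

# (?:(?!\{"id").)* consumes one character at a time, but only while the
# text at the current position does not start with {"id". The match can
# therefore never cross into the next member object.
fixed = re.compile(
    r'username": "((?!Bot 1)[^"]*)"(?:(?!\{"id").)*channel_id": "(\d+)"'
)

pairs = fixed.findall(data)
channels = dict(pairs)  # maps username -> channel id
for name, channel in channels.items():
    print(name, channel)
```

User 2 no longer appears in the result: the in-between part stops at the next `{"id"` before any `channel_id` is found, so that attempt fails instead of borrowing the channel id of Bot 1. An inner object such as `"game": {"name": ...}` does not stop the match, because it does not start with `{"id"`.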

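Some other minor adjustments are included: the username group uses [^"]* instead of .*?, and the channel id group uses \d+ instead of [0-9]*?.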
