I am trying to capture the username and the channel ID of each user from an API response string using regex.
Unfortunately, I cannot use a JSON parser on the JSON data, so I (a beginner) am stuck with regex.
My solution finds the username and captures its value, then finds the channel ID and captures that value as well. Because the quantifiers are non-greedy, it takes the shortest possible match and yields several matches when multiple people are connected.
A problem arises when multiple users of the server are online but some are not connected to a channel. The regex then finds the first username and consumes everything in between until it reaches the channel ID of the next user. The result is a valid channel ID paired with the wrong user.
At one point I excluded the character "{" because it separates different users, and that worked. Unfortunately, "{" can sometimes also occur inside a user's parameters, so some users were no longer captured.
Next I tried to forbid the string "id" (including its quotes) from appearing between the two capture groups instead.
But I can't get it to work. Do you have any suggestions?
This example captures User 1 and User 3 correctly but pairs the username User 2 with the channel ID of Bot 1. I don't know much about regex flavors, but the test site said PCRE (PHP), and so far that has worked for my program. I shortened the avatar links to their beginning followed by "...".
Regular Expression:
username": "((?!Bot 1).*?)".*?channel_id": "([0-9]*?)"
String snippet:
"members": [{"id": "0", "username": "User 1", "discriminator": "0000", "avatar": null, "status": "online", "deaf": false, "mute": false, "self_deaf": false, "self_mute": false, "suppress": false, "channel_id": "0123456789", "avatar_url": "https://..."}, {"id": "1", "username": "User 2", "discriminator": "0000", "avatar": null, "status": "online", "game": {"name": "pls help"}, "avatar_url": "https://..."}, {"id": "2", "username": "Bot 1", "discriminator": "0000", "avatar": null, "status": "online", "game": {"name": "music | ;;help"}, "deaf": false, "mute": false, "self_deaf": false, "self_mute": false, "suppress": false, "channel_id": "1234567890", "avatar_url": "https://..."}, {"id": "3", "username": "User 3", "discriminator": "0000", "avatar": null, "status": "online", "deaf": false, "mute": false, "self_deaf": false, "self_mute": false, "suppress": false, "channel_id": "2345678901", "avatar_url": "https://..."}], "presence_count": 4}
CodePudding user response:
As others have suggested, plan A should be to parse the object. For plan B, your regex might look like this:
"username": "([^"]+)"
It gets a bit trickier if you allow escapes, for example if a username is "User says \"hi\" always". In that case you could use the pattern described here:
Here the normal case would be [^"\\] (not a double quote or escape character), and the special case would be \\" (an escaped double quote).
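As a rough sketch of that normal-case/special-case idea, here it is in Python rather than PHP (the same pattern syntax should work in PCRE). The sample string and the names USERNAME_RE and raw_name are made up for illustration, and the special case is widened from \\" to \\. so that any escaped character is accepted, which is my assumption rather than something the data requires:

```python
import re

# Normal case: any character except a double quote or a backslash.
# Special case: a backslash followed by any one character (covers \" and \\).
USERNAME_RE = re.compile(r'"username": "((?:[^"\\]|\\.)*)"')

# Hypothetical sample with escaped quotes inside the username.
sample = r'{"id": "4", "username": "User says \"hi\" always", "status": "online"}'

match = USERNAME_RE.search(sample)
raw_name = match.group(1)
print(raw_name)  # User says \"hi\" always
```

Note that the captured value still contains the backslashes; unescaping it is a separate step.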
CodePudding user response:
Don't allow {"id" between username and channel:
username": "((?!Bot 1)[^"]*)"(?:(?!\{"id").)*channel_id": "(\d+)"
See live demo.
Username and channel ID are captured in groups 1 and 2.
Some other minor adjustments are included.
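To compare the two behaviours outside the test site, here is a small sketch in Python (not PHP; both patterns use syntax that Python's re module also supports). The data string is a shortened version of the snippet from the question with some fields dropped, the digit group is written as \d+, and the variable names are only illustrative:

```python
import re

# Shortened version of the "members" snippet from the question.
data = (
    '"members": [{"id": "0", "username": "User 1", "status": "online", '
    '"channel_id": "0123456789", "avatar_url": "https://..."}, '
    '{"id": "1", "username": "User 2", "status": "online", '
    '"game": {"name": "pls help"}, "avatar_url": "https://..."}, '
    '{"id": "2", "username": "Bot 1", "status": "online", '
    '"game": {"name": "music | ;;help"}, "channel_id": "1234567890", '
    '"avatar_url": "https://..."}, '
    '{"id": "3", "username": "User 3", "status": "online", '
    '"channel_id": "2345678901", "avatar_url": "https://..."}], '
    '"presence_count": 4}'
)

# Pattern from the question: the lazy .*? may run across member boundaries.
ORIGINAL_RE = re.compile(r'username": "((?!Bot 1).*?)".*?channel_id": "([0-9]*?)"')

# Pattern from this answer: each character between the two groups is only
# consumed if {"id" does not start at that position.
FIXED_RE = re.compile(r'username": "((?!Bot 1)[^"]*)"(?:(?!\{"id").)*channel_id": "(\d+)"')

original_pairs = ORIGINAL_RE.findall(data)
fixed_pairs = FIXED_RE.findall(data)

print(original_pairs)
# [('User 1', '0123456789'), ('User 2', '1234567890'), ('User 3', '2345678901')]
print(fixed_pairs)
# [('User 1', '0123456789'), ('User 3', '2345678901')]
```

User 2 has no channel_id before the next {"id", so that attempt fails instead of borrowing the channel ID of Bot 1.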