Using Regex, I want to remove all hyphens only from specific quoted words in a text file.
Scenario: This is from code where I need to remove hyphens from Json property names before the Json string is deserialized using a JsonConstructor (which does not honor JsonPropertyName attributes see edit below).
Example list of words which need hyphens removed (quotes should be part of the match):
"monthly-amount"
"annual-amount"
"ask-for-shiping-address"
Example input Json:
{
"id": "b9d7574f-5246-4c94-ade5-1d4e9b169afc",
"name": "James-John Moonly-Batcher"
"monthly-amount" : 2000,
"annual-amount" : 12000,
"ask-for-shiping-address" : false
}
Expected output Json:
{
"id": "b9d7574f-5246-4c94-ade5-1d4e9b169afc",
"name": "James-John Moonly-Batcher"
"monthlyamount" : 2000,
"annualamount" : 12000,
"askforshipingaddress" : false
}
Note that in the example, hyphens in the name and id values are not removed.
I understand how I can match the words in the list. What I don't understand is how I can then use the matched word and replace all hyphens in it.
Alternatively, if a regex can be constructed that works without the word list wich can replace hyphens in any property name (but not in property values), that would also be welcome.
I would love to receive some input on this task before the little hair I have left is being scratched off my head.....
EDIT:
Only after I successfully applied the accepted answer did I realize that my initial problem was an entirely different one: wrong parameter names of the class constructor for deserialization (JSON constructor).
I had misunderstood the docs about how parameter names of a JsonConstructor need to be, thinking that the parameter names of the JSON constructor need to match the names in the JSON, whereas in reality, the parameter names of the JSON constructor need to match the actual property names of the C# class (case insensitive). By matching the property names of the C# class, the deserializer can find the JsonPropertyName attribute corresponding to the constructor parameter name, and thus it can find the JSON names to look for.
So with the correct parameter names in place, there is no need to alter Json property names in the first place :-)
CodePudding user response:
An alternative approach can be skipping regexes altogether and introduce intermediate step by using JsonNode
API (available since .NET 6) to change the property names:
var jsonObject = JsonNode.Parse(json).AsObject();
var toModify = jsonObject.Where(pair => pair.Key.Contains("-")).ToList();
foreach (var prop in toModify)
{
var name = prop.Key;
var value = prop.Value;
var newName = name.Replace("-", "");
jsonObject.Remove(name);
jsonObject[newName] = value;
}
var result = jsonObject.Deserialize<...>();
CodePudding user response:
Lookaround assertions can be used to check for some condition at each hyphen.
A lookahead can check if e.g. not being followed by a double quote and colon:
-(?=[^"]*"\s*:)
See this demo at regex101 or a C# replace demo at tio.run
Or a negative lookbehind for checking if there is no colon before in the line.
-(?<!:. )
Another demo at regexstorm and the C# replace demo at tio.run
The Stackoverflow Regex FAQ is a good ressource for explanation. What's used in these patterns is a [^
negated character class ]
, the dot, *
and
quantifiers and shorthand \s
for whitespace.