I would like to concatenate values from adjacent objects if a given value is matched.
As you can see below, Text values whose Path contains ParagraphSpan are split across several objects, whereas plain P entries are self-contained.
{
"Path": "//Document/P[5]",
"Text": "WELLNESS AS A HEALTH GOAL "
},
{
"Path": "//Document/P[6]/ParagraphSpan",
"Text": "Although we use the words health and wellness interchangeably, they differ in two important ways. Health can be determined "
},
{
"Path": "//Document/P[6]/ParagraphSpan[2]",
"Text": "or influenced by factors beyond your control, such as your genes, age, and family history."
},
{
"Path": "//Document/P[7]",
"Text": "Dimensions of Wellness "
},
{
"Path": "//Document/P[8]",
"Text": "The process of achieving wellness is continual and dynamic, involving change and growth. "
},
{
"Path": "//Document/P[9]",
"Text": "Your physical wellness includes not just your body’s overall condition and the absence of disease "
}
Is it possible to combine the Text values that belong to the same ParagraphSpan group, like this?
"Path": "//Document/P[6]/ParagraphSpan",
"Text": "Although we use the words health and wellness interchangeably, they differ in two important ways. Health can be determined or influenced by factors beyond your control, such as your genes, age, and family history. ",
CodePudding user response:
If your input consists of a JSON array with {Path, Text} objects as shown, then you could achieve the merging you describe by:
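reduce .[] as $x (null;
  if $x.Path | contains("/ParagraphSpan[")
  then .[-1].Text += $x.Text   # continuation span: append its text to the previous object
  else . + [$x] end)           # anything else: keep the object as a new entry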
In practice, you might want to add some checks, or to impose more stringent requirements for merging objects, or to ensure that the merging of texts is done properly.
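As one illustration of what "more stringent requirements" could look like, here is a minimal Python sketch of the same merge that only concatenates a span onto the previous object when both spans share the same parent path. The function name `merge_paragraph_spans` and the regular expression are assumptions made for this example, not part of the original answer, and the input is assumed to be a list of dicts that each have `Path` and `Text` keys.

```python
import re

# Assumed path shape: "<parent>/ParagraphSpan" or "<parent>/ParagraphSpan[N]".
SPAN_RE = re.compile(r"^(?P<parent>.*)/ParagraphSpan(?:\[\d+\])?$")


def merge_paragraph_spans(items):
    """Merge adjacent ParagraphSpan objects that share the same parent path."""
    merged = []
    last_parent = None  # parent path of the span group currently being built
    for item in items:
        match = SPAN_RE.match(item["Path"])
        parent = match.group("parent") if match else None
        if parent is not None and parent == last_parent:
            # Same paragraph as the previous span: append the text.
            merged[-1]["Text"] += item["Text"]
        else:
            # New paragraph or non-span object: keep a copy as its own entry.
            merged.append(dict(item))
        last_parent = parent
    return merged


if __name__ == "__main__":
    sample = [
        {"Path": "//Document/P[5]", "Text": "WELLNESS AS A HEALTH GOAL "},
        {"Path": "//Document/P[6]/ParagraphSpan", "Text": "Health can be determined "},
        {"Path": "//Document/P[6]/ParagraphSpan[2]", "Text": "or influenced by factors beyond your control."},
        {"Path": "//Document/P[7]", "Text": "Dimensions of Wellness "},
    ]
    for obj in merge_paragraph_spans(sample):
        print(obj)
```

Unlike a plain substring test on the path, this version will not glue a `ParagraphSpan[2]` onto an unrelated preceding object, and it does not fail when the very first object is a span. The merged object keeps the Path of the first span in its group.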