I have a JSON file with repetitive parts, and I'm trying to write a script that removes a certain block of text from multiple files. A Python script would be preferred; otherwise, from my searching, sed could work too, though I know nothing about it. Here is a sample of the format of my JSON file:
{
"Animal": {
"Type_species": "Reptile"
},
"FindMe": "https://www.merriam-webster.com/dictionary/amphibian",
"Description": "Most are cold blooded."
},
{
"Animal": {
"Type_species": "Mammal"
},
"FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
"Description": "There Are Approximately 5,000 Mammal Species."
},
{
"Animal": {
"Type_species": "Amphibian"
},
"FindMe": "https://en.wikipedia.org/wiki/Amphibian",
"Description": "Most amphibians have thin, moist skin that helps them to breathe"
},
1. How would I remove the following from the JSON file?
{
"Animal": {
"Type_species": "Mammal"
},
"FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
"Description": "There Are Approximately 5,000 Mammal Species."
},
2. How do I adapt the script to account for different "FindMe" URLs across multiple files? For example, a second file would contain the block below, and so on for the other files:
{
"Animal": {
"Type_species": "Mammal"
},
"FindMe": "https://kids.nationalgeographic.com/animals/mammals/facts/arctic-fox",
"Description": "There Are Approximately 5,000 Mammal Species."
},
I think a regex could help, but I'm having trouble understanding regexes and implementing them within a script.
Any help is appreciated, thank you.
Update: I would like the end result to look like this:
{
"Animal": {
"Type_species": "Reptile"
},
"FindMe": "https://www.merriam-webster.com/dictionary/amphibian",
"Description": "Most are cold blooded."
},
{
"Animal": {
"Type_species": "Amphibian"
},
"FindMe": "https://en.wikipedia.org/wiki/Amphibian",
"Description": "Most amphibians have thin, moist skin that helps them to breathe"
},
CodePudding user response:
Assuming your complete JSON contains a list of dictionaries (which your sample suggests), you can filter that list with a comprehension:
JSON = {"data": [{
"Animal": {
"Type_species": "Reptile"
},
"FindMe": "https://www.merriam-webster.com/dictionary/amphibian",
"Description": "Most are cold blooded."
},
{
"Animal": {
"Type_species": "Mammal"
},
"FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
"Description": "There Are Approximately 5,000 Mammal Species."
},
{
"Animal": {
"Type_species": "Amphibian"
},
"FindMe": "https://en.wikipedia.org/wiki/Amphibian",
"Description": "Most amphibians have thin, moist skin that helps them to breathe"
}]}
JSON['data'] = [d for d in JSON['data'] if d['Animal']['Type_species'] != 'Mammal']
print(JSON)
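To apply the same filter to several files on disk, something like the following might work. It is only a sketch: it assumes each file holds a top-level JSON array of these records (not the "data" wrapper used above), and the names remove_species, clean_files and the "*.json" pattern are made up for illustration. Because the test looks only at "Type_species", the "FindMe" URL can differ from file to file without any change to the script.

```python
import json
from pathlib import Path


def remove_species(records, species="Mammal"):
    """Return the records whose Animal.Type_species differs from `species`."""
    return [
        record for record in records
        if record.get("Animal", {}).get("Type_species") != species
    ]


def clean_files(directory, species="Mammal", pattern="*.json"):
    """Rewrite every matching file in `directory` without the unwanted records.

    Assumption: each file contains a top-level JSON array of records.
    """
    for path in Path(directory).glob(pattern):
        records = json.loads(path.read_text(encoding="utf-8"))
        kept = remove_species(records, species)
        path.write_text(json.dumps(kept, indent=2), encoding="utf-8")
```

Calling clean_files("path/to/your/files") would then overwrite each file in place, so it is worth trying on copies first. If you ever do need to match on the URL instead, a condition such as record["FindMe"].startswith("https://kids.nationalgeographic.com/animals/mammals/") covers every page under that prefix without a regex.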
CodePudding user response:
This might work for you (GNU sed):
sed '/^\s*{/{:a;N;/^\(\s*\){.*\n\1},/!ba;/"Type_species": "Mammal"/d}' file
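This gathers the lines for each animal into one block and deletes the block if it contains "Type_species": "Mammal". Note that the pattern relies on the file being indented so that each animal's closing }, lines up with its opening { and the nested closing braces are indented further.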
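If the files are not strictly valid JSON (the sample has trailing commas and no enclosing list), a similar block-wise approach can be sketched in Python. This is an illustration rather than a translation of the sed command: it counts braces instead of comparing indentation, it assumes the animal objects sit at the top level, and it assumes no { or } appears inside a string value. The name drop_blocks is made up.

```python
import re

# Tolerates varying whitespace around the colon.
MAMMAL = re.compile(r'"Type_species"\s*:\s*"Mammal"')


def drop_blocks(text, pattern=MAMMAL):
    """Remove every top-level {...} block whose text matches `pattern`."""
    kept = []    # pieces of text that survive
    block = []   # lines of the block currently being gathered
    depth = 0    # current brace nesting level
    for line in text.splitlines(keepends=True):
        if depth == 0 and "{" not in line:
            kept.append(line)  # outside any block: copy through
            continue
        block.append(line)
        depth += line.count("{") - line.count("}")
        if depth <= 0:  # the block has just closed
            joined = "".join(block)
            if not pattern.search(joined):
                kept.append(joined)
            block, depth = [], 0
    kept.extend(block)  # unterminated block at end of input: keep as is
    return "".join(kept)
```

Reading a file with open(...).read(), passing the text through drop_blocks and writing the result back should give the output shown in the question's update, whatever URL the "FindMe" line contains.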