How do I remove a block of text from multiple repetitive JSON files where there is a small change bet

Time:12-01

I have a JSON file with repetitive parts, and I'm trying to write a script that removes a certain block of text from multiple files. A Python script is preferred; from my searching, sed could work too, though I know nothing about it. Here is a sample of the format of my JSON file:

    {
      "Animal": {
        "Type_species": "Reptile"
      },
      "FindMe": "https://www.merriam-webster.com/dictionary/amphibian",
      "Description": "Most are cold blooded."
    },
    {
      "Animal": {
        "Type_species": "Mammal"
      },
      "FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
      "Description": "There Are Approximately 5,000 Mammal Species."
    },
    {
      "Animal": {
        "Type_species": "Amphibian"
      },
      "FindMe": "https://en.wikipedia.org/wiki/Amphibian",
      "Description": "Most amphibians have thin, moist skin that helps them to breathe"
    },
  1. How would I remove the following from the json file?
    {
      "Animal": {
        "Type_species": "Mammal"
      },
      "FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
      "Description": "There Are Approximately 5,000 Mammal Species."
    },

My other question: 2. How do I adapt the script to account for different "FindMe" URLs across multiple files? For example, a second file would contain the block below, and so on for the other files.

    {
      "Animal": {
        "Type_species": "Mammal"
      },
      "FindMe": "https://kids.nationalgeographic.com/animals/mammals/facts/arctic-fox",
      "Description": "There Are Approximately 5,000 Mammal Species."
    },

I think a regex could help, but I'm having trouble understanding regexes and implementing them in a script.

Any help is appreciated, thank you.

Update: I would like the end result to look like this:

    {
      "Animal": {
        "Type_species": "Reptile"
      },
      "FindMe": "https://www.merriam-webster.com/dictionary/amphibian",
      "Description": "Most are cold blooded."
    },
    {
      "Animal": {
        "Type_species": "Amphibian"
      },
      "FindMe": "https://en.wikipedia.org/wiki/Amphibian",
      "Description": "Most amphibians have thin, moist skin that helps them to breathe"
    },

CodePudding user response:

Assuming your complete JSON contains a list of dictionaries (which your sample suggests) then:

    JSON = {"data": [
        {
            "Animal": {
                "Type_species": "Reptile"
            },
            "FindMe": "https://www.merriam-webster.com/dictionary/amphibian",
            "Description": "Most are cold blooded."
        },
        {
            "Animal": {
                "Type_species": "Mammal"
            },
            "FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
            "Description": "There Are Approximately 5,000 Mammal Species."
        },
        {
            "Animal": {
                "Type_species": "Amphibian"
            },
            "FindMe": "https://en.wikipedia.org/wiki/Amphibian",
            "Description": "Most amphibians have thin, moist skin that helps them to breathe"
        }
    ]}

    # Keep every entry whose species is not "Mammal"
    JSON['data'] = [d for d in JSON['data'] if d['Animal']['Type_species'] != 'Mammal']

    print(JSON)
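
To apply the same filter to files on disk, and to cope with the "FindMe" URL differing between files, one option is to match on the species or on a URL prefix instead of on the exact block. This is a minimal sketch, not a definitive script: it assumes each file is valid JSON with a top-level "data" list as in the example above, and the names is_unwanted, remove_mammals and MAMMAL_URL_PREFIX are hypothetical helpers introduced here for illustration.

```python
import json
from pathlib import Path

# Assumption: every unwanted "FindMe" URL starts with this prefix.
MAMMAL_URL_PREFIX = "https://kids.nationalgeographic.com/animals/mammals/"


def is_unwanted(entry):
    """Return True for entries to drop: Mammal species or a mammals URL."""
    species = entry.get("Animal", {}).get("Type_species")
    url = entry.get("FindMe", "")
    return species == "Mammal" or url.startswith(MAMMAL_URL_PREFIX)


def remove_mammals(path):
    """Rewrite one JSON file in place without the unwanted entries."""
    path = Path(path)
    doc = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: the list of animals lives under the "data" key.
    doc["data"] = [e for e in doc["data"] if not is_unwanted(e)]
    path.write_text(json.dumps(doc, indent=2), encoding="utf-8")
```

Calling remove_mammals(p) for each p in Path(".").glob("*.json") would process a whole directory. Because the file is rewritten in place, it is worth trying this on copies first.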

CodePudding user response:

This might work for you (GNU sed):

    sed '/^\s*{/{:a;N;/^\(\s*\){.*\n\1},/!ba;/"Type_species": "Mammal"/d}' file

This gathers the lines of each animal block, from an opening { to the closing }, at the same indentation, and deletes the whole block if it contains "Type_species": "Mammal".
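
Since a Python script was preferred, the same gather-and-delete idea can be sketched in Python for files that are not strictly valid JSON (for example, fragments with trailing commas like the sample). This is only a rough line-based equivalent under the assumption that each block opens with a line containing just { and closes with } or }, at the same indentation; remove_blocks and its marker parameter are hypothetical names chosen for illustration.

```python
import re


def remove_blocks(text, marker='"Type_species": "Mammal"'):
    """Drop every '{ ... },' block whose lines contain `marker`."""
    kept = []
    block = None  # lines of the block being gathered, or None outside a block
    indent = ""
    for line in text.splitlines(keepends=True):
        if block is None:
            opening = re.match(r"^(\s*)\{\s*$", line)
            if opening:
                # Start gathering; remember the indentation of the opening brace.
                block, indent = [line], opening.group(1)
            else:
                kept.append(line)
        else:
            block.append(line)
            # The block ends at '}' or '},' with the same indentation.
            if re.match(re.escape(indent) + r"\},?\s*$", line):
                if marker not in "".join(block):
                    kept.extend(block)
                block = None
    if block is not None:
        # Unterminated block at end of input: keep it untouched.
        kept.extend(block)
    return "".join(kept)
```

Because the match is on the species line and not on the "FindMe" line, the URL can differ from file to file without changing the script.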
