Remove HTML Tags MongoDB

Time:06-16

I am writing a query to extract customer descriptions from MongoDB. Unfortunately, the descriptions are stored as HTML. Is there a way to either remove all HTML tags or replace them with " "?

Below is a sample document

{
        "_id" : ObjectId("61f72aefdc85500a8baa6bb8"),
        "CustomerPin" : "22010871",
        "CustomerName" : "TestLastName, TestFirstName",
        "Age" : 39.0,
        "Gender" : "Male",
        "Description" : "<p><span>This will be a test description</span><br/></p>"
}

The output should have the "p", "span", and "br" tags removed. Is there a function in MongoDB that removes them all at once, without repeating $project for each tag?

This is the expected output:

{
        "_id" : ObjectId("61f72aefdc85500a8baa6bb8"),
        "CustomerPin" : "22010871",
        "CustomerName" : "TestLastName, TestFirstName",
        "Age" : 39.0,
        "Gender" : "Male",
        "Description" : "This will be a test description"
}

Thanks!

CodePudding user response:

One way is to strip all tags with a regex in a pre-save hook, so the description is stored without HTML:

Description = Description.replace(/(<([^>]+)>)/gi, "");

See the documentation on pre-save hooks for your driver or ODM.
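As a rough illustration of the regex approach above, here is a minimal sketch in plain JavaScript that can be run outside any hook. The helper name `stripTags` is hypothetical (it is not from the answer), and the sketch assumes the tags are simple `<...>` runs with no `>` inside attribute values.

```javascript
// Hypothetical helper: removes every "<...>" run from a string.
// Assumes no ">" appears inside attribute values.
function stripTags(html) {
  return html.replace(/<[^>]+>/g, "").trim();
}

// Sample document trimmed to the fields that matter here.
const doc = {
  CustomerPin: "22010871",
  Description: "<p><span>This will be a test description</span><br/></p>",
};

// Build a cleaned copy instead of mutating the original.
const cleaned = { ...doc, Description: stripTags(doc.Description) };

console.log(cleaned.Description);
```

The same function body could be called from whatever pre-save hook your data layer provides. For untrusted or malformed HTML, a real HTML parser is usually safer than a regex.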

CodePudding user response:

If you use MongoDB 4.2 or later, you can do this in an aggregation pipeline. The work is in finding a regex that extracts the text content from the HTML. The pipeline and the regex are below.

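db.getCollection("name_of_your_collection").aggregate([
    {
        // Capture the first run of text that is not inside a tag.
        // No g flag: $regexFind returns only the first match.
        $set: {
            contentRegex: {
                $regexFind: { input: "$Description", regex: /([^<>]+)(?!([^<]+)?>)/i }
            }
        }
    },
    {
        // Fall back to the raw Description when nothing matched.
        $set: {
            content: { $ifNull: ["$contentRegex.match", "$Description"] }
        }
    },
    {
        // Drop the temporary field.
        $unset: ["contentRegex"]
    }
])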
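To see what the regex in the pipeline picks out, the sketch below runs the same pattern in plain JavaScript. It is only an illustration, not part of the answer: the helper names `firstText` and `allText` are hypothetical, and JavaScript's regex engine stands in for the server's. `firstText` mirrors `$regexFind` plus the `$ifNull` fallback (first match only). `allText` shows what collecting every match would give, which on the server side would need something like `$regexFindAll` instead.

```javascript
// Same pattern as in the pipeline: a run of non-bracket characters
// that is not followed by the rest of a tag.
const textOutsideTags = /([^<>]+)(?!([^<]+)?>)/;

// Hypothetical helper: first match only, falling back to the input.
function firstText(html) {
  const m = html.match(textOutsideTags);
  return m ? m[0] : html;
}

// Hypothetical helper: every match, joined with single spaces.
function allText(html) {
  const re = new RegExp(textOutsideTags.source, "g");
  return Array.from(html.matchAll(re), (m) => m[0]).join(" ");
}

const sample = "<p><span>This will be a test description</span><br/></p>";
console.log(firstText(sample));

const twoParagraphs = "<p>First</p><p>Second</p>";
console.log(firstText(twoParagraphs));
console.log(allText(twoParagraphs));
```

For descriptions with a single text run, like the sample document, the first match is the whole text. When a description contains several text runs, the first-match approach keeps only the first one.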