MongoDB $addFields: split camel case into words

Time:08-11

I am trying to use $addFields to dynamically create a field in which camelCase word boundaries and other non-standard word breaks (word.break, word_break) in a given field are replaced with a space, so that the separate words can be caught by a phrase match.

Looking through the string operators, I thought $replaceAll might do the trick, but I can't see any way to reference a group captured by the find field, and the find field doesn't seem to accept a $regex either.

In theory I was thinking it would go something like this:

aggregate.push({
  $addFields: {
    businessNameBreakWords: {
      $replaceAll: {
        input: '$businessName',
        find: /([a-z])([A-Z])/, // a regex would go here
        replacement: '$1 $2',    // referencing the captured groups
      },
    },
  },
});
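Part of this does not need a regex at all. $replaceAll's find argument accepts any expression that resolves to a string, so the literal separators (., _ and @) can each be turned into a space by nesting $replaceAll calls; only the camelCase boundary needs a pattern, which $replaceAll cannot express. A minimal sketch, assuming a hypothetical helper named replaceSeparators: it is plain JavaScript that only builds the stage object and does not talk to the database.

```javascript
// Hypothetical helper (not a MongoDB API): wraps the field path in one
// $replaceAll per literal separator, each replacing that separator with a space.
// $replaceAll matches plain strings only, so camelCase is NOT handled here.
function replaceSeparators(fieldPath, separators) {
  return separators.reduce(
    (expr, sep) => ({ $replaceAll: { input: expr, find: sep, replacement: " " } }),
    fieldPath
  );
}

const stage = {
  $addFields: {
    businessNameBreakWords: replaceSeparators("$businessName", [".", "_", "@"]),
  },
};

// Print the generated stage so it can be pasted into a pipeline.
console.log(JSON.stringify(stage, null, 2));
```

The innermost $replaceAll handles ".", and each later separator wraps the previous expression, so the order of the separators array does not change the result.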

Is what I'm trying to do possible?

Example input (the field values as stored in MongoDB):

Lisbon FunSushiBar
Escape Room @york.dungeons.minster

Desired output from $addFields:

Lisbon Fun Sushi Bar
Escape Room york dungeons minster
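For reference, the target transformation is straightforward with ordinary regular expressions on the client side; it is what the hypothetical $replaceAll call above would need to do, including the $1 $2 back-references. A minimal sketch in plain JavaScript, run outside MongoDB; the function name breakWords is hypothetical, and the whitespace-collapsing step is an assumption added so the result matches the sample output exactly.

```javascript
// Client-side illustration only: the transformation the $addFields stage
// should produce, expressed with JavaScript regular expressions.
function breakWords(name) {
  return name
    .replace(/([a-z])([A-Z])/g, "$1 $2") // "FunSushiBar" -> "Fun Sushi Bar"
    .replace(/[.@_]/g, " ")              // "york.dungeons" -> "york dungeons"
    .replace(/\s+/g, " ")                // collapse runs of whitespace
    .trim();                             // drop leading/trailing spaces
}

console.log(breakWords("Lisbon FunSushiBar"));                 // Lisbon Fun Sushi Bar
console.log(breakWords("Escape Room @york.dungeons.minster")); // Escape Room york dungeons minster
```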

I need to do this with $addFields or a projection, because the aggregation includes a compound phrase search that I need to run against the created field. The goal is that someone searching "Lisbon Sushi" gets this result with a high match score, which currently doesn't happen because "Sushi" sits inside the camelCase "FunSushiBar" instead of after a word boundary.

Thanks.

Note: I have also tried $function, but it is unavailable to me; it fails with the error "$function not allowed in this atlas tier".

MongoDB version: 4.4.1

Answer:

Split the businessName string into an array of single characters, then $reduce that array back into a string: prefix each uppercase letter with a space, replace each separator character (., @ or _) with a space, and copy every other character unchanged.

db.users.aggregate([
  // Stage 1: split businessName into an array of one-character strings
  {
    "$addFields": {
      "nameRange": {
        "$map": {
          "input": {
            "$range": [
              0,
              {
                "$strLenCP": "$businessName"
              },
              1
            ]
          },
          "as": "inp",
          "in": {
            "$substrCP": [
              "$businessName",
              "$$inp",
              1
            ]
          }
        }
      }
    }
  },
  // Stage 2: rebuild the string, spacing out uppercase letters and separators
  {
    "$addFields": {
      "businessNameBreakWords": {
        "$reduce": {
          "input": "$nameRange",
          "initialValue": "",
          "in": {
            "$cond": [
              {
                "$regexMatch": {
                  "input": "$$this",
                  "regex": "^[A-Z]$"
                }
              },
              {
                "$concat": [
                  "$$value",
                  " ",
                  "$$this"
                ]
              },
              {
                "$cond": [
                  {
                    "$regexMatch": {
                      "input": "$$this",
                      "regex": "^[.@_]$"
                    }
                  },
                  {
                    "$concat": [
                      "$$value",
                      " "
                    ]
                  },
                  {
                    "$concat": [
                      "$$value",
                      "$$this"
                    ]
                  }
                ]
              }
            ]
          }
        }
      }
    }
  },
  // Keep only the original and the derived field
  {
    "$project": {
      "businessName": 1,
      "businessNameBreakWords": { "$trim": { "input": "$businessNameBreakWords" } }
    }
  }
])
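To check what the two $addFields stages produce without a database, the same per-character logic can be mirrored in plain JavaScript. This is a sketch under the assumption that stage 1 splits by code point, as $strLenCP and $substrCP do (Array.from also splits by code point); pipelineBreakWords is a hypothetical name. It shows that the $reduce output starts with a space and doubles the space wherever an uppercase letter or separator follows an existing space.

```javascript
// Plain-JavaScript mirror of the two $addFields stages, for checking output
// without a database. Not MongoDB code; the function name is hypothetical.
function pipelineBreakWords(businessName) {
  // Stage 1: $range + $map + $substrCP -> array of single code points
  const nameRange = Array.from(businessName);
  // Stage 2: $reduce with the same two $regexMatch tests, in the same order
  return nameRange.reduce((value, ch) => {
    if (/^[A-Z]$/.test(ch)) return value + " " + ch; // space before uppercase
    if (/^[.@_]$/.test(ch)) return value + " ";      // separator -> space
    return value + ch;                               // anything else as-is
  }, "");
}

const raw = pipelineBreakWords("Lisbon FunSushiBar");
console.log(JSON.stringify(raw));             // " Lisbon  Fun Sushi Bar"
console.log(raw.trim().replace(/\s+/g, " ")); // Lisbon Fun Sushi Bar
```

The extra spaces are likely harmless if the field is tokenized on whitespace or word boundaries. $trim removes only the leading and trailing spaces; collapsing the internal double spaces would need additional logic in the pipeline.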

