Facebook's Duckling Cannot Identify Time Dimension Correctly

Time:03-24

I'm using Facebook's Duckling to parse text. When I pass the text: 13h 47m it correctly classifies the entire text as DURATION (= 13 hours 47 minutes).

However, when I pass the text 13h 47m 13s, it does not recognize the 13s part of the string as belonging to the DURATION. I expected it to parse as 13 hours, 47 minutes and 13 seconds, but the duration stops at 13h 47m and the trailing 13 comes back as a separate number.

Command: curl -XPOST http://127.0.0.1:0000/parse --data 'locale=en_US&text=13h 47m 13s'
JSON Array: 
[
  {
    "latent": false,
    "start": 0,
    "dim": "duration",
    "end": 7,
    "body": "13h 47m",
    "value": {
      "unit": "minute",
      "normalized": {
        "unit": "second",
        "value": 49620
      },
      "type": "value",
      "value": 827,
      "minute": 827
    }
  },
  {
    "latent": false,
    "start": 8,
    "dim": "number",
    "end": 10,
    "body": "13",
    "value": {
      "type": "value",
      "value": 13
    }
  }
]
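
The numbers in the response can be checked by hand. The sketch below is plain Haskell with no Duckling dependency, and toSeconds is a hypothetical helper introduced only for this illustration. It reproduces the 827 minutes and 49620 seconds reported for 13h 47m, and shows the normalized value a full parse of 13h 47m 13s would be expected to give.

```haskell
-- Hypothetical helper for illustration only; not part of Duckling.
toSeconds :: Int -> Int -> Int -> Int
toSeconds hours minutes seconds = hours * 3600 + minutes * 60 + seconds

main :: IO ()
main = do
  -- What the response reports for the "13h 47m" span.
  print (13 * 60 + 47 :: Int)  -- 827 (minutes)
  print (toSeconds 13 47 0)    -- 49620 (seconds)
  -- What a full parse of "13h 47m 13s" would normalize to.
  print (toSeconds 13 47 13)   -- 49633 (seconds)
```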

Is this a bug? How can I update Duckling so that it parses the text as described above?

CodePudding user response:

The documentation seems pretty clear about this:

To extend Duckling's support for a dimension in a given language, typically 4 files need to be updated:

  • Duckling/<Dimension>/<Lang>/Rules.hs
  • Duckling/<Dimension>/<Lang>/Corpus.hs
  • Duckling/Dimensions/<Lang>.hs (if not already present in Duckling/Dimensions/Common.hs)
  • Duckling/Rules/<Lang>.hs

Taking a look in Duckling/Duration/Rules.hs, I see:

ruleIntegerUnitofduration = Rule
  { name = "<integer> <unit-of-duration>"
  , pattern =
    [ Predicate isNatural
    , dimension TimeGrain
    ]
  -- ...

So next I peeked in Duckling/TimeGrain/EN/Rules.hs (because Duckling/TimeGrain/Rules.hs did not exist), and saw:

grains :: [(Text, String, TG.Grain)]
grains = [ ("second (grain) ", "sec(ond)?s?",      TG.Second)
         -- ...

Presumably this means 13h 47m 13sec would parse the way you want, since sec matches that regex but a bare s does not. To make 13h 47m 13s parse the same way, the first thing I would try is making the regex above a bit more permissive, maybe something like s(ec(ond)?s?)?, and then checking that it does the trick without breaking anything else you care about.
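
To see what the relaxed pattern changes, it helps to spell out the unit strings each pattern accepts as a whole match. The sketch below does that by hand in plain Haskell rather than calling a regex engine, so it ignores letter case and however Duckling anchors its matches. The names originalUnits and relaxedUnits are made up for this illustration.

```haskell
-- Whole-match strings for the original pattern: sec(ond)?s?
originalUnits :: [String]
originalUnits =
  [ "sec" ++ ond ++ plural | ond <- ["", "ond"], plural <- ["", "s"] ]

-- Whole-match strings for the relaxed pattern: s(ec(ond)?s?)?
-- The outer optional group adds exactly one new string, the bare "s".
relaxedUnits :: [String]
relaxedUnits = "s" : originalUnits

main :: IO ()
main = do
  print originalUnits               -- ["sec","secs","second","seconds"]
  print relaxedUnits                -- ["s","sec","secs","second","seconds"]
  print ("s" `elem` originalUnits)  -- False
  print ("s" `elem` relaxedUnits)   -- True
```

A single-letter unit is more likely to collide with other text than sec is, which is a good reason to add 13s-style examples to the corresponding Corpus.hs and rerun the tests after changing the rule.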
