I'm using Facebook's Duckling to parse text. When I pass the text "13h 47m", it correctly classifies the entire text as a DURATION (= 13 hours 47 minutes).
However, when I pass the text "13h 47m 13s", it does not recognize the "13s" part of the string as part of the DURATION. I was expecting it to parse the text as 13 hours, 47 minutes and 13 seconds, but it essentially ignores the "13s" part.
Command: curl -XPOST http://127.0.0.1:0000/parse --data 'locale=en_US&text=13h 47m 13s'
JSON Array:
[
  {
    "latent": false,
    "start": 0,
    "dim": "duration",
    "end": 7,
    "body": "13h 47m",
    "value": {
      "unit": "minute",
      "normalized": {
        "unit": "second",
        "value": 49620
      },
      "type": "value",
      "value": 827,
      "minute": 827
    }
  },
  {
    "latent": false,
    "start": 8,
    "dim": "number",
    "end": 10,
    "body": "13",
    "value": {
      "type": "value",
      "value": 13
    }
  }
]
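For reference, the spans and numbers in that response can be checked with a small script. The Python below is only a sketch: the response is copied from above and trimmed to the fields it uses, and `uncovered` is a hypothetical helper name, not part of Duckling.

```python
import json

TEXT = "13h 47m 13s"

# Response copied from the output above, trimmed to the fields used here.
RESPONSE = json.loads("""
[
  {"start": 0, "end": 7, "dim": "duration", "body": "13h 47m",
   "value": {"unit": "minute", "value": 827,
             "normalized": {"unit": "second", "value": 49620}}},
  {"start": 8, "end": 10, "dim": "number", "body": "13",
   "value": {"type": "value", "value": 13}}
]
""")


def uncovered(text, entities):
    """Return (index, char) for non-space characters that no entity span covers."""
    covered = set()
    for entity in entities:
        covered.update(range(entity["start"], entity["end"]))
    return [(i, ch) for i, ch in enumerate(text)
            if i not in covered and not ch.isspace()]


minutes_parsed = 13 * 60 + 47              # what Duckling reports as "value"
seconds_parsed = minutes_parsed * 60       # what Duckling reports as "normalized"
seconds_expected = seconds_parsed + 13     # what I hoped to get for "13h 47m 13s"

print(minutes_parsed, seconds_parsed, seconds_expected)  # 827 49620 49633
print(uncovered(TEXT, RESPONSE))                         # [(10, 's')]
```

So the duration covers only "13h 47m", the number covers only "13", and the trailing "s" at index 10 is not matched by anything.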
Is this a bug? How can I update Duckling so that it parses the text as described above?
Answer:
The documentation seems pretty clear about this:
To extend Duckling's support for a dimension in a given language, typically 4 files need to be updated:
Duckling/<Dimension>/<Lang>/Rules.hs
Duckling/<Dimension>/<Lang>/Corpus.hs
Duckling/Dimensions/<Lang>.hs (if not already present in Duckling/Dimensions/Common.hs)
Duckling/Rules/<Lang>.hs
Taking a look in Duckling/Duration/Rules.hs, I see:
ruleIntegerUnitofduration = Rule
  { name = "<integer> <unit-of-duration>"
  , pattern =
    [ Predicate isNatural
    , dimension TimeGrain
    ]
  -- ...
So next I peeked in Duckling/TimeGrain/EN/Rules.hs (because Duckling/TimeGrain/Rules.hs did not exist), and saw:
grains :: [(Text, String, TG.Grain)]
grains = [ ("second (grain) ", "sec(ond)?s?", TG.Second)
         -- ...
Presumably this means "13h 47m 13sec" would parse the way you want. To make "13h 47m 13s" parse in the same way, the first thing I would try is to make the regex above a bit more permissive, maybe something like s(ec(ond)?s?)?, and see whether that does the trick without breaking anything else you care about.
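Before rebuilding Duckling, the two patterns can be compared in isolation. This is a rough sketch using Python's re module as a stand-in; Duckling compiles its patterns with its own regex engine and matches them inside a larger parse, so treat the result as a sanity check on the pattern only. The sample list and the `accepted` helper are made up for illustration.

```python
import re

CURRENT = r"sec(ond)?s?"       # pattern quoted from Duckling/TimeGrain/EN/Rules.hs
PROPOSED = r"s(ec(ond)?s?)?"   # the more permissive pattern suggested above

# Hypothetical spellings of the seconds unit to try against both patterns.
SAMPLES = ["s", "sec", "secs", "second", "seconds"]


def accepted(pattern, samples):
    """Return the samples that the pattern matches in full."""
    regex = re.compile(pattern)
    return [sample for sample in samples if regex.fullmatch(sample)]


print(accepted(CURRENT, SAMPLES))   # ['sec', 'secs', 'second', 'seconds']
print(accepted(PROPOSED, SAMPLES))  # ['s', 'sec', 'secs', 'second', 'seconds']
```

The current pattern rejects a bare "s", which is consistent with "13s" being left out of the duration, while the proposed one accepts it along with every spelling the current pattern already handles. Whether a bare "s" then collides with other rules is something only Duckling's own corpus tests can tell you.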