I'm trying to use the Azure Speech to Text service. In the documentation I'm confronted with examples that use the v1 API:
https://$region.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1
Yet basically every link to the proper documentation points to the v3 API:
https://{endpoint}/speechtotext/v3.0
With the v1 example you can easily send your file as binary:
curl --location --request POST \
  "https://$region.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US" \
  --header "Ocp-Apim-Subscription-Key: $key" \
  --header "Content-Type: audio/wav" \
  --data-binary "@$audio_file"
(Note the @ prefix: without it curl sends the literal value of the variable instead of the file's contents.)
But I could not figure out how to provide a wordLevelTimestampsEnabled=true parameter to get word-level timestamps.
On the other hand, I tried the v3 API, where I can easily provide the wordLevelTimestampsEnabled=true parameter, but I couldn't figure out how to send binary file data:
curl -L -X POST "https://northeurope.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "Ocp-Apim-Subscription-Key: $key" \
  --data-raw '{
    "contentUrls": [
        "https://url-to-file.dev/test-file.wav"
    ],
    "properties": {
        "diarizationEnabled": false,
        "wordLevelTimestampsEnabled": true,
        "punctuationMode": "DictatedAndAutomatic",
        "profanityFilterMode": "Masked"
    },
    "locale": "pl-PL",
    "displayName": "Transcription using default model for pl-PL"
}'
(The subscription-key header must use double quotes; inside single quotes the shell does not expand $key.)
Is there a way to pass a binary file and still get word-level timestamps via the wordLevelTimestampsEnabled=true parameter?
CodePudding user response:
Is there a way to pass a binary file and still get word-level timestamps via the wordLevelTimestampsEnabled=true parameter?
As suggested by Code Different, converting the comment into a community wiki answer to help community members who might face a similar issue.
As per the documentation, a binary file can't be uploaded directly. You have to provide a URL to the audio via the contentUrls property instead.
For example:
{
    "contentUrls": [
        "<URL to an audio file to transcribe>"
    ],
    "properties": {
        "diarizationEnabled": false,
        "wordLevelTimestampsEnabled": true,
        "punctuationMode": "DictatedAndAutomatic",
        "profanityFilterMode": "Masked"
    },
    "locale": "en-US",
    "displayName": "Transcription of file using default model for en-US"
}
(The trailing comma after the URL in the original snippet made the JSON invalid; it is removed here.)
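A minimal Python sketch of the whole v3 flow around that request body: create the transcription job, poll its status, then collect the result-file URLs (which contain the per-word offsets and durations). The northeurope endpoint and the 5-second poll interval are assumptions carried over from the question; only stdlib modules are used.

```python
import json
import time
import urllib.request

# Region/endpoint taken from the question's v3 example.
ENDPOINT = "https://northeurope.api.cognitive.microsoft.com/speechtotext/v3.0"

def build_transcription_request(content_url, locale="en-US",
                                display_name="transcription"):
    """JSON body for POST /transcriptions with word-level timestamps enabled."""
    return {
        "contentUrls": [content_url],
        "properties": {
            "diarizationEnabled": False,
            "wordLevelTimestampsEnabled": True,
            "punctuationMode": "DictatedAndAutomatic",
            "profanityFilterMode": "Masked",
        },
        "locale": locale,
        "displayName": display_name,
    }

def _request(url, key, body=None):
    """Tiny urllib helper: POSTs JSON when a body is given, otherwise GETs."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode() if body is not None else None,
        headers={"Ocp-Apim-Subscription-Key": key,
                 "Content-Type": "application/json"},
        method="POST" if body is not None else "GET",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def transcribe(content_url, key, locale="en-US"):
    # Create the job; the response carries a "self" URL used for polling.
    job = _request(f"{ENDPOINT}/transcriptions", key,
                   build_transcription_request(content_url, locale))
    while job["status"] not in ("Succeeded", "Failed"):
        time.sleep(5)
        job = _request(job["self"], key)
    # On success, word-level timestamps live in the transcription result files.
    files = _request(job["self"] + "/files", key)
    return [f["links"]["contentUrl"]
            for f in files["values"] if f["kind"] == "Transcription"]
```

Downloading any of the returned URLs yields the JSON results with Offset/Duration per recognized word.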
For more details, you can refer to the Speech-to-text REST API v3.0 reference, the cognitive-services-speech-sdk samples, and Azure Speech Recognition - use binary / hexadecimal data instead of WAV file path.
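Since v3 only accepts URLs, a local binary file has to be staged somewhere reachable first. A hedged sketch of one common approach: upload the file to Azure Blob Storage and hand the v3 API a read-only SAS URL. The azure-storage-blob package, the account/container names, and the 4-hour SAS lifetime are all assumptions, not something stated in the question.

```python
from datetime import datetime, timedelta, timezone

def blob_url(account, container, name, sas=""):
    """Public URL of a blob; appends the SAS token when one is supplied."""
    url = f"https://{account}.blob.core.windows.net/{container}/{name}"
    return f"{url}?{sas}" if sas else url

def upload_and_sign(path, account, key, container, name, hours=4):
    # Requires `pip install azure-storage-blob`; imported lazily so the
    # pure URL helper above stays usable without the SDK installed.
    from azure.storage.blob import (BlobSasPermissions, BlobServiceClient,
                                    generate_blob_sas)
    service = BlobServiceClient(f"https://{account}.blob.core.windows.net",
                                credential=key)
    with open(path, "rb") as f:
        service.get_blob_client(container, name).upload_blob(f, overwrite=True)
    sas = generate_blob_sas(
        account_name=account, container_name=container, blob_name=name,
        account_key=key, permission=BlobSasPermissions(read=True),
        expiry=datetime.now(timezone.utc) + timedelta(hours=hours))
    return blob_url(account, container, name, sas)
```

The returned URL can then be dropped straight into the contentUrls array of the transcription request.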