I've defined the following UDF to convert a string to a list of word counts:
CREATE TEMP FUNCTION get_bow_vector(document STRING) RETURNS ARRAY<STRUCT<STRING, INT>> AS ((
SELECT
ARRAY(
SELECT AS STRUCT word AS word, COUNT(1) AS freq
FROM UNNEST(REGEXP_EXTRACT_ALL(document, r'[\w_] ')) AS word
GROUP BY word
ORDER BY freq DESC
)
));
This UDF mostly works as expected: it turns a string into an array of STRUCTs, where each struct contains a word and a count. However the field of the struct are not named word
and freq
as expected. Instead they're given the anonymous field names: _field_1
and _field_2
.
I've got two questions:
- How can I name the fields of the struct?
- If I dont' name the fields of the struct, how do I access them anyway? When I try using
_field_1
I get the following errorField name _field_1 does not exist in STRUCT<STRING, INT64>
CodePudding user response:
Use below instead
CREATE TEMP FUNCTION get_bow_vector(document STRING)
RETURNS ARRAY<STRUCT<word STRING, freq INT>> AS ((
SELECT
ARRAY(
SELECT AS STRUCT word AS word, COUNT(1) AS freq
FROM UNNEST(REGEXP_EXTRACT_ALL(document, r'[\w_] ')) AS word
GROUP BY word
ORDER BY freq DESC
)
));