I have a large JSON file that I am using jq to pare down to only the elements I need. That part works, but some values are strings in all caps. Unfortunately, while jq has ascii_downcase and ascii_upcase, it has no built-in function for capitalizing only the first letter of each word.
I need to do this for brand_name and generic_name. The manufacturer name should also have the first letter of each word capitalized, except for words like LLC, which should stay all caps.
Here's my current jq statement:
jq '.results[] | select(.openfda.brand_name != null or .openfda.generic_name != null or .openfda.rxcui != null) | select(.openfda|has("rxcui")) | {brand_name: .openfda.brand_name[0], generic_name: .openfda.generic_name[0], manufacturer: .openfda.manufacturer_name[0], rxcui: .openfda.rxcui[0]}' filename.json > newfile.json
This is a sample output:
{
  "brand_name": "VELTIN",
  "generic_name": "CLINDAMYCIN PHOSPHATE AND TRETINOIN",
  "manufacturer": "Almirall, LLC",
  "rxcui": "882548"
}
I need the output to be:
{
  "brand_name": "Veltin",
  "generic_name": "Clindamycin Phosphate And Tretinoin",
  "manufacturer": "Almirall, LLC",
  "rxcui": "882548"
}
Answer:
Suppose we are given an array of words that are to be left as is, e.g.:
def exceptions: ["LLC", "USA"];
We can then define a capitalization function as follows:
# Capitalize all the words in the input string other than those specified by exceptions:
def capitalize:
  INDEX(exceptions[]; .) as $e
  | [splits("\\b") | select(length>0)]
  | map(if $e[.] then . else (.[:1]|ascii_upcase) + (.[1:]|ascii_downcase) end)
  | join("");
For example, given "abc-DEF ghi USA" as input, the result would be "Abc-Def Ghi USA".
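A minimal end-to-end sketch of how the corrected function could be combined with the question's filter, assuming jq 1.6 or later (which provides INDEX). The one-record input is hypothetical, built from the values in the question's sample output, and the shell variable names are illustrative. The null check is an addition, not part of the answer: the question's first select lets through records that lack brand_name or generic_name, and splits fails on null.

```shell
# Word-capitalizing filter from the answer above, with + joining the two halves.
CAPITALIZE_DEFS='
def exceptions: ["LLC", "USA"];
def capitalize:
  INDEX(exceptions[]; .) as $e
  | [splits("\\b") | select(length>0)]
  | map(if $e[.] then . else (.[:1]|ascii_upcase) + (.[1:]|ascii_downcase) end)
  | join("");
'

# The example string from the answer.
example=$(jq -rn "$CAPITALIZE_DEFS"' "abc-DEF ghi USA" | capitalize')
echo "$example"

# Hypothetical one-record input; the real openFDA file is much larger.
input='{"results":[{"openfda":{"brand_name":["VELTIN"],
  "generic_name":["CLINDAMYCIN PHOSPHATE AND TRETINOIN"],
  "manufacturer_name":["Almirall, LLC"],"rxcui":["882548"]}}]}'

# The question's filter, followed by capitalize on the three text fields.
# The null check leaves missing fields as null instead of erroring.
result=$(printf '%s' "$input" | jq -c "$CAPITALIZE_DEFS"'
  .results[]
  | select(.openfda.brand_name != null or .openfda.generic_name != null or .openfda.rxcui != null)
  | select(.openfda|has("rxcui"))
  | {brand_name: .openfda.brand_name[0], generic_name: .openfda.generic_name[0],
     manufacturer: .openfda.manufacturer_name[0], rxcui: .openfda.rxcui[0]}
  | (.brand_name, .generic_name, .manufacturer) |= (if . == null then . else capitalize end)')
echo "$result"
```

The -c flag only makes the result a single comparable line; without it jq pretty-prints each object as in the question. For the real data, the same program would read filename.json instead of the printf pipeline.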
Answer:
Split at space characters to get an array of words, then split each word at the empty string to get an array of characters. In each inner array, apply ascii_downcase to every element except the first, then reassemble the pieces with add on the inner arrays and join(" ") on the outer array. The first character is left as it is, so this relies on the values already being in all caps:
(.brand_name, .generic_name) |= (
  (. / " ") | map(. / "" | .[1:] |= map(ascii_downcase) | add) | join(" ")
)
{
  "brand_name": "Veltin",
  "generic_name": "Clindamycin Phosphate And Tretinoin",
  "manufacturer": "Almirall, LLC",
  "rxcui": "882548"
}
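To try the expression on its own, it can be fed the sample object from the question, i.e. what the question's filter currently produces; in the full command it would simply be appended to that filter with another |. A minimal sketch (the variable names are illustrative, and -c is used only to get a single comparable line):

```shell
# Sample object from the question (the output of its current filter).
sample='{"brand_name":"VELTIN","generic_name":"CLINDAMYCIN PHOSPHATE AND TRETINOIN","manufacturer":"Almirall, LLC","rxcui":"882548"}'

# Split each value into words and each word into characters, lowercase every
# character after the first, then reassemble. Nothing is upcased here, so the
# words are assumed to arrive in all caps. manufacturer is not touched.
result=$(printf '%s' "$sample" | jq -c '
  (.brand_name, .generic_name) |= (
    (. / " ") | map(. / "" | .[1:] |= map(ascii_downcase) | add) | join(" ")
  )')
echo "$result"
```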
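To exclude certain words from processing, catch them with an if condition before splitting them into characters. This version uses map_values, so it is applied to every field, including manufacturer: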
map_values((. / " ") | map(
  if IN("LLC", "AND") then .
  else . / "" | .[1:] |= map(ascii_downcase) | add end
) | join(" "))
{
  "brand_name": "Veltin",
  "generic_name": "Clindamycin Phosphate AND Tretinoin",
  "manufacturer": "Almirall, LLC",
  "rxcui": "882548"
}
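A similar check for the map_values version, assuming jq 1.6 or later for IN. Because map_values visits every value, a record whose brand_name is null (which the question's first select allows) would make `. / " "` raise an error. The type check in this sketch is an addition that passes such values through unchanged; it is not part of the answer above. The second input is hypothetical and exists only to exercise that case.

```shell
# Words in the IN(...) list pass through unchanged; every other space-separated
# word keeps its first character and has the rest lowercased.
# Added guard: non-string values (e.g. null) are returned as they are.
filter='map_values(
  if type == "string" then
    (. / " ") | map(
      if IN("LLC", "AND") then .
      else . / "" | .[1:] |= map(ascii_downcase) | add end
    ) | join(" ")
  else . end)'

sample='{"brand_name":"VELTIN","generic_name":"CLINDAMYCIN PHOSPHATE AND TRETINOIN","manufacturer":"Almirall, LLC","rxcui":"882548"}'
no_brand='{"brand_name":null,"generic_name":"CLINDAMYCIN PHOSPHATE AND TRETINOIN","manufacturer":"Almirall, LLC","rxcui":"882548"}'

result=$(printf '%s' "$sample" | jq -c "$filter")
result_no_brand=$(printf '%s' "$no_brand" | jq -c "$filter")
echo "$result"
echo "$result_no_brand"
```

rxcui is processed too, but digits are unaffected by ascii_downcase, so it comes out unchanged.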