JQ Capitalize first letter of each word

Time:06-15

I have a large JSON file that I am paring down with jq to only the elements I need. That part works, but some of the values are strings in all caps. Unfortunately, while jq has ascii_downcase and ascii_upcase, it does not have a built-in function for uppercasing only the first letter of each word.

I need to do this only for brand_name and generic_name, while ensuring that the manufacturer name also has the first letter of each word capitalized, except for things like LLC, which should remain fully capitalized.

Here's my current jq statement:

jq '.results[] | select(.openfda.brand_name != null or .openfda.generic_name != null or .openfda.rxcui != null) | select(.openfda|has("rxcui")) | {brand_name: .openfda.brand_name[0], generic_name: .openfda.generic_name[0], manufacturer: .openfda.manufacturer_name[0], rxcui: .openfda.rxcui[0]}' filename.json > newfile.json

This is a sample output:

{
  "brand_name": "VELTIN",
  "generic_name": "CLINDAMYCIN PHOSPHATE AND TRETINOIN",
  "manufacturer": "Almirall, LLC",
  "rxcui": "882548"
}

I need the output to be:

{
  "brand_name": "Veltin",
  "generic_name": "Clindamycin Phosphate And Tretinoin",
  "manufacturer": "Almirall, LLC",
  "rxcui": "882548"
}

CodePudding user response:

Suppose we are given an array of words that are to be left as is, e.g.:

def exceptions: ["LLC", "USA"];

We can then define a capitalization function as follows:

# Capitalize all the words in the input string other than those specified by exceptions:
def capitalize:
  INDEX(exceptions[]; .) as $e
  | [splits("\\b") | select(length>0)]
  | map(if $e[.] then . else (.[:1]|ascii_upcase) + (.[1:] |ascii_downcase) end)
  | join("");

For example, given "abc-DEF ghi USA" as input, the result would be "Abc-Def Ghi USA".
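
One way to wire capitalize into the statement from the question might look like the sketch below. The sample record is made up to match the fields shown in the question, and capitalize_if_string is a hypothetical wrapper (not part of the answer above) that passes null through unchanged, since the question's select can let through records that lack a brand_name or generic_name, and capitalize would fail on null.

```sh
# Made-up sample record, trimmed to the fields the filter reads.
sample='{"results":[{"openfda":{"brand_name":["VELTIN"],"generic_name":["CLINDAMYCIN PHOSPHATE AND TRETINOIN"],"manufacturer_name":["Almirall, LLC"],"rxcui":["882548"]}}]}'

result=$(printf '%s\n' "$sample" | jq -c '
  def exceptions: ["LLC", "USA"];

  # Capitalize every word except those listed in exceptions.
  def capitalize:
    INDEX(exceptions[]; .) as $e
    | [splits("\\b") | select(length>0)]
    | map(if $e[.] then . else (.[:1]|ascii_upcase) + (.[1:]|ascii_downcase) end)
    | join("");

  # Hypothetical guard: leave null (or any non-string) untouched.
  def capitalize_if_string:
    if type == "string" then capitalize else . end;

  .results[]
  | select(.openfda | has("rxcui"))
  | {
      brand_name:   (.openfda.brand_name[0]        | capitalize_if_string),
      generic_name: (.openfda.generic_name[0]      | capitalize_if_string),
      manufacturer: (.openfda.manufacturer_name[0] | capitalize_if_string),
      rxcui:        .openfda.rxcui[0]
    }
')
printf '%s\n' "$result"
```

For real data, replace the printf pipe with the input file (filename.json in the question) and drop -c if pretty-printed output is preferred.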

CodePudding user response:

Split at space characters to get an array of words, then split again at the empty string to get an array of characters. For the inner array, use ascii_downcase on all elements but the first, then put all back together using add on the inner and join with a space character on the outer array.

(.brand_name, .generic_name) |= (
  (. / " ") | map(. / "" | .[1:] |= map(ascii_downcase) | add) | join(" ")
)

Output:

{
  "brand_name": "Veltin",
  "generic_name": "Clindamycin Phosphate And Tretinoin",
  "manufacturer": "Almirall, LLC",
  "rxcui": "882548"
}
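
Note that the filter above only lowercases the tail of each word, so it relies on the first letter already being uppercase, which holds for the all-caps input in the question. If the input could be lowercase or mixed case, a variant that also uppercases the first letter may be safer. The following is a minimal sketch using string slices instead of splitting into characters; the input strings are made up for illustration.

```sh
out=$(jq -n -c '
  ["veltin", "cLINDAMYCIN phosphate", "VELTIN"]
  | map(
      (. / " ")                                                   # split into words
      | map((.[:1] | ascii_upcase) + (.[1:] | ascii_downcase))    # fix case per word
      | join(" ")                                                 # reassemble
    )
')
printf '%s\n' "$out"
# prints ["Veltin","Clindamycin Phosphate","Veltin"]
```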

To exclude certain words from processing, catch them with an if condition:

map_values((. / " ") | map(
  if IN("LLC", "AND") then .
  else . / "" | .[1:] |= map(ascii_downcase) | add end
) | join(" "))

Output:

{
  "brand_name": "Veltin",
  "generic_name": "Clindamycin Phosphate AND Tretinoin",
  "manufacturer": "Almirall, LLC",
  "rxcui": "882548"
}
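
If the list of exceptions should not be hard-coded in the filter, it could be passed in from the command line with --argjson. The sketch below assumes a variable name of keep and a helper named titlecase (both hypothetical), and it restricts the update to the three name fields rather than using map_values. The record is made up, with the manufacturer in all caps to show the effect.

```sh
record='{"brand_name":"VELTIN","generic_name":"CLINDAMYCIN PHOSPHATE AND TRETINOIN","manufacturer":"ALMIRALL, LLC","rxcui":"882548"}'

titled=$(printf '%s\n' "$record" | jq -c --argjson keep '["LLC", "USA"]' '
  # Title-case one space-separated string, leaving words listed in $keep alone.
  def titlecase:
    (. / " ")
    | map(if IN($keep[]) then .
          else (.[:1] | ascii_upcase) + (.[1:] | ascii_downcase) end)
    | join(" ");

  (.brand_name, .generic_name, .manufacturer) |= titlecase
')
printf '%s\n' "$titled"
```

Because words are split on spaces only, an exception followed by punctuation (for example "LLC,") would not match the list and would be title-cased like any other word.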
