Concatenate URLs based on the result of two columns

I would like to first extract the text inside the parentheses in the first column, which I can do with:

awk -F"[()]" '{print $2}'

Then, concatenate it with the second column to create a URL with the following format:

"https://ftp.drupal.org/files/projects/"[firstcolumn stripped out of parenthesis]-[secondcolumn].tar.gz

With input like:

Admin Toolbar (admin_toolbar)           8.x-2.5       
Entity Embed (entity_embed)             8.x-1.2       
Views Reference Field (viewsreference)  8.x-2.0-beta2 
Webform (webform)                       8.x-5.28

Data from the first line would create this URL:

https://ftp.drupal.org/files/projects/admin_toolbar-8.x-2.5.tar.gz

CodePudding user response：

If a file named a contains your input, you can try this:

$ awk -F'[()]' '
  { 
    split($3,parts," +")
    printf "https://ftp.drupal.org/files/projects/%s-%s.tar.gz\n", $2, parts[2]
  }' a 
https://ftp.drupal.org/files/projects/admin_toolbar-8.x-2.5.tar.gz
https://ftp.drupal.org/files/projects/entity_embed-8.x-1.2.tar.gz
https://ftp.drupal.org/files/projects/viewsreference-8.x-2.0-beta2.tar.gz
https://ftp.drupal.org/files/projects/webform-8.x-5.28.tar.gz

The trick is to split the third field ($3). With the field separator -F'[()]', the third field contains everything after the closing parenthesis, including the padding spaces. Splitting it on runs of spaces leaves an empty first element (from the leading spaces) and the version string in parts[2]. I probably should have searched for an awk "trim" equivalent.
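One way to write the "trim" mentioned above is with gsub, which can strip leading and trailing blanks from the third field directly instead of splitting it. This is only a sketch: make_urls is a hypothetical helper name, and it assumes the listing arrives on standard input in the format shown in the question.

```shell
# make_urls is a hypothetical name; it reads the two-column listing on stdin.
make_urls() {
  awk -F'[()]' '
    {
      version = $3
      # Strip leading and trailing blanks (spaces or tabs) from the version.
      gsub(/^[ \t]+|[ \t]+$/, "", version)
      printf "https://ftp.drupal.org/files/projects/%s-%s.tar.gz\n", $2, version
    }'
}

# Sample line taken from the question, including its trailing padding.
printf '%s\n' 'Admin Toolbar (admin_toolbar)           8.x-2.5       ' | make_urls
```

For the sample line this prints https://ftp.drupal.org/files/projects/admin_toolbar-8.x-2.5.tar.gz.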

CodePudding user response：

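Something like this sed substitution, which captures the text inside the parentheses as \1 and the version that follows as \2: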

sed 's!^[^(]*(\([^)]*\))[[:space:]]*\([^[:space:]]*\).*!https://ftp.drupal.org/files/projects/\1-\2.tar.gz!' input.txt

CodePudding user response：

In the example data, the second-to-last whitespace-separated field holds the parenthesized name you are interested in, and the last field holds the version.

If that is always the case, you can remove the parentheses from the second-to-last field, then join it to the last field with a hyphen.

awk '{
gsub(/[()]/, "", $(NF-1))
printf "https://ftp.drupal.org/files/projects/%s-%s.tar.gz%s", $(NF-1), $NF, ORS
}' file

Output

https://ftp.drupal.org/files/projects/admin_toolbar-8.x-2.5.tar.gz
https://ftp.drupal.org/files/projects/entity_embed-8.x-1.2.tar.gz
https://ftp.drupal.org/files/projects/viewsreference-8.x-2.0-beta2.tar.gz
https://ftp.drupal.org/files/projects/webform-8.x-5.28.tar.gz

Another option uses GNU awk's three-argument match with a regex and two capture groups, which capture the text between the parentheses and the field that follows it.

awk 'match($0, /^[^()]*\(([^()]+)\)\s+(\S+)/, ary) {
printf "https://ftp.drupal.org/files/projects/%s-%s.tar.gz%s", ary[1], ary[2], ORS
}' file
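The three-argument form of match and the \s and \S escapes are GNU awk extensions. Where gawk is not available, roughly the same idea can be sketched with the standard two-argument match and the RSTART and RLENGTH variables. The function name make_urls_posix is hypothetical, and the sketch assumes the name contains no nested parentheses.

```shell
# make_urls_posix is a hypothetical name; it reads the listing on stdin.
make_urls_posix() {
  awk '
    # Match "(name)", the blanks after it, and the next non-blank field.
    match($0, /\([^()]+\)[ \t]+[^ \t]+/) {
      hit = substr($0, RSTART, RLENGTH)
      close_paren = index(hit, ")")
      name = substr(hit, 2, close_paren - 2)   # text between the parentheses
      version = substr(hit, close_paren + 1)   # blanks plus the version
      sub(/^[ \t]+/, "", version)              # drop the leading blanks
      printf "https://ftp.drupal.org/files/projects/%s-%s.tar.gz\n", name, version
    }'
}

# Sample line taken from the question.
printf '%s\n' 'Views Reference Field (viewsreference)  8.x-2.0-beta2 ' | make_urls_posix
```

Lines without a parenthesized name followed by a version are skipped, because the match pattern doubles as the rule's condition.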