How to use IFS in bash for an array of strings?

Time:04-03

Given an array of strings, I want to use IFS to split each string on underscores, join the first three fields back together with the same delimiter, and append the result to an initially empty array. For instance, if I have the array input=("L2Control_S3_L001_R1_001.fastq", "L2Control_S3_L001_R2_001.fastq"), the first string would be split on the underscore, so the output would be:

L2Control
S3
L001
R1
001.fastq

Afterward, the first three fields would be joined with an underscore: "L2Control_S3_L001". Lastly, this value would be appended to a new array, output=("L2Control_S3_L001"), and the process would continue until every element of the input array has been processed. I have tried the implementation below, but it seems to run forever.

#!/bin/bash

str=("L2Control_S3_L001_R1_001.fastq", "L2Control_S3_L001_R1_001.fastq")

IFS='_'
final=()

for (( c = 0; c = 2; c++ )); do
  read -ra SEPA <<< "${str[$c]}"
  final+=("${SEPA[0]}_${SEPA[1]}_${SEPA[2]}")
done

Can someone help me with this, please?

CodePudding user response:

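The ${array[*]} expansion joins elements using the first character of IFS. You can combine this with the ${array[*]:offset:length} slice expansion to keep only the first three fields. The loop below relies on IFS still being set to _, as it is in your script.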

output=()
for str in "${input[@]}"; do
    read -ra fields <<< "$str"
    output+=("${fields[*]:0:3}")
done

I find declare -p varname ... handy to inspect the contents of variables.
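
For reference, here is the loop above as a self-contained sketch. The array contents are copied from the question (without the commas), and saving and restoring IFS through a variable named old_ifs is my own addition, not something the loop strictly needs:

```bash
#!/bin/bash

# Sample data from the question; array elements are separated by spaces only.
input=("L2Control_S3_L001_R1_001.fastq" "L2Control_S3_L001_R2_001.fastq")
output=()

# Remember the current IFS so the change does not leak into later code.
old_ifs=$IFS
IFS=_

for str in "${input[@]}"; do
    # Split the string on "_" into the array "fields".
    read -ra fields <<< "$str"
    # Take fields 0..2 and join them with the first character of IFS ("_").
    output+=("${fields[*]:0:3}")
done

IFS=$old_ifs

declare -p output
# declare -a output=([0]="L2Control_S3_L001" [1]="L2Control_S3_L001")
```

The join only produces underscores while IFS is still _ at the point of the "${fields[*]:0:3}" expansion, which is why IFS is restored after the loop rather than inside it.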


This can also be done with bash parameter expansion:

str="L2Control_S3_L001_R1_001.fastq"
IFS=_
suffix=${str#*"$IFS"*"$IFS"*"$IFS"}
first3=${str%"$IFS$suffix"}
declare -p str suffix first3

This prints:

declare -- str="L2Control_S3_L001_R1_001.fastq"
declare -- suffix="R1_001.fastq"
declare -- first3="L2Control_S3_L001"

Can also do that in one line, but it's hairy:

first3="${str%"$IFS${str#*"$IFS"*"$IFS"*"$IFS"}"}"
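
Applied to the whole array, the same parameter-expansion idea might look like the sketch below. It hard-codes the underscore instead of going through $IFS, so IFS is left alone; the array contents are taken from the question and the loop itself is my assumption about how you would want to use it:

```bash
#!/bin/bash

input=("L2Control_S3_L001_R1_001.fastq" "L2Control_S3_L001_R2_001.fastq")
output=()

for str in "${input[@]}"; do
    # Strip the shortest prefix containing three underscores,
    # leaving everything after the third one (e.g. "R1_001.fastq").
    suffix=${str#*_*_*_}
    # Remove "_<suffix>" from the end; quoting makes the suffix match literally.
    output+=("${str%"_$suffix"}")
done

declare -p output
# declare -a output=([0]="L2Control_S3_L001" [1]="L2Control_S3_L001")
```

A string with fewer than three underscores matches neither pattern and is appended unchanged, so check for that case if your file names can be shorter.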

CodePudding user response:

Setup (notice no comma needed to separate array entries):

str=("L2Control_S3_L001_R1_001.fastq" "L2Control_S3_L001_R1_001.fastq")

One idea using a while/read loop to parse the input strings into 4 parts based on a delimiter (IFS=_):

final=()

while IFS=_ read -r f1 f2 f3 ignore
do
    final+=("${f1}_${f2}_${f3}")
done < <(printf "%s\n" "${str[@]}")

typeset -p final

Here the variable ignore receives the rest of the line, i.e. fields #4 through #n.

This generates:

declare -a final=([0]="L2Control_S3_L001" [1]="L2Control_S3_L001")
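
If you would rather keep the index-based loop from the question, it should also work once two things are fixed. The condition c = 2 is an assignment, which evaluates to 2 (non-zero, so true) on every pass, hence the endless loop; and the comma in the array literal ends up as part of the first element. A minimal sketch, using ${#str[@]} as the upper bound (my choice, the question hard-coded 2):

```bash
#!/bin/bash

# No comma between elements; a comma would become part of the first string.
str=("L2Control_S3_L001_R1_001.fastq" "L2Control_S3_L001_R2_001.fastq")
final=()

# Compare with "<" instead of assigning with "=".
for (( c = 0; c < ${#str[@]}; c++ )); do
    # Prefixing the assignment limits IFS=_ to this one read command.
    IFS=_ read -ra SEPA <<< "${str[c]}"
    final+=("${SEPA[0]}_${SEPA[1]}_${SEPA[2]}")
done

declare -p final
# declare -a final=([0]="L2Control_S3_L001" [1]="L2Control_S3_L001")
```

Because IFS=_ is set only for read, the rest of the script keeps the default word splitting.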

CodePudding user response:

That's a lengthy description of taking the first 3 elements separated by _.

printf "%s\n" "${str[@]}" | cut -d_ -f-3
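
Since the question asks for the result in an array, one way to capture the cut output is mapfile. This is a sketch that assumes bash 4 or newer (where mapfile is available) and file names without embedded newlines:

```bash
#!/bin/bash

str=("L2Control_S3_L001_R1_001.fastq" "L2Control_S3_L001_R2_001.fastq")

# Print one element per line, keep fields 1-3, and read each line into "final".
mapfile -t final < <(printf '%s\n' "${str[@]}" | cut -d_ -f-3)

declare -p final
# declare -a final=([0]="L2Control_S3_L001" [1]="L2Control_S3_L001")
```

The -t option strips the trailing newline from each line before it is stored.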