Unix: Subsetting data - Selecting rows based on (multiple) values in a column

Time:10-18

I have a .tsv file that I would like to filter in Unix.

I want to select the rows that have certain numerical values (e.g. 30700, 10600, etc.) in a particular column.

Thus far, I have seen examples online where rows have been selected based on one particular value in a column. However, in my case, a particular column can have about 20-30 accepted values. How do I go about the subsetting of my data in this case?

CodePudding user response:

awk '{ if ($1 == 1 || $1 == 2) print $0; }'

would do the trick, but nobody gets promoted for writing 40-term if statements, so you might like to consider:

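# List every accepted value once, before any input is read.
BEGIN { a[1] = a[2] = 1; }
# Print the rows whose first column is one of the accepted values.
{ if ($1 in a) print $0; }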

as a template. The nice thing about awk: it is such a flexible language that there are probably dozens of different ways to approach this. The difficult thing about awk: it is such a flexible language that there are probably dozens of different ways to approach this.
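
For 20-30 accepted values, one possible way to fill in that template is to keep the values outside the awk program, either in a small file or in a space-separated string. The sketch below is only an illustration: the file names (data.tsv, accepted.txt), the sample rows, and the choice of column 2 are all assumptions, so adjust them to your data. It sets the field separator to a tab because the input is a .tsv file.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
tmp=$(mktemp -d)

# Hypothetical sample data: tab-separated, with the value of interest in column 2.
printf 'id\tcode\nA\t30700\nB\t12345\nC\t10600\nD\t99999\n' > "$tmp/data.tsv"

# Hypothetical list of accepted values, one per line.
printf '30700\n10600\n' > "$tmp/accepted.txt"

# Variant 1: read the accepted values from a file.
# While reading the first file (NR == FNR), store each value as an array key.
# For the second file, print the rows whose column 2 is one of those keys.
awk -F'\t' 'NR == FNR { keep[$1]; next } $2 in keep' \
    "$tmp/accepted.txt" "$tmp/data.tsv" > "$tmp/subset.tsv"

# Variant 2: pass the accepted values as one space-separated string.
awk -F'\t' -v list="30700 10600" '
    BEGIN { n = split(list, v, " "); for (i = 1; i <= n; i++) keep[v[i]] }
    $2 in keep
' "$tmp/data.tsv" > "$tmp/subset2.tsv"

cat "$tmp/subset.tsv"
# A	30700
# C	10600
```

Both variants drop the header row here because "code" is not in the list. If you want to keep the header, adding the condition FNR == 1 to the data-file pattern is one way to do it.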
