If value of a column equals value of same column in previous line plus one, give the same code

Time:10-06

I have some data that looks like this:

chr1    3861154 N   20
chr1    3861155 N   20
chr1    3861156 N   20
chr1    3949989 N   22
chr1    3949990 N   22
chr1    3949991 N   22

What I need to do is assign a code based on column 2. If the value equals the value on the previous line plus one, then both lines belong to the same series and should get the same code in a new column. That code could be the column 2 value of the first line of that series. The desired output for this example would be:


chr1    3861154 N   20  3861154
chr1    3861155 N   20  3861154
chr1    3861156 N   20  3861154
chr1    3949989 N   22  3949989
chr1    3949990 N   22  3949989
chr1    3949991 N   22  3949989

I was thinking of using awk, but of course that's not a requirement. Any ideas on how I could make this work?

Edit to add the code I'm working on:

awk 'BEGIN {var = $2} {if ($2 == var + 1) print $0"\t"var; else print $0"\t"$2; var = $2 }' test

I think the idea is there, but it's not quite right yet. The result I'm getting is:

chr1    3861154 N   20  3861154
chr1    3861155 N   20  3861154
chr1    3861156 N   20  3861155
chr1    3949989 N   22  3949989
chr1    3949990 N   22  3949989
chr1    3949991 N   22  3949990

Thanks!

User response:

$ cat tst.awk
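# A new series starts on the first line, or whenever column 2
# is not exactly one more than it was on the previous line.
(NR == 1) || ($2 != (prev + 1)) {
    val = $2
}
{
    print $0, val    # append the code of the current series
    prev = $2        # remember column 2 for the next line
}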

$ awk -f tst.awk file
chr1    3861154 N   20 3861154
chr1    3861155 N   20 3861154
chr1    3861156 N   20 3861154
chr1    3949989 N   22 3949989
chr1    3949990 N   22 3949989
chr1    3949991 N   22 3949989
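Note that the script keeps two separate variables: prev (column 2 of the previous line) and val (column 2 of the first line of the current series). The attempt in the question used a single variable for both jobs, which is why the third line of each series picked up the previous line's value instead of the series start. If the new column should be tab-separated, as in the question's attempt, one possible variant is sketched below. It assumes the input is tab-separated; the shell variable names (input, out) and the awk variable name start are only illustrative.

```shell
# Sample data from the question, assumed to be tab-separated.
input=$(printf 'chr1\t3861154\tN\t20\nchr1\t3861155\tN\t20\nchr1\t3861156\tN\t20\nchr1\t3949989\tN\t22\nchr1\t3949990\tN\t22\nchr1\t3949991\tN\t22\n')

# OFS='\t' makes "print $0, start" join the line and the code with a tab.
out=$(printf '%s\n' "$input" | awk -v OFS='\t' '
    NR == 1 || $2 != prev + 1 { start = $2 }   # first line of a new series
    { print $0, start; prev = $2 }             # tag the line, remember column 2
')

printf '%s\n' "$out"
```

Because $0 itself is never modified, the original separators inside each line are left untouched; only the appended column uses OFS.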

The big mistake in your script was this part:

BEGIN {var = $2}

because:

  • $2 is the 2nd field of the current line of input.
  • BEGIN is executed before any input lines have been read.

So $2 in the BEGIN section is empty: it acts as zero in a numeric context and as the empty string in a string context, just like any other unset variable.
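A quick way to see this for yourself is to print $2 from both places. This is only a small sketch using one line of the sample data; the brackets are there to make the empty value visible.

```shell
out=$(printf 'chr1\t3861154\tN\t20\n' | awk '
    BEGIN   { printf "BEGIN: [%s]\n", $2 }    # no input line has been read yet
    NR == 1 { printf "line 1: [%s]\n", $2 }   # now $2 holds the second field
')

printf '%s\n' "$out"
```

The BEGIN line prints empty brackets, while the line for the first record prints 3861154.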