How do I add a series of consecutive number to two columns based on a value of a particular column i

Time:01-19

Sorry if the question is confusing; let me know in the comments. I have a dataset as follows:

7897  Latimeria_chalumnae       102363860  GAS7B            growth_arrest-specific_7b                                        317222    722675    -  NW_005819646.1  XP_006000788.1
7897  Latimeria_chalumnae       106704378  LOC106704378     uncharacterized_LOC106704378                                     623183    624216    -  NW_005819646.1  XP_014346745.1
7897  Latimeria_chalumnae       102364477  LOC102364477     long-chain-fatty-acid--CoA_ligase_5-like                         757268    864711    -  NW_005819646.1  XP_006000791.1|XP_014346746.1
7897  Latimeria_chalumnae       102364730  OGFOD1           2-oxoglutarate_and_iron-dependent_oxygenase_domain_containing_1  864809    920645    -  NW_005819646.1  XP_006000792.1|XP_006000793.1|XP_014346747.1
7897  Latimeria_chalumnae       102365335  MYOCD            myocardin                                                        948800    1069037   -  NW_005819646.1  XP_014346748.1
7897  Latimeria_chalumnae       102365586  CYSLTR1          cysteinyl_leukotriene_receptor_1                                 13426     72451     +  NW_005819647.1  XP_014346751.1|XP_014346752.1
7897  Latimeria_chalumnae       102365843  CHM              CHM_Rab_escort_protein                                           113224    229988    -  NW_005819647.1  XP_014346750.1
7897  Latimeria_chalumnae       102366099  DACHA            dachshund_a                                                      287169    757805    +  NW_005819647.1  XP_014346749.1
7897  Latimeria_chalumnae       102366352  HDX              highly_divergent_homeobox                                        24978     164232    -  NW_005819648.1  XP_014346753.1
7897  Latimeria_chalumnae       102366616  SI:CH211-26B3.4  connector_enhancer_of_kinase_suppressor_of_ras_2-like            237576    1034649   +  NW_005819648.1  XP_014346754.1
7897  Latimeria_chalumnae       102367334  EML4             EMAP_like_4                                                      354683    728547    -  NW_005819649.1  XP_006000801.1|XP_006000802.1|XP_006000804.1|XP_014346755.1|XP_014346756.1
9258  Ornithorhynchus_anatinus  103167024  IL13RA2          interleukin_13_receptor_subunit_alpha_2                          401564    419415    -  NC_041733.1     XP_028922892.1|XP_028922893.1
9258  Ornithorhynchus_anatinus  100091210  LRCH2            leucine_rich_repeats_and_calponin_homology_domain_containing_2   451057    499581    -  NC_041733.1     XP_039768369.1
9258  Ornithorhynchus_anatinus  100089114  HTR2C            5-hydroxytryptamine_receptor_2C                                  512137    589504    -  NC_041733.1     XP_039768370.1
9258  Ornithorhynchus_anatinus  100085912  TAF9B            TATA-box_binding_protein_associated_factor_9b                    655532    665427    -  NC_041733.1     XP_028922939.1
9258  Ornithorhynchus_anatinus  100091680  LOC100091680     fibronectin_type-III_domain-containing_protein_3A-like           671994    709850    -  NC_041733.1     XP_028923415.1
9258  Ornithorhynchus_anatinus  100680980  CYSLTR1          cysteinyl_leukotriene_receptor_1                                 722634    740718    -  NC_041733.1     XP_028922864.1
9258  Ornithorhynchus_anatinus  114812808  LPAR4            lysophosphatidic_acid_receptor_4                                 846932    859671    +  NC_041733.1     XP_028922932.1
9258  Ornithorhynchus_anatinus  107547702  LOC107547702     putative_P2Y_purinoceptor_10                                     889304    899623    +  NC_041733.1     XP_028922895.1
9258  Ornithorhynchus_anatinus  100681472  LOC100681472     putative_P2Y_purinoceptor_10                                     903617    917352    +  NC_041733.1     XP_028922980.1
9258  Ornithorhynchus_anatinus  100092178  GPR174           G_protein-coupled_receptor_174                                   940005    966665    +  NC_041733.1     XP_028923950.1|XP_028923951.1
9258  Ornithorhynchus_anatinus  100084192  ITM2A            integral_membrane_protein_2A                                     1003583   1026995   -  NC_041733.1     XP_028923468.1
9785  Loxodonta_africana        100656219  GPR174           G_protein-coupled_receptor_174                                   21536967  21538915  -  NW_003573444.1  XP_010591004.1
9785  Loxodonta_africana        100655837  LOC100655837     putative_P2Y_purinoceptor_10                                     21761599  21835744  -  NW_003573444.1  XP_023407189.1|XP_023407190.1|XP_023407191.1|XP_023407192.1|XP_023407193.1|XP_023407194.1
9785  Loxodonta_africana        100656122  LOC100656122     putative_P2Y_purinoceptor_10                                     21857225  21873853  -  NW_003573444.1  XP_010591005.1
9785  Loxodonta_africana        100656506  LPAR4            lysophosphatidic_acid_receptor_4                                 22272527  22286865  -  NW_003573444.1  XP_003412756.1
9785  Loxodonta_africana        100656409  RTL3             retrotransposon_Gag_like_3                                       22458928  22461407  +  NW_003573444.1  XP_003412815.1
9785  Loxodonta_africana        111751378  CYSLTR1          cysteinyl_leukotriene_receptor_1                                 22732035  22790503  +  NW_003573444.1  XP_023407158.1
9785  Loxodonta_africana        100656793  TAF9B            TATA-box_binding_protein_associated_factor_9b                    22981954  22990102  +  NW_003573444.1  XP_003412757.1|XP_023407195.1|XP_023407196.1
9785  Loxodonta_africana        100657075  PGK1             phosphoglycerate_kinase_1                                        22994909  23016842  -  NW_003573444.1  XP_003412758.1
9785  Loxodonta_africana        100656971  LOC100656971     toll-like_receptor_13                                            23113346  23150716  -  NW_003573444.1  XP_010591056.1
9785  Loxodonta_africana        100657364  ATP7A            ATPase_copper_transporting_alpha                                 23207390  23358207  -  NW_003573444.1  AAG47425.1|XP_023407197.1
9785  Loxodonta_africana        100658215  LOC100658215     cytochrome_c_oxidase_subunit_7B,_mitochondrial                   23421424  23425840  -  NW_003573444.1  XP_003412760.1

I'm trying to replace columns 6 and 7 with a series of consecutive numbers that restarts for each value of the second column, and then replace '+' with '1' and '-' with '-1' in column 8. The output should look like:

7897  Latimeria_chalumnae       102363860  GAS7B            growth_arrest-specific_7b                                        1  2   -1  NW_005819646.1  XP_006000788.1
7897  Latimeria_chalumnae       106704378  LOC106704378     uncharacterized_LOC106704378                                     2  3   -1  NW_005819646.1  XP_014346745.1
7897  Latimeria_chalumnae       102364477  LOC102364477     long-chain-fatty-acid--CoA_ligase_5-like                         3  4   -1  NW_005819646.1  XP_006000791.1|XP_014346746.1
7897  Latimeria_chalumnae       102364730  OGFOD1           2-oxoglutarate_and_iron-dependent_oxygenase_domain_containing_1  4  5   -1  NW_005819646.1  XP_006000792.1|XP_006000793.1|XP_014346747.1
7897  Latimeria_chalumnae       102365335  MYOCD            myocardin                                                        5  6   -1  NW_005819646.1  XP_014346748.1
7897  Latimeria_chalumnae       102365586  CYSLTR1          cysteinyl_leukotriene_receptor_1                                 6  7   1  NW_005819647.1  XP_014346751.1|XP_014346752.1
7897  Latimeria_chalumnae       102365843  CHM              CHM_Rab_escort_protein                                           7  8   -1  NW_005819647.1  XP_014346750.1
7897  Latimeria_chalumnae       102366099  DACHA            dachshund_a                                                      8  9   1  NW_005819647.1  XP_014346749.1
7897  Latimeria_chalumnae       102366352  HDX              highly_divergent_homeobox                                        9  10   -1  NW_005819648.1  XP_014346753.1
7897  Latimeria_chalumnae       102366616  SI:CH211-26B3.4  connector_enhancer_of_kinase_suppressor_of_ras_2-like            10 11   1  NW_005819648.1  XP_014346754.1
7897  Latimeria_chalumnae       102367334  EML4             EMAP_like_4                                                      11 12   -1  NW_005819649.1  XP_006000801.1|XP_006000802.1|XP_006000804.1|XP_014346755.1|XP_014346756.1
9258  Ornithorhynchus_anatinus  103167024  IL13RA2          interleukin_13_receptor_subunit_alpha_2                          1  2  -1  NC_041733.1     XP_028922892.1|XP_028922893.1
9258  Ornithorhynchus_anatinus  100091210  LRCH2            leucine_rich_repeats_and_calponin_homology_domain_containing_2   2  3  -1  NC_041733.1     XP_039768369.1
9258  Ornithorhynchus_anatinus  100089114  HTR2C            5-hydroxytryptamine_receptor_2C                                  3  4  -1  NC_041733.1     XP_039768370.1
9258  Ornithorhynchus_anatinus  100085912  TAF9B            TATA-box_binding_protein_associated_factor_9b                    4  5  -1  NC_041733.1     XP_028922939.1
9258  Ornithorhynchus_anatinus  100091680  LOC100091680     fibronectin_type-III_domain-containing_protein_3A-like           5  6  -1  NC_041733.1     XP_028923415.1
9258  Ornithorhynchus_anatinus  100680980  CYSLTR1          cysteinyl_leukotriene_receptor_1                                 6  7  -1  NC_041733.1     XP_028922864.1
9258  Ornithorhynchus_anatinus  114812808  LPAR4            lysophosphatidic_acid_receptor_4                                 7  8  1  NC_041733.1     XP_028922932.1
9258  Ornithorhynchus_anatinus  100681472  LOC100681472     putative_P2Y_purinoceptor_10                                     8  9  1  NC_041733.1     XP_028922980.1
9258  Ornithorhynchus_anatinus  100092178  GPR174           G_protein-coupled_receptor_174                                   9  10  1  NC_041733.1     XP_028923950.1|XP_028923951.1
9785  Loxodonta_africana        100656219  GPR174           G_protein-coupled_receptor_174                                   1  2   -1  NW_003573444.1  XP_010591004.1
9785  Loxodonta_africana        100655837  LOC100655837     putative_P2Y_purinoceptor_10                                     2  3   -1  NW_003573444.1  XP_023407189.1|XP_023407190.1|XP_023407191.1|XP_023407192.1|XP_023407193.1|XP_023407194.1
9785  Loxodonta_africana        100656122  LOC100656122     putative_P2Y_purinoceptor_10                                     3  4   -1  NW_003573444.1  XP_010591005.1
9785  Loxodonta_africana        100656506  LPAR4            lysophosphatidic_acid_receptor_4                                 4  5   -1  NW_003573444.1  XP_003412756.1
9785  Loxodonta_africana        100656409  RTL3             retrotransposon_Gag_like_3                                       5  6   1  NW_003573444.1  XP_003412815.1
9785  Loxodonta_africana        111751378  CYSLTR1          cysteinyl_leukotriene_receptor_1                                 6  7   1  NW_003573444.1  XP_023407158.1
9785  Loxodonta_africana        100656793  TAF9B            TATA-box_binding_protein_associated_factor_9b                    7  8   1  NW_003573444.1  XP_003412757.1|XP_023407195.1|XP_023407196.1
9785  Loxodonta_africana        100657075  PGK1             phosphoglycerate_kinase_1                                        8  9   -1  NW_003573444.1  XP_003412758.1
9785  Loxodonta_africana        100656971  LOC100656971     toll-like_receptor_13                                            9  10  -1  NW_003573444.1  XP_010591056.1
9785  Loxodonta_africana        100657364  ATP7A            ATPase_copper_transporting_alpha                                 10 11  -1  NW_003573444.1  AAG47425.1|XP_023407197.1
9785  Loxodonta_africana        100658215  LOC100658215     cytochrome_c_oxidase_subunit_7B,_mitochondrial                   11 12  -1  NW_003573444.1  XP_003412760.1

The relevant columns have changed as follows:

  1. Columns 6 and 7 now hold consecutive numbers that restart for each value of column 2.
  2. In column 8, '+' has been replaced with '1' and '-' with '-1'.

Thank you in advance!

CodePudding user response:

The simplest awk program for that would be:

awk '
    {
        $6 = ++count[$2]            # running counter, restarts for each value of column 2
        $7 = count[$2] + 1
        $8 = ($8 == "+" ? 1 : -1)   # strand: "+" becomes 1, "-" becomes -1
        print
    }
' file.txt |
column -t

Assigning to a field makes awk rebuild the line with a single space between fields, which squeezes the original runs of spaces; piping the output through column -t restores the alignment.
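
The squeezing comes from awk rebuilding $0 with the output field separator (OFS, a single space by default) whenever a field is assigned. As a minimal sketch, assuming tab-separated output is acceptable and that column 8 contains either + or -, OFS can be set to a tab so the result stays easy to parse even without column -t. The function name renumber and the three sample rows are made up for illustration.

```shell
# Hypothetical helper: renumber columns 6 and 7 per value of column 2
# and map the strand in column 8 ("+" -> 1, anything else -> -1).
renumber() {
    awk -v OFS='\t' '
        {
            $6 = ++count[$2]            # counter restarts for each new value of column 2
            $7 = count[$2] + 1
            $8 = ($8 == "+" ? 1 : -1)
            print                       # $0 is rebuilt with OFS (a tab) between fields
        }
    '
}

# Made-up sample rows, not taken from the dataset above
printf '%s\n' \
    '1 sp_A 10 g1 d1 100 200 - s1 p1' \
    '1 sp_A 11 g2 d2 300 400 + s1 p2' \
    '2 sp_B 12 g3 d3 500 600 + s2 p3' |
renumber
```

The output can still be piped through column -t when aligned columns are wanted for reading.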

CodePudding user response:

Set $7 to the sum of $6 and the square of $8 (the square of 1 or -1 is always 1, so $7 is $6 plus 1):

{m,g}awk 'BEGIN {  __ = (_+=++_)*_*_ 
          } $(__-!!_) = ($(__-_) = ++___[$_]) + ($__ = (-!!_)^("+" < $__))^_' |

column -t 
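
Read without the underscore-only variable names, the one-liner appears to do the following. This is a hedged reading of its intent rather than the answer's own code; the function name readable, the array name seen, and the sample rows are made up, and column 8 is assumed to contain + or -.

```shell
# Hypothetical de-obfuscated equivalent of the one-liner above
readable() {
    awk '
        {
            $6 = ++seen[$2]           # consecutive number within each column-2 group
            $8 = (-1) ^ ("+" < $8)    # "+" < "-" is true (1), "+" < "+" is false (0)
            $7 = $6 + $8 ^ 2          # the square of 1 or -1 is 1, so this is $6 + 1
            print
        }
    '
}

# Made-up sample rows, not taken from the dataset above
printf '%s\n' \
    '1 sp_A 10 g1 d1 100 200 - s1 p1' \
    '1 sp_A 11 g2 d2 300 400 + s1 p2' \
    '2 sp_B 12 g3 d3 500 600 + s2 p3' |
readable
```

The string comparison works because "+" sorts before "-" in ASCII, so the exponent is 1 for a minus strand and 0 for a plus strand.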