Sorry if the question is confusing; let me know in the comments. I have a dataset as follows:
7897 Latimeria_chalumnae 102363860 GAS7B growth_arrest-specific_7b 317222 722675 - NW_005819646.1 XP_006000788.1
7897 Latimeria_chalumnae 106704378 LOC106704378 uncharacterized_LOC106704378 623183 624216 - NW_005819646.1 XP_014346745.1
7897 Latimeria_chalumnae 102364477 LOC102364477 long-chain-fatty-acid--CoA_ligase_5-like 757268 864711 - NW_005819646.1 XP_006000791.1|XP_014346746.1
7897 Latimeria_chalumnae 102364730 OGFOD1 2-oxoglutarate_and_iron-dependent_oxygenase_domain_containing_1 864809 920645 - NW_005819646.1 XP_006000792.1|XP_006000793.1|XP_014346747.1
7897 Latimeria_chalumnae 102365335 MYOCD myocardin 948800 1069037 - NW_005819646.1 XP_014346748.1
7897 Latimeria_chalumnae 102365586 CYSLTR1 cysteinyl_leukotriene_receptor_1 13426 72451 + NW_005819647.1 XP_014346751.1|XP_014346752.1
7897 Latimeria_chalumnae 102365843 CHM CHM_Rab_escort_protein 113224 229988 - NW_005819647.1 XP_014346750.1
7897 Latimeria_chalumnae 102366099 DACHA dachshund_a 287169 757805 + NW_005819647.1 XP_014346749.1
7897 Latimeria_chalumnae 102366352 HDX highly_divergent_homeobox 24978 164232 - NW_005819648.1 XP_014346753.1
7897 Latimeria_chalumnae 102366616 SI:CH211-26B3.4 connector_enhancer_of_kinase_suppressor_of_ras_2-like 237576 1034649 + NW_005819648.1 XP_014346754.1
7897 Latimeria_chalumnae 102367334 EML4 EMAP_like_4 354683 728547 - NW_005819649.1 XP_006000801.1|XP_006000802.1|XP_006000804.1|XP_014346755.1|XP_014346756.1
9258 Ornithorhynchus_anatinus 103167024 IL13RA2 interleukin_13_receptor_subunit_alpha_2 401564 419415 - NC_041733.1 XP_028922892.1|XP_028922893.1
9258 Ornithorhynchus_anatinus 100091210 LRCH2 leucine_rich_repeats_and_calponin_homology_domain_containing_2 451057 499581 - NC_041733.1 XP_039768369.1
9258 Ornithorhynchus_anatinus 100089114 HTR2C 5-hydroxytryptamine_receptor_2C 512137 589504 - NC_041733.1 XP_039768370.1
9258 Ornithorhynchus_anatinus 100085912 TAF9B TATA-box_binding_protein_associated_factor_9b 655532 665427 - NC_041733.1 XP_028922939.1
9258 Ornithorhynchus_anatinus 100091680 LOC100091680 fibronectin_type-III_domain-containing_protein_3A-like 671994 709850 - NC_041733.1 XP_028923415.1
9258 Ornithorhynchus_anatinus 100680980 CYSLTR1 cysteinyl_leukotriene_receptor_1 722634 740718 - NC_041733.1 XP_028922864.1
9258 Ornithorhynchus_anatinus 114812808 LPAR4 lysophosphatidic_acid_receptor_4 846932 859671 + NC_041733.1 XP_028922932.1
9258 Ornithorhynchus_anatinus 107547702 LOC107547702 putative_P2Y_purinoceptor_10 889304 899623 + NC_041733.1 XP_028922895.1
9258 Ornithorhynchus_anatinus 100681472 LOC100681472 putative_P2Y_purinoceptor_10 903617 917352 + NC_041733.1 XP_028922980.1
9258 Ornithorhynchus_anatinus 100092178 GPR174 G_protein-coupled_receptor_174 940005 966665 + NC_041733.1 XP_028923950.1|XP_028923951.1
9258 Ornithorhynchus_anatinus 100084192 ITM2A integral_membrane_protein_2A 1003583 1026995 - NC_041733.1 XP_028923468.1
9785 Loxodonta_africana 100656219 GPR174 G_protein-coupled_receptor_174 21536967 21538915 - NW_003573444.1 XP_010591004.1
9785 Loxodonta_africana 100655837 LOC100655837 putative_P2Y_purinoceptor_10 21761599 21835744 - NW_003573444.1 XP_023407189.1|XP_023407190.1|XP_023407191.1|XP_023407192.1|XP_023407193.1|XP_023407194.1
9785 Loxodonta_africana 100656122 LOC100656122 putative_P2Y_purinoceptor_10 21857225 21873853 - NW_003573444.1 XP_010591005.1
9785 Loxodonta_africana 100656506 LPAR4 lysophosphatidic_acid_receptor_4 22272527 22286865 - NW_003573444.1 XP_003412756.1
9785 Loxodonta_africana 100656409 RTL3 retrotransposon_Gag_like_3 22458928 22461407 + NW_003573444.1 XP_003412815.1
9785 Loxodonta_africana 111751378 CYSLTR1 cysteinyl_leukotriene_receptor_1 22732035 22790503 + NW_003573444.1 XP_023407158.1
9785 Loxodonta_africana 100656793 TAF9B TATA-box_binding_protein_associated_factor_9b 22981954 22990102 + NW_003573444.1 XP_003412757.1|XP_023407195.1|XP_023407196.1
9785 Loxodonta_africana 100657075 PGK1 phosphoglycerate_kinase_1 22994909 23016842 - NW_003573444.1 XP_003412758.1
9785 Loxodonta_africana 100656971 LOC100656971 toll-like_receptor_13 23113346 23150716 - NW_003573444.1 XP_010591056.1
9785 Loxodonta_africana 100657364 ATP7A ATPase_copper_transporting_alpha 23207390 23358207 - NW_003573444.1 AAG47425.1|XP_023407197.1
9785 Loxodonta_africana 100658215 LOC100658215 cytochrome_c_oxidase_subunit_7B,_mitochondrial 23421424 23425840 - NW_003573444.1 XP_003412760.1
I'm trying to replace columns 6 and 7 with a series of consecutive numbers that restarts for each value of the second column, and then replace '+' with '1' and '-' with '-1' in column 8. The output should look like:
7897 Latimeria_chalumnae 102363860 GAS7B growth_arrest-specific_7b 1 2 -1 NW_005819646.1 XP_006000788.1
7897 Latimeria_chalumnae 106704378 LOC106704378 uncharacterized_LOC106704378 2 3 -1 NW_005819646.1 XP_014346745.1
7897 Latimeria_chalumnae 102364477 LOC102364477 long-chain-fatty-acid--CoA_ligase_5-like 3 4 -1 NW_005819646.1 XP_006000791.1|XP_014346746.1
7897 Latimeria_chalumnae 102364730 OGFOD1 2-oxoglutarate_and_iron-dependent_oxygenase_domain_containing_1 4 5 -1 NW_005819646.1 XP_006000792.1|XP_006000793.1|XP_014346747.1
7897 Latimeria_chalumnae 102365335 MYOCD myocardin 5 6 -1 NW_005819646.1 XP_014346748.1
7897 Latimeria_chalumnae 102365586 CYSLTR1 cysteinyl_leukotriene_receptor_1 6 7 1 NW_005819647.1 XP_014346751.1|XP_014346752.1
7897 Latimeria_chalumnae 102365843 CHM CHM_Rab_escort_protein 7 8 -1 NW_005819647.1 XP_014346750.1
7897 Latimeria_chalumnae 102366099 DACHA dachshund_a 8 9 1 NW_005819647.1 XP_014346749.1
7897 Latimeria_chalumnae 102366352 HDX highly_divergent_homeobox 9 10 -1 NW_005819648.1 XP_014346753.1
7897 Latimeria_chalumnae 102366616 SI:CH211-26B3.4 connector_enhancer_of_kinase_suppressor_of_ras_2-like 10 11 1 NW_005819648.1 XP_014346754.1
7897 Latimeria_chalumnae 102367334 EML4 EMAP_like_4 11 12 -1 NW_005819649.1 XP_006000801.1|XP_006000802.1|XP_006000804.1|XP_014346755.1|XP_014346756.1
9258 Ornithorhynchus_anatinus 103167024 IL13RA2 interleukin_13_receptor_subunit_alpha_2 1 2 -1 NC_041733.1 XP_028922892.1|XP_028922893.1
9258 Ornithorhynchus_anatinus 100091210 LRCH2 leucine_rich_repeats_and_calponin_homology_domain_containing_2 2 3 -1 NC_041733.1 XP_039768369.1
9258 Ornithorhynchus_anatinus 100089114 HTR2C 5-hydroxytryptamine_receptor_2C 3 4 -1 NC_041733.1 XP_039768370.1
9258 Ornithorhynchus_anatinus 100085912 TAF9B TATA-box_binding_protein_associated_factor_9b 4 5 -1 NC_041733.1 XP_028922939.1
9258 Ornithorhynchus_anatinus 100091680 LOC100091680 fibronectin_type-III_domain-containing_protein_3A-like 5 6 -1 NC_041733.1 XP_028923415.1
9258 Ornithorhynchus_anatinus 100680980 CYSLTR1 cysteinyl_leukotriene_receptor_1 6 7 -1 NC_041733.1 XP_028922864.1
9258 Ornithorhynchus_anatinus 114812808 LPAR4 lysophosphatidic_acid_receptor_4 7 8 1 NC_041733.1 XP_028922932.1
9258 Ornithorhynchus_anatinus 100681472 LOC100681472 putative_P2Y_purinoceptor_10 8 9 1 NC_041733.1 XP_028922980.1
9258 Ornithorhynchus_anatinus 100092178 GPR174 G_protein-coupled_receptor_174 9 10 1 NC_041733.1 XP_028923950.1|XP_028923951.1
9785 Loxodonta_africana 100656219 GPR174 G_protein-coupled_receptor_174 1 2 -1 NW_003573444.1 XP_010591004.1
9785 Loxodonta_africana 100655837 LOC100655837 putative_P2Y_purinoceptor_10 2 3 -1 NW_003573444.1 XP_023407189.1|XP_023407190.1|XP_023407191.1|XP_023407192.1|XP_023407193.1|XP_023407194.1
9785 Loxodonta_africana 100656122 LOC100656122 putative_P2Y_purinoceptor_10 3 4 -1 NW_003573444.1 XP_010591005.1
9785 Loxodonta_africana 100656506 LPAR4 lysophosphatidic_acid_receptor_4 4 5 -1 NW_003573444.1 XP_003412756.1
9785 Loxodonta_africana 100656409 RTL3 retrotransposon_Gag_like_3 5 6 1 NW_003573444.1 XP_003412815.1
9785 Loxodonta_africana 111751378 CYSLTR1 cysteinyl_leukotriene_receptor_1 6 7 1 NW_003573444.1 XP_023407158.1
9785 Loxodonta_africana 100656793 TAF9B TATA-box_binding_protein_associated_factor_9b 7 8 1 NW_003573444.1 XP_003412757.1|XP_023407195.1|XP_023407196.1
9785 Loxodonta_africana 100657075 PGK1 phosphoglycerate_kinase_1 8 9 -1 NW_003573444.1 XP_003412758.1
9785 Loxodonta_africana 100656971 LOC100656971 toll-like_receptor_13 9 10 -1 NW_003573444.1 XP_010591056.1
9785 Loxodonta_africana 100657364 ATP7A ATPase_copper_transporting_alpha 10 11 -1 NW_003573444.1 AAG47425.1|XP_023407197.1
9785 Loxodonta_africana 100658215 LOC100658215 cytochrome_c_oxidase_subunit_7B,_mitochondrial 11 12 -1 NW_003573444.1 XP_003412760.1
The relevant columns have changed as follows:
- Consecutive numbers have been written to columns 6 and 7, restarting for each value of column 2.
- In column 8, '+' has been replaced with '1' and '-' with '-1'.
Thank you in advance!
CodePudding user response:
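The simplest awk program for that would be:
awk '
{
    $6 = ++count[$2]              # running count per value of column 2
    $7 = count[$2] + 1
    $8 = ($8 == "+" ? 1 : -1)     # strand sign to number
    print
}
' file.txt |
column -t
Assigning to a field makes awk rebuild the line with single spaces between fields, so any original alignment is lost; piping the result through column -t lines the columns up again.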
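To check how the per-species counter restarts, here is a minimal sketch on a made-up three-row sample. The rows and the function name renumber are illustrative only and do not come from the question:

```shell
# renumber: hypothetical helper wrapping a per-group counter in awk.
# Reads the named files, or standard input when no file is given.
renumber() {
    awk '{
        $6 = ++count[$2]            # position of this row within its group
        $7 = count[$2] + 1          # always one more than column 6
        $8 = ($8 == "+" ? 1 : -1)   # strand sign to number
        print
    }' "$@"
}

# Made-up rows: two for group A, one for group B
printf '%s\n' \
    '1 A 10 g1 d1 100 200 - s1 p1' \
    '1 A 11 g2 d2 300 400 + s1 p2' \
    '2 B 12 g3 d3 500 600 - s2 p3' |
renumber
```

This prints 1 2 -1 and 2 3 1 in columns 6 to 8 for the two A rows, then starts again at 1 2 -1 for the B row, because count is indexed by the value of column 2.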
CodePudding user response:
Alternatively, set $7 to the sum of $6 and the square of $8 (the square of 1 or -1 is always 1):
awk '{ $6 = ++count[$2]; $8 = ($8 == "+" ? 1 : -1); $7 = $6 + $8 ^ 2 } 1' file.txt |
column -t
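If the file is actually tab-separated (an assumption; the question does not say), column -t is not needed: setting both the input and output separators to a tab keeps the original delimiter. A minimal sketch, with renumber_tsv as a made-up name:

```shell
# renumber_tsv: hypothetical variant for tab-separated input.
# FS and OFS are both set to a tab so the rebuilt line keeps its tabs.
renumber_tsv() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        $6 = ++count[$2]
        $7 = count[$2] + 1
        $8 = ($8 == "+" ? 1 : -1)
        print
    }' "$@"
}

# Made-up tab-separated row
printf '1\tA\t10\tg1\td1\t100\t200\t+\ts1\tp1\n' | renumber_tsv
```

With a tab as the field separator, an empty field still counts as a field, so the column positions stay fixed even when a value is missing.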