Removing text after a certain number of letters and before a specific character

Time:11-06

I have a file that looks like this:

xxxxxxx-000-387-159 < 50 null null_27 KCNQ1 1 1.430 0.773 0.656 1.724
xxxxxxx-000-375-285 < 50 < 50 100 - 500_0 KCNQ1 2 1.409 0.786 0.623 1.801
xxxxxxx-000-523-531 50 - 100 50 - 100 100 - 500_31 KCNQ1 3 1.326 0.848 0.479 1.557

I need to remove everything (spaces and characters) after the first 19 characters and before the letter K in KCNQ1. The expected output would be:

xxxxxxx-000-387-159 KCNQ1 1 1.430 0.773 0.656 1.724
xxxxxxx-000-375-285 KCNQ1 2 1.409 0.786 0.623 1.801
xxxxxxx-000-523-531 KCNQ1 3 1.326 0.848 0.479 1.557

I tried with sed:

sed -e 's/...................KCNQ/ KCNQ/g' file.rpt > file_new.rpt

but it is only changing the first line.

What am I doing wrong?

Answer:

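Your pattern is not anchored, so the 19 dots match the 19 characters immediately before KCNQ rather than the first 19 characters of the line; the result is only correct when exactly 19 characters separate the ID from KCNQ, as on the first line. Assuming there is no K between the first 19 characters and KCNQ, you can use: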

sed -E 's/^(.{19})[^K]*/\1 /' file
# If you need to make sure there is `KCNQ` string on the right:
sed -E 's/^(.{19})[^K]*(KCNQ)/\1 \2/' file

Here is a demo using the sample data:

#!/bin/bash
s='xxxxxxx-000-387-159 < 50 null null_27 KCNQ1 1 1.430 0.773 0.656 1.724
xxxxxxx-000-375-285 < 50 < 50 100 - 500_0 KCNQ1 2 1.409 0.786 0.623 1.801
xxxxxxx-000-523-531 50 - 100 50 - 100 100 - 500_31 KCNQ1 3 1.326 0.848 0.479 1.557'
sed -E 's/^(.{19})[^K]*(KCNQ)/\1 \2/' <<< "$s"

Output:

xxxxxxx-000-387-159 KCNQ1 1 1.430 0.773 0.656 1.724
xxxxxxx-000-375-285 KCNQ1 2 1.409 0.786 0.623 1.801
xxxxxxx-000-523-531 KCNQ1 3 1.326 0.848 0.479 1.557
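For comparison, here is a sketch that runs the question's original pattern on the second sample line next to an anchored version of it. It assumes bash (for the <<< here-string) and a POSIX sed; the interval .\{19\} is written in place of the 19 literal dots and matches exactly the same thing.

```shell
#!/bin/bash
# Second sample line from the question
line='xxxxxxx-000-375-285 < 50 < 50 100 - 500_0 KCNQ1 2 1.409 0.786 0.623 1.801'

# The question's pattern (19 dots written as .\{19\}): not anchored, so it
# matches the 19 characters immediately before KCNQ, wherever they are
unanchored=$(sed -e 's/.\{19\}KCNQ/ KCNQ/g' <<< "$line")

# Anchored with ^ and the first 19 characters captured, so the ID is kept
# and everything between it and KCNQ is dropped
anchored=$(sed -e 's/^\(.\{19\}\).*KCNQ/\1 KCNQ/' <<< "$line")

printf '%s\n' "$unanchored" "$anchored"
```

The unanchored command does change the second line, but it deletes only the 19 characters just before KCNQ and leaves "< 5" behind; anchoring the group with ^ keeps the ID instead.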

If there is only one KCNQ on each line, you can also use

sed -E 's/^(.{19}).*(KCNQ)/\1 \2/' file

where [^K]* is replaced with .*, which is greedy and therefore matches up to the last KCNQ on the line.
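To illustrate why the .* version needs a single KCNQ per line, this small sketch runs both patterns on a made-up line containing KCNQ1 and KCNQ2. The line is hypothetical, not from the question, and the sketch assumes bash and a sed that supports -E (GNU and BSD sed both do).

```shell
#!/bin/bash
# Hypothetical line (not from the question) with KCNQ appearing twice
line='xxxxxxx-000-387-159 < 50 null KCNQ1 1 1.430 KCNQ2 0.656'

# [^K]* cannot cross a K, so it stops at the first KCNQ after the prefix
first=$(sed -E 's/^(.{19})[^K]*(KCNQ)/\1 \2/' <<< "$line")

# .* is greedy and backtracks only as far as needed, so it reaches the last KCNQ
last=$(sed -E 's/^(.{19}).*(KCNQ)/\1 \2/' <<< "$line")

printf '%s\n' "$first" "$last"
```

The [^K]* version keeps "KCNQ1 1 1.430 KCNQ2 0.656", while the .* version throws away everything up to KCNQ2.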

Answer:

This is very simple in awk. With your shown samples, please try the following code:

awk '{print substr($0,1,19),substr($0,index($0,"K"))}' Input_file

Explanation: the code prints two substrings of each line. The first is characters 1 to 19 (the ID). The second starts at the position of the first K in the line, found with index, and runs to the end of the line. Because print separates its arguments with the output field separator (a single space by default), the two parts are joined by one space. Note that index finds the first K on the line, so this assumes the first 19 characters contain no K.
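If the real IDs can contain the letter K, index($0,"K") would stop inside the ID. A possible variant, sketched here as an assumption rather than the answerer's code, searches for KCNQ only after the 19-character prefix and prints lines without KCNQ unchanged. It assumes bash (for the <<< here-string) and any POSIX awk.

```shell
#!/bin/bash
# Sample lines from the question
s='xxxxxxx-000-387-159 < 50 null null_27 KCNQ1 1 1.430 0.773 0.656 1.724
xxxxxxx-000-375-285 < 50 < 50 100 - 500_0 KCNQ1 2 1.409 0.786 0.623 1.801
xxxxxxx-000-523-531 50 - 100 50 - 100 100 - 500_31 KCNQ1 3 1.326 0.848 0.479 1.557'

# Keep the 19-character ID, then look for "KCNQ" only in the rest of the line,
# so a K inside the ID cannot be picked up by mistake.
out=$(awk '{
  rest = substr($0, 20)          # everything after the ID
  pos = index(rest, "KCNQ")      # 0 if KCNQ is not present
  if (pos) print substr($0, 1, 19), substr(rest, pos)
  else print                     # no KCNQ: leave the line as-is
}' <<< "$s")

printf '%s\n' "$out"
```

As in the original answer, the comma in print inserts the single space between the ID and the KCNQ part.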
