I have a file that looks like this:
xxxxxxx-000-387-159 < 50 null null_27 KCNQ1 1 1.430 0.773 0.656 1.724
xxxxxxx-000-375-285 < 50 < 50 100 - 500_0 KCNQ1 2 1.409 0.786 0.623 1.801
xxxxxxx-000-523-531 50 - 100 50 - 100 100 - 500_31 KCNQ1 3 1.326 0.848 0.479 1.557
I need to remove everything (spaces and other characters) after the first 19 characters and before the letter K in KCNQ1. The expected output would be:
xxxxxxx-000-387-159 KCNQ1 1 1.430 0.773 0.656 1.724
xxxxxxx-000-375-285 KCNQ1 2 1.409 0.786 0.623 1.801
xxxxxxx-000-523-531 KCNQ1 3 1.326 0.848 0.479 1.557
I tried with sed:
sed -e 's/...................KCNQ/ KCNQ/g' file.rpt > file_new.rpt
but it is only changing the first line.
What am I doing wrong?
CodePudding user response:
Assuming there is no K between the first 19 characters and KCNQ, you can use
sed -E 's/^(.{19})[^K]*/\1 /' file
# If you need to make sure there is `KCNQ` string on the right:
sed -E 's/^(.{19})[^K]*(KCNQ)/\1 \2/' file
See an online demo:
#!/bin/bash
s='xxxxxxx-000-387-159 < 50 null null_27 KCNQ1 1 1.430 0.773 0.656 1.724
xxxxxxx-000-375-285 < 50 < 50 100 - 500_0 KCNQ1 2 1.409 0.786 0.623 1.801
xxxxxxx-000-523-531 50 - 100 50 - 100 100 - 500_31 KCNQ1 3 1.326 0.848 0.479 1.557'
sed -E 's/^(.{19})[^K]*(KCNQ)/\1 \2/' <<< "$s"
Output:
xxxxxxx-000-387-159 KCNQ1 1 1.430 0.773 0.656 1.724
xxxxxxx-000-375-285 KCNQ1 2 1.409 0.786 0.623 1.801
xxxxxxx-000-523-531 KCNQ1 3 1.326 0.848 0.479 1.557
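As for why the original attempt only gets the first line right: the nineteen dots are not anchored, so they match the 19 characters immediately before KCNQ rather than the first 19 characters of the line. Only the first sample line happens to have exactly 19 characters between the ID and KCNQ. A minimal sketch comparing the two, using the first two sample lines (the variable names are just for illustration):

```shell
# First two sample lines copied from the question.
s='xxxxxxx-000-387-159 < 50 null null_27 KCNQ1 1 1.430 0.773 0.656 1.724
xxxxxxx-000-375-285 < 50 < 50 100 - 500_0 KCNQ1 2 1.409 0.786 0.623 1.801'

# Unanchored: the 19 dots consume the 19 characters right before KCNQ.
unanchored=$(printf '%s\n' "$s" | sed -e 's/...................KCNQ/ KCNQ/g')

# Anchored: ^ pins the 19 captured characters to the start of the line.
anchored=$(printf '%s\n' "$s" | sed -E 's/^(.{19})[^K]*(KCNQ)/\1 \2/')

printf '%s\n---\n%s\n' "$unanchored" "$anchored"
```

In the unanchored result the second line still carries leftover characters between the ID and KCNQ1, while the anchored result matches the expected output on both lines.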
Also, if there is just one KCNQ in the string, you may also use
sed -E 's/^(.{19}).*(KCNQ)/\1 \2/' file
Here [^K]* is replaced with .* which is greedy and therefore matches up to the last KCNQ on the line.
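The difference between the two patterns only shows up on lines that do not fit the assumptions. A small sketch, using two hypothetical lines that are not from the question's file (one with a stray K before KCNQ, one with two KCNQ occurrences); the function names negated and greedy are made up for this illustration:

```shell
# Hypothetical lines, not taken from the question's file.
stray_k='xxxxxxx-000-387-159 < 50 OK null_27 KCNQ1 1 1.430'
two_kcnq='xxxxxxx-000-387-159 < 50 KCNQ1 1 KCNQ2 2'

# [^K]* cannot cross a K, so it stops at the first K after the ID.
negated() { printf '%s\n' "$1" | sed -E 's/^(.{19})[^K]*(KCNQ)/\1 \2/'; }
# .* is greedy, so it runs up to the last KCNQ on the line.
greedy()  { printf '%s\n' "$1" | sed -E 's/^(.{19}).*(KCNQ)/\1 \2/'; }

negated "$stray_k"   # no match, the line is printed unchanged
greedy  "$stray_k"   # xxxxxxx-000-387-159 KCNQ1 1 1.430
negated "$two_kcnq"  # xxxxxxx-000-387-159 KCNQ1 1 KCNQ2 2
greedy  "$two_kcnq"  # xxxxxxx-000-387-159 KCNQ2 2
```

Which variant is appropriate depends on which of the two assumptions is safer for the real data.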
CodePudding user response:
This is very simple in awk. With your shown samples, please try the following code.
awk '{print substr($0,1,19),substr($0,index($0,"K"))}' Input_file
Explanation: each line is printed as two substrings, separated by a space. The first is characters 1 through 19 of the current line. The second starts at the position of the first letter K (found with index) and runs to the end of the line. This assumes no K appears before KCNQ1.
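If a K could appear before KCNQ1, one possible variation is to search for the whole KCNQ string and leave lines without it untouched. This is only a sketch under that assumption; the function name keep_id_and_gene is hypothetical:

```shell
# Hypothetical helper: keep the 19-character ID and everything from KCNQ on.
keep_id_and_gene() {
  awk '{
    pos = index($0, "KCNQ")      # 0 when the marker is absent
    if (pos > 19)
      print substr($0, 1, 19), substr($0, pos)
    else
      print                      # leave unexpected lines untouched
  }' "$@"
}
```

It could be used as keep_id_and_gene file.rpt > file_new.rpt, or at the end of a pipeline, since awk reads standard input when no file is given.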