I have a file, suffix.txt,
which contains one string per line, for example:
ing
ness
es
ed
tion
I also have a text file, text.txt,
which contains some text.
text.txt is guaranteed to consist
only of lowercase letters, with no punctuation, for example:
the raining cloud answered the man all his interrogation and with all
questioned mind the princess responded
harness all goodness without getting irritated
I want to remove the suffixes from the original words in text.txt,
at most one suffix per word. Thus I expect the following output:
the rain cloud answer the man all his interroga and with all
question mind the princess respond
har all good without gett irritat
Note that tion was not removed from questioned, since the original word didn't contain tion as a suffix. It would be really helpful if someone could answer this with sed commands.
I was using a naive script that doesn't seem to do the job:
#!/bin/bash
while read p; do
sed -i "s/$p / /g" text.txt;
sed -i "s/$p$//g" text.txt;
done <suffix.txt
CodePudding user response:
An awk:
$ awk '
NR==FNR { # generate a regex of suffixes
s=s (s==""?"(":"|") $0 # (ing|ness|es|ed|tion)$
next
}
FNR==1 {
s=s ")$" # well, above )$ is inserted here
}
{
for(i=1;i<=NF;i++) # iterate all the words and
sub(s,"",$i) # apply regex to each of them
}1' suffix.txt text.txt # output
Output:
the rain cloud answer the man all his interroga and with all
question mind the princess respond
har all good without gett irritat
CodePudding user response:
Kinda hairy but sed and unix tools only:
sed -E -f <(tr '\n' '|' <suffix.txt | sed 's/\|$//; s/\|/\\\\b|/g; s/$/\\\\b/' | xargs printf 's/%s//g') text.txt
The
tr '\n' '|' <suffix.txt | sed 's/\|$//; s/\|/\\\\b|/g; s/$/\\\\b/' | xargs printf 's/%s//g'
generates the substitution script of
s/ing\b|ness\b|es\b|ed\b|tion\b//g
This requires GNU sed for \b.
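If GNU sed is not available, one possible workaround is to match the separator explicitly instead of relying on \b. The following is only a sketch, not part of the answer above: it assumes words are separated by single spaces (as in the question's sample), uses paste to join the suffixes, and works in a scratch directory created with mktemp so that no existing file is touched.

```shell
#!/bin/sh
# Sketch: strip one suffix per word without \b.
# Assumption: words are separated by single spaces, as in the sample text.
workdir=$(mktemp -d)   # scratch directory; the file names below mirror the question
printf '%s\n' ing ness es ed tion > "$workdir/suffix.txt"
printf '%s\n' \
  'the raining cloud answered the man all his interrogation and with all' \
  'questioned mind the princess responded' \
  'harness all goodness without getting irritated' > "$workdir/text.txt"

# Join the suffixes with "|", giving ing|ness|es|ed|tion
alternation=$(paste -sd '|' "$workdir/suffix.txt")

# Match a suffix only when a space or the end of the line follows it,
# and put that space (or nothing) back with \2.
result=$(sed -E "s/($alternation)( |\$)/\\2/g" "$workdir/text.txt")
printf '%s\n' "$result"

rm -r "$workdir"
```

Because all suffixes sit in a single alternation, each word is matched at most once, so questioned loses ed but the remaining question keeps its tion.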
It would be easier with perl, ruby, awk, etc.
Here is a GNU awk version:
gawk -i join 'FNR==NR {arr[FNR]=$1; next}
FNR==1{re=join(arr,1,length(arr),"\\>|"); re=re "\\>"}
{gsub(re,"")}
1
' suffix.txt text.txt
Both produce:
the rain cloud answer the man all his interroga and with all
question mind the princess respond
har all good without gett irritat
CodePudding user response:
This might work for you (GNU sed):
sed -z 'y/\n/|/;s/|$//;s#.*#s/\\B(&)\\b//g#' suffixFile | sed -Ef - textFile
Convert suffixFile into sed commands in a file and pass that via a pipe to a second invocation of sed that amends the textFile.
N.B. The generated sed command uses \B and \b so that a suffix matches only at the end of a word and never makes up the whole word.
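To see what the first sed invocation produces, it can help to run it on its own and print the generated script. The sketch below assumes GNU sed (for -z, \B and \b) and recreates the question's sample data in a scratch directory. For simplicity it passes the generated script to the second sed as an argument rather than through -f -, which should have the same effect here.

```shell
#!/bin/sh
# Sketch: inspect the sed script generated from the suffix file (GNU sed assumed).
workdir=$(mktemp -d)   # scratch directory; file names mirror the answer above
printf '%s\n' ing ness es ed tion > "$workdir/suffixFile"
printf '%s\n' \
  'the raining cloud answered the man all his interrogation and with all' \
  'questioned mind the princess responded' \
  'harness all goodness without getting irritated' > "$workdir/textFile"

# Stage 1: read the whole file as one record (-z), turn newlines into "|",
# drop the trailing "|", and wrap the result in an s/\B(...)\b//g command.
script=$(sed -z 'y/\n/|/;s/|$//;s#.*#s/\\B(&)\\b//g#' "$workdir/suffixFile")
printf '%s\n' "$script"

# Stage 2: apply the generated command with extended regex syntax.
result=$(sed -E "$script" "$workdir/textFile")
printf '%s\n' "$result"

rm -r "$workdir"
```

The first printf should show a single substitution command containing every suffix in one alternation, which is why each word is trimmed at most once.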
CodePudding user response:
You can try this sed approach.
You will first need to create an array from suffix.txt:
suffix=($(cat suffix.txt))
You can then use it for substitution within the main sed code.
sed "s/${suffix[0]}//;s/${suffix[1]}//g;/question/! {s/${suffix[2]}//};s/${suffix[3]}//g;/question/! {s/${suffix[4]}//}" text.txt
Output
the rain cloud answer the man all his interroga and with all
question mind the princess respond
har all good without gett irritat