I have this text:
Be sure to see if the code requires firestops in
the stud walls.* These are wood strips between
the studs that will prevent flames and hot air
from moving upward within the wall. (Some
areas require that firestopping be placed no more
than one story apart. In platform-frame con-
struction, the floor platform acts as an adequate
divider and no firestopping is required.) Fire-
stopping will be required in most cases between
the joists at the places where they are supported.
These solid wooden bridges prevent the hori-
zontal movement of fire and hot gases within the
floor.
The 1970’s provided a critical turning point for
energy consumption in this country and for
other major energy-consuming countries. With
the Arab oil embargo, prices rose at an outra-
geous rate, creating a scarcity of gasoline and
heating oil. While it is still debated whether the
crisis was legitimate or created to inflate crude-
oil prices, there were lessons to be learned from
the fuel shortage. First, fuel oil is a limited and
irreplaceable resource. Second, the Western
world is burning oil at an unprecedented and
wasteful rate. The remedy is to conserve fuel as
much as possible and to explore and discover
new, regenerative sources of energy such as
solar power.
I would like each paragraph to be on a single line instead of spanning multiple lines, so the output would be this:
Be sure to see if the code requires firestops in the stud walls.* These are wood strips between the studs that will prevent flames and hot air from moving upward within the wall. (Some areas require that firestopping be placed no more than one story apart. In platform-frame con- struction, the floor platform acts as an adequate divider and no firestopping is required.) Fire- stopping will be required in most cases between the joists at the places where they are supported. These solid wooden bridges prevent the hori-zontal movement of fire and hot gases within the floor.
The 1970’s provided a critical turning point for energy consumption in this country and for other major energy-consuming countries. With the Arab oil embargo, prices rose at an outra-geous rate, creating a scarcity of gasoline and heating oil. While it is still debated whether the crisis was legitimate or created to inflate crude-oil prices, there were lessons to be learned from the fuel shortage. First, fuel oil is a limited and irreplaceable resource. Second, the Western world is burning oil at an unprecedented and wasteful rate. The remedy is to conserve fuel as much as possible and to explore and discover new, regenerative sources of energy such as solar power.
I've seen that there are tools like sed and awk, but I'm not sure how either one works; so far they seem alien to me.
Thank you for reading and helping if you can.
So far I have only done this manually, and I honestly do not know how to automate it, as I haven't yet found a solution to a similar problem.
CodePudding user response:
Try something like:
export UNUSED_CHAR="@" # Pick a delimiting character that doesn't exist in the text
tr "\n" "${UNUSED_CHAR}" < filename `# replace all newlines with the delimiter` \
| sed "s/${UNUSED_CHAR}${UNUSED_CHAR}/\n\n/g" `# replace consecutive delimiters with 2 newlines` \
| sed "s/-${UNUSED_CHAR}//g" `# join words hyphenated across a line break` \
| sed "s/${UNUSED_CHAR}/ /g" `# replace remaining instances of delimiter with a single space` \
> new_filename # Write results to a new file
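The pipeline above quietly corrupts the text if the chosen delimiter already occurs in the file, so it may be worth checking before running it. A minimal sketch follows; the function name delimiter_is_unused is made up for this illustration and is not part of the answer.

```shell
# Hypothetical helper (not part of the answer above): succeeds only when
# the candidate delimiter does not occur anywhere in the file.
delimiter_is_unused() {
    # -F treats the pattern as a fixed string, -q suppresses output
    ! grep -qF -- "$1" "$2"
}

# Example: refuse to continue if "@" already appears in the input.
# delimiter_is_unused "@" filename || echo "pick another delimiter" >&2
```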
CodePudding user response:
Simple awk solution
This solution works by setting the awk record separator (RS) to an empty string, which causes the file to be read in records separated by blank (completely empty) lines.
(assuming your text is in the file paragraphs.txt)
awk 'BEGIN{RS=""} NR>1{print ""} {$1=$1; print}' paragraphs.txt
The record separator is set in the BEGIN block. The NR>1 block prints an empty line before every record except the first, which keeps your paragraphs separated. The main block uses the trick of assigning a field to itself, which causes awk to rebuild the whole record ($0) with its fields joined by single spaces (i.e. without the line breaks). The record is then printed with print rather than printf, so a % in the text is not treated as a format specifier.
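To see paragraph mode in isolation, here is a small sketch of the same idea that feeds awk an inline sample instead of paragraphs.txt. The function name join_paragraphs is made up for this illustration.

```shell
# Hypothetical helper: joins each blank-line-separated paragraph onto one line.
join_paragraphs() {
    awk 'BEGIN { RS = "" }      # paragraph mode: records are split on blank lines
         NR > 1 { print "" }    # blank line between output paragraphs
         { $1 = $1; print }     # rebuild $0 with single spaces, then print it
    ' "$@"
}

printf 'one\ntwo\n\nthree\nfour\n' | join_paragraphs
# prints:
# one two
#
# three four
```

Note that this variant neither removes nor special-cases trailing hyphens; a word split as "hori-" / "zontal" comes out as "hori- zontal".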
CodePudding user response:
In addition to simply joining the lines in each paragraph, you need to handle trailing hyphens ('-') and ensure no space is appended following the hyphen. You can do that by building each output line incrementally: append each input line, using a ternary to check whether the last character of the preceding line was '-'.
A short implementation would be:
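awk '
  NF==0 { print str "\n"; str="" }                # blank line: flush the paragraph
  { str=str (str && last!="-" ? " " : "") $0 }    # add a space unless the previous line ended in "-"
  { last=substr($0,length($0),1) }                # remember the last character of this line
  END { print str }                               # flush the final paragraph
' file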
Example Output
With your example contents in file, the result is:
Be sure to see if the code requires firestops in the stud walls.* These are wood strips between the studs that will prevent flames and hot air from moving upward within the wall. (Some areas require that firestopping be placed no more than one story apart. In platform-frame con-struction, the floor platform acts as an adequate divider and no firestopping is required.) Fire-stopping will be required in most cases between the joists at the places where they are supported. These solid wooden bridges prevent the hori-zontal movement of fire and hot gases within the floor.
The 1970’s provided a critical turning point for energy consumption in this country and for other major energy-consuming countries. With the Arab oil embargo, prices rose at an outra-geous rate, creating a scarcity of gasoline and heating oil. While it is still debated whether the crisis was legitimate or created to inflate crude-oil prices, there were lessons to be learned from the fuel shortage. First, fuel oil is a limited and irreplaceable resource. Second, the Western world is burning oil at an unprecedented and wasteful rate. The remedy is to conserve fuel as much as possible and to explore and discover new, regenerative sources of energy such as solar power.
NOTE: without a dictionary lookup, there is no way to distinguish between "hori-zontal", which should have the hyphen removed, and e.g. "crude-oil", which is properly hyphenated, when the hyphen appears at the end of the line.
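As a rough sketch of what such a dictionary lookup could look like, the variant below reads a word list (one word per line) and drops a line-ending hyphen only when the two fragments joined together form a listed word. The function name dehyphenate and its argument order are assumptions made for this illustration; on many Unix systems a file such as /usr/share/dict/words might serve as the list, but that path is also an assumption.

```shell
# Hypothetical helper: dehyphenate <wordlist> [file...]
# Joins paragraphs, consulting the word list to decide whether a
# hyphen at the end of a line should be dropped or kept.
dehyphenate() {
    wordlist=$1; shift
    awk -v wordlist="$wordlist" '
        BEGIN {
            # load the word list into an associative array
            while ((getline w < wordlist) > 0) dict[tolower(w)] = 1
            close(wordlist)
        }
        NF == 0 { print str "\n"; str = ""; next }   # blank line: flush the paragraph
        {
            line = $0
            if (str == "") { str = line; next }
            if (str ~ /-$/) {
                # fragment before the hyphen and fragment starting the next line
                head = str; sub(/-$/, "", head); sub(/.*[^[:alpha:]]/, "", head)
                tail = line; sub(/[^[:alpha:]].*/, "", tail)
                if (tolower(head tail) in dict)
                    str = substr(str, 1, length(str) - 1) line   # known word: drop the hyphen
                else
                    str = str line                                # unknown: keep the hyphen
            } else
                str = str " " line
        }
        END { if (str != "") print str }
    ' "$@"
}

# Example, with a word list containing only "horizontal":
#   printf 'the hori-\nzontal crude-\noil\n' | dehyphenate words.txt
# prints: the horizontal crude-oil
```

The result is only as good as the word list: a compound whose joined form happens to be listed would lose its hyphen.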
CodePudding user response:
Using GNU sed
$ sed -E ':a;N;s/\n([[:alpha:]])/ \1/;ba' input_file
Be sure to see if the code requires firestops in the stud walls.* These are wood strips between the studs that will prevent flames and hot air from moving upward within the wall. (Some areas require that firestopping be placed no more than one story apart. In platform-frame con- struction, the floor platform acts as an adequate divider and no firestopping is required.) Fire- stopping will be required in most cases between the joists at the places where they are supported. These solid wooden bridges prevent the hori- zontal movement of fire and hot gases within the floor.
The 1970’s provided a critical turning point for energy consumption in this country and for other major energy-consuming countries. With the Arab oil embargo, prices rose at an outra- geous rate, creating a scarcity of gasoline and heating oil. While it is still debated whether the crisis was legitimate or created to inflate crude- oil prices, there were lessons to be learned from the fuel shortage. First, fuel oil is a limited and irreplaceable resource. Second, the Western world is burning oil at an unprecedented and wasteful rate. The remedy is to conserve fuel as much as possible and to explore and discover new, regenerative sources of energy such as solar power.
For the expected output, you could also try
$ sed -E 's/^$/###/' input_file | sed -Ez ':a;s/\n/ /;ta;s/ ### /\n\n/g;s/ $/\n/'
Be sure to see if the code requires firestops in the stud walls.* These are wood strips between the studs that will prevent flames and hot air from moving upward within the wall. (Some areas require that firestopping be placed no more than one story apart. In platform-frame con- struction, the floor platform acts as an adequate divider and no firestopping is required.) Fire- stopping will be required in most cases between the joists at the places where they are supported. These solid wooden bridges prevent the hori- zontal movement of fire and hot gases within the floor.
The 1970’s provided a critical turning point for energy consumption in this country and for other major energy-consuming countries. With the Arab oil embargo, prices rose at an outra- geous rate, creating a scarcity of gasoline and heating oil. While it is still debated whether the crisis was legitimate or created to inflate crude- oil prices, there were lessons to be learned from the fuel shortage. First, fuel oil is a limited and irreplaceable resource. Second, the Western world is burning oil at an unprecedented and wasteful rate. The remedy is to conserve fuel as much as possible and to explore and discover new, regenerative sources of energy such as solar power.