I have a Markdown document with several sections, and just below each section heading there is a line of hashtags. The hashtags have the form #oneword# or #multiple words hashtag#.
I need to extract the sections and their hashtags in Ruby.
Example:
# Section 1
#hash1# #hash tag 2# #hashtag3#
Some text
# Section 2
#hash1# #hash tag 4# #hash tag2#
Some text too
I want to get
{"Section 1"=>["#hash1#", "#hash tag 2#", "#hashtag3#"],
"Section 2"=>["#hash1#", "#hash tag 4#", "#hash tag2#"]}
Can we get it from grep?
CodePudding user response:
My example being:
# Section 1
#hash1# #hash tag 2# #hashtag3#
#more hashes# #and more hashes#
only a # FakeSection
Some text
# Section 2
#hash1# #hash tag 4# #hash tag2#
Some text too
and this code (ruby 3.1.2p20):
SECTION_REGEX = /^#[^#]*$/
HASH_REGEX = /#[^#]*#/
text = #...
# Key of the section currently being processed
key = nil
# Loop over all the lines in the text
result = text.split("\n").each_with_object({}) do |line, memo|
  # If the line is a section heading, use it as the key for the result
  next key = line.delete('#').strip if line.match?(SECTION_REGEX)
  # If there is no section yet to attach hashtags to, skip until there is one
  next if key.nil?
  # Reaching this point means the line sits between two section headings
  # Collect every hashtag on the line as an array
  matches = line.scan(HASH_REGEX)
  # Append them to the array for the current section
  (memo[key] ||= []).concat(matches)
end
The result is:
{
"Section 1"=>["#hash1#", "#hash tag 2#", "#hashtag3#", "#more hashes#", "#and more hashes#"],
"Section 2"=>["#hash1#", "#hash tag 4#", "#hash tag2#"]
}
Just be careful: I wrote these regexes myself, so they might behave unexpectedly with other Markdown syntax (I didn't consider it while writing this code), but they seem to work fine with your example.
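To make that caveat concrete, here is a small probe of the two regexes against a few other lines. The sample lines are hypothetical, chosen only to show edge cases; this is a sketch, not an exhaustive test of Markdown syntax.

```ruby
SECTION_REGEX = /^#[^#]*$/
HASH_REGEX = /#[^#]*#/

# Hypothetical sample lines, chosen to probe edge cases.
samples = [
  "# Section 1",          # plain level-1 heading
  "## Subsection",        # deeper heading
  "# Notes on C# tips",   # heading that contains another '#'
  "only a # FakeSection", # '#' in the middle of ordinary text
  "#hash1# #hash tag 2#"  # hashtag line
]

# For each sample: is it treated as a section, and which hashtags are found?
section_matches = samples.map { |line| line.match?(SECTION_REGEX) }
hashtags = samples.map { |line| line.scan(HASH_REGEX) }

samples.each_with_index do |line, i|
  puts "#{line.inspect}: section=#{section_matches[i]} hashtags=#{hashtags[i].inspect}"
end
```

Only the first sample is recognised as a section. The "## Subsection" line is not, and it yields an empty "##" hashtag; the heading containing "C#" is not a section either and is picked up as one long hashtag. If your documents contain such lines, the regexes would need tightening.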
Hope it helps!!
CodePudding user response:
When faced with a problem such as this, I tend to prefer the builder pattern. It is a little verbose, but it is normally very readable and very flexible.
The main idea is that you have a "reader" that simply looks at your input for "tokens" (in this case, lines), and when it finds a token it recognizes, it informs the builder that it found a token of interest. The builder builds another object based on input from the "reader". Here is an example of a "DocumentBuilder" that takes input from a "MarkdownReader" and builds the Hash that you are looking for.
class MarkdownReader
  attr_reader :builder

  def initialize(builder)
    @builder = builder
  end

  def parse(lines)
    lines.each do |line|
      case line
      when /^#[^#]+$/ # section heading: a leading '#' and no other '#'
        builder.convert_section(line)
      when /^#.+\#$/ # hashtag line: starts and ends with '#'
        builder.convert_hashtag(line)
      end
    end
  end
end

class DocumentBuilder
  attr_reader :document

  def initialize
    @document = {}
  end

  def convert_section(line)
    # Capture everything after the leading '#' as the section name
    line =~ /^#(.+)$/
    @section_name = $1
    document[@section_name] = []
  end

  def convert_hashtag(line)
    # Split on '#' and drop the blank pieces between hashtags
    hashtags = line.split("#").reject { _1.strip.empty? }
    document[@section_name] = hashtags
  end
end

lines = File.readlines("markdown.md")
builder = DocumentBuilder.new
reader = MarkdownReader.new(builder)
reader.parse(lines)
p builder.document
=> {" Section 1"=>["hash1", "hash tag 2", "hashtag3"], " Section 2"=>["hash1", "hash tag 4", "hash tag2"]}
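Note that this output differs from the hash requested in the question: the section names keep their leading space, the hashtags lose their # delimiters, and a second hashtag line under the same section would overwrite the first. Below is a minimal sketch of a builder variant that addresses these points. The class name TaggedDocumentBuilder and the inlined sample text are hypothetical, and the reader is reduced to a plain loop to keep the example short.

```ruby
# Hypothetical variant of the builder above; names are illustrative only.
class TaggedDocumentBuilder
  attr_reader :document

  def initialize
    @document = {}
    @section_name = nil
  end

  # "# Section 1" becomes "Section 1" (leading '#' and surrounding spaces removed)
  def convert_section(line)
    @section_name = line.sub(/\A#/, "").strip
    document[@section_name] = []
  end

  # Keeps the '#' delimiters and appends instead of overwriting
  def convert_hashtag(line)
    return if @section_name.nil? # ignore hashtags before the first section
    document[@section_name].concat(line.scan(/#[^#]+#/))
  end
end

# Assumed sample input, inlined so the sketch runs without markdown.md.
text = <<~MARKDOWN
  # Section 1
  #hash1# #hash tag 2# #hashtag3#
  #more hashes#
  Some text
  # Section 2
  #hash1# #hash tag 4# #hash tag2#
  Some text too
MARKDOWN

builder = TaggedDocumentBuilder.new
text.each_line(chomp: true) do |line|
  case line
  when /^#[^#]+$/ then builder.convert_section(line)
  when /^#.+\#$/ then builder.convert_hashtag(line)
  end
end

p builder.document
```

The trailing # in the second pattern is escaped because an unescaped #$ inside a Ruby regex literal starts interpolation of a global variable.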