From a GitHub Markdown header
# Söme/title-header_
GitHub's renderer creates the anchor
#sömetitle-header_
Apparently, spaces and /
are removed, letters (ASCII and Unicode) are lowercased, and -
and _
are preserved.
Is this correct; are there other rules?
CodePudding user response:
GitHub.com's process for converting Markdown heading text to id=""
attributes for automatic #fragment
links is not defined by any of the Markdown specifications nor implementations.
For example, it isn't described in the GitHub Flavored Markdown Spec.
Instead, it's something that GitHub do themselves privately after initial conversion from Markdown to HTML is completed, this is described in Step 4 in GitHub's own readme file on the topic (emphasis mine):
This library is the first step of a journey that every markup file in a repository goes on before it is rendered on GitHub.com:
github-markup
selects an underlying library to convert the raw markup to HTML.- The HTML is sanitized, aggressively removing things that could harm you and your kin—such as script tags, inline-styles, and class or id attributes.
- Syntax highlighting is performed on code blocks. See github/linguist for more information about syntax highlighting.
- The HTML is passed through other filters that add special sauce, such as emoji, task lists, named anchors, CDN caching for images, and autolinking.
- The resulting HTML is rendered on GitHub.com.
.md
/ Markdown files are processed by CommonMarker libcmark, which does not include id=""
attribute and #fragment
URI generation as a built-in feature, but CommonMarker's documentation actually provides a sample implementation of Markdown header id=""
attributes for #fragment
links on the front-page, repeated below:
class MyHtmlRenderer < CommonMarker::HtmlRenderer
def initialize
super
@headerid = 1
end
def header(node)
block do
out("<h", node.header_level, " id=\"", @headerid, "\">",
:children, "</h", node.header_level, ">")
@headerid = 1
end
end
end
# this renderer prints directly to STDOUT, instead
# of returning a string
myrenderer = MyHtmlRenderer.new
print(myrenderer.render(doc))
# Print any warnings to STDERR
renderer.warnings.each do |w|
STDERR.write("#{w}\n")
end
The above above generates numeric monotonically increasing header id=""
values, which helps prevent id=""
collisions (though this is not a perfect solution), whereas as you've observed GitHub prefers to use the header's textContent
as the basis for id=""
attribute values.
...which means that GitHub is simply doing their own thing when it comes to generating id=""
attributes, and there is no published specification for whatever transformation GitHub is using.