Get markdown heading level in awk

Time:07-20

Given a string with a markdown heading, what would be the best way to return the level of the heading in awk?

Assumptions:

  • For this scenario, to be considered a heading the only requisite is that the line must start with a #
  • The level of the heading is the number of #s before another character appears
  • If the string is not a heading, the program should return nothing
  • Must use awk, not gawk

Example 1

Input:

# Heading

Expected output:

1

Example 2

Input:

## Heading

Expected output:

2

Example 3

## This is level #2

Expected output:

2

Example 4

Example without a leading #s in the provided string

This a normal paragraph with a # in the middle

Expected output:


Example 5

Example with leading blank character

 # Heading

Expected output:


Example 6

Example with leading \

\# Heading

Expected output:


Example 7

##Heading

Expected output:

2

Attempts

I tried using # as the field separator (FS) and NF to count the fields, but (of course) that can't tell whether a # marks the heading level or is an ordinary # in the title text.

echo '## Heading 2' | awk 'BEGIN{FS="#"} /^#/{print NF-1}'
# Returns 2 (right)
echo '## This is level #2' | awk 'BEGIN{FS="#"} /^#/{print NF-1}'
# Returns 3 (wrong, should be 2)

I also tried with gsub, but to no avail (same problem):

echo '## Heading 2' | awk '/^#/{gsub(/[^#]/,""); print length;}'
# Returns 2 (right)
echo '## This is level #2' | awk '/^#/{gsub(/[^#]/,""); print length;}'
# Returns 3 (wrong, should be 2)

Any insights?

Answer:

What you're asking for is:

awk 'match($0,/^#+/){print RLENGTH}'
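
A minimal sketch of why this works, assuming a POSIX awk: match() returns the position where the regexp matches and, as a side effect, sets RSTART (1-based start, 0 if there is no match) and RLENGTH (match length, -1 if there is no match). Since /^#+/ is anchored to the start of the line, RLENGTH is the length of the leading run of #s. The function name show_match is hypothetical and used only for illustration.

```shell
# Hypothetical helper: print match()'s return value, RSTART and RLENGTH
# for every line read from stdin.
show_match() {
    awk '{ m = match($0, /^#+/); print m, RSTART, RLENGTH }'
}

printf '%s\n' '## This is level #2' ' # Heading' | show_match
# Prints:
# 1 1 2
# 0 0 -1
```

The second line shows why non-headings print nothing in the one-liner above: match() returns 0 there, so the pattern is false and the action is skipped.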

e.g.:

$ cat file
# Heading
## Heading
## This is level #2
Example without a leading #s in the provided string
This a normal paragraph with a # in the middle
 # Heading
\# Heading
##Heading

$ while IFS= read -r line; do
    echo "$line"
    echo "$line" | awk 'match($0,/^#+/){print RLENGTH}'
done < file
# Heading
1
## Heading
2
## This is level #2
2
Example without a leading #s in the provided string
This a normal paragraph with a # in the middle
 # Heading
\# Heading
##Heading
2

Don't actually call awk once per line like this, though: it is extremely inefficient and error-prone (see why-is-using-a-shell-loop-to-process-text-considered-bad-practice). Just call awk once instead:

$ awk '{print} match($0,/^#+/){print RLENGTH}' file
# Heading
1
## Heading
2
## This is level #2
2
Example without a leading #s in the provided string
This a normal paragraph with a # in the middle
 # Heading
\# Heading
##Heading
2
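
For comparison, the gsub attempt from the question can also be repaired. This is only a sketch of an alternative, not a replacement for match(): instead of deleting every non-# character, delete everything from the first non-# character to the end of the line, so that only the leading #s remain. The function name heading_level is hypothetical.

```shell
# Hypothetical helper: print the heading level of each heading line on stdin.
# sub() removes the first non-# character and everything after it, so a line
# that starts with # is reduced to its leading run of #s.
heading_level() {
    awk '/^#/ { sub(/[^#].*/, ""); print length($0) }'
}

printf '%s\n' '# Heading' '## This is level #2' ' # Heading' '\# Heading' '##Heading' |
    heading_level
# Prints:
# 1
# 2
# 2
```

Unlike the match() version, this modifies $0, so it is less convenient when the original line is still needed afterwards.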