Given a string with a markdown heading, what would be the best way to return the level of the heading in awk?
Assumptions:
- For this scenario, the only requirement for a line to be considered a heading is that it must start with a #
- The level of the heading is the number of #s before another character appears
- If the string is not a heading, the program should return nothing
- Must use awk, not gawk
Example 1
Input:
# Heading
Expected output:
1
Example 2
Input:
## Heading
Expected output:
2
Example 3
Input:
## This is level #2
Expected output:
2
Example 4
Example without leading #s in the provided string
Input:
This a normal paragraph with a # in the middle
Expected output:
(nothing)
Example 5
Example with a leading blank character
Input:
 # Heading
Expected output:
(nothing)
Example 6
Example with a leading \
Input:
\# Heading
Expected output:
(nothing)
Example 7
Input:
##Heading
Expected output:
2
Attempts
I tried using # as the field separator (FS) and NF to count the number of fields, but (of course) awk can't tell whether a # indicates the heading level or is an ordinary # that is part of the title text.
echo '## Heading 2' | awk 'BEGIN{FS="#"} /^#/{print NF-1}'
# Returns 2 (right)
echo '## This is level #2' | awk 'BEGIN{FS="#"} /^#/{print NF-1}'
# Returns 3 (wrong, should be 2)
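A minimal sketch of how the FS idea could be salvaged (an editorial suggestion, not part of the original post): instead of splitting on #, split on any non-# character. On a line that starts with #, the first field $1 is then exactly the leading run of #s, so its length is the heading level and any later # in the title no longer matters. The function name heading_level_fs is hypothetical; only POSIX awk features are used.

```shell
#!/bin/sh
# heading_level_fs: hypothetical helper name, POSIX awk only (no gawk extensions).
# A multi-character FS is treated as a regular expression, so every non-'#'
# character separates fields; on a heading line, $1 is the leading run of '#'s.
heading_level_fs() {
  awk -F'[^#]' '/^#/ { print length($1) }'
}

echo '## Heading 2' | heading_level_fs         # prints 2
echo '## This is level #2' | heading_level_fs  # prints 2
```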
I also tried gsub, but to no avail (same problem):
echo '## Heading 2' | awk '/^#/{gsub(/[^#]/,""); print length;}'
# Returns 2 (right)
echo '## This is level #2' | awk '/^#/{gsub(/[^#]/,""); print length;}'
# Returns 3 (wrong, should be 2)
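Similarly, a sketch of how the gsub attempt could be adjusted while staying within POSIX awk (again an editorial suggestion): rather than deleting every non-# character, which keeps the # inside the title, delete everything from the first non-# character to the end of the line with sub(). What remains is only the leading run of #s. The name heading_level_sub is hypothetical.

```shell
#!/bin/sh
# heading_level_sub: hypothetical helper name, POSIX awk only.
# On a line starting with '#', the leftmost match of /[^#].*/ begins right
# after the leading '#'s, so removing it leaves only those '#'s in s.
heading_level_sub() {
  awk '/^#/ { s = $0; sub(/[^#].*/, "", s); print length(s) }'
}

echo '## Heading 2' | heading_level_sub         # prints 2
echo '## This is level #2' | heading_level_sub  # prints 2
```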
Any insights?
Answer:
What you're asking for is:
awk 'match($0,/^#+/){print RLENGTH}'
e.g.:
$ cat file
# Heading
## Heading
## This is level #2
Example without a leading #s in the provided string
This a normal paragraph with a # in the middle
 # Heading
\# Heading
##Heading
$ while IFS= read -r line; do
echo "$line"
echo "$line" | awk 'match($0,/^#+/){print RLENGTH}'
done < file
# Heading
1
## Heading
2
## This is level #2
2
Example without a leading #s in the provided string
This a normal paragraph with a # in the middle
 # Heading
\# Heading
##Heading
2
Don't actually call awk one line at a time like this, though, as it's extremely inefficient and error-prone (see "Why is using a shell loop to process text considered bad practice?"). Compare that with calling awk just once:
$ awk '{print} match($0,/^#+/){print RLENGTH}' file
# Heading
1
## Heading
2
## This is level #2
2
Example without a leading #s in the provided string
This a normal paragraph with a # in the middle
 # Heading
\# Heading
##Heading
2
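As a quick check, here is a sketch that runs the match()-based one-liner over all seven inputs from the question in a single awk call. match() returns the start position of the leftmost match (0 if there is none) and sets RLENGTH to the length of that match; because the regex is anchored with ^, it only succeeds on lines that begin with #, and RLENGTH is then the number of leading #s. The wrapper name heading_level is hypothetical, and Example 5's leading blank is written out explicitly.

```shell
#!/bin/sh
# heading_level: hypothetical wrapper around the one-liner from the answer.
# match() is POSIX awk: it returns 0 when /^#+/ does not match, so
# non-heading lines print nothing.
heading_level() {
  awk 'match($0, /^#+/) { print RLENGTH }'
}

# The seven inputs from the question; printf '%s\n' prints each
# argument verbatim, so the backslash in Example 6 is kept as-is.
printf '%s\n' \
  '# Heading' \
  '## Heading' \
  '## This is level #2' \
  'This a normal paragraph with a # in the middle' \
  ' # Heading' \
  '\# Heading' \
  '##Heading' | heading_level
# prints 1, 2, 2 and 2 on separate lines (Examples 4 to 6 print nothing)
```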