Bash: Check if line in file only contains specified text

Time:08-18

I have a set of Jekyll markdown files such as:

---
layout: post
title: "Interesting Post title"
date: 2021-12-09
categories: Data Science
author:
tags: [Open source,Free software,Open-source software,Software bug,Technology,Computing,Software engineering]
---

some post summary


[Visit Link](https://somelink.net){:target="_blank" rel="noopener"}

The problem is that some of the author fields are empty, because the page structure of some websites results in no author being picked up.

To solve this, I have written a separate script that retrieves the author and writes it to a file, author.txt:

Joe Bloggs

which needs to be read in and added to the markdown file as follows:

---
layout: post
title: "Interesting Post title"
date: 2021-12-09
categories: Data Science
author: Joe Bloggs
tags: [Open source,Free software,Open-source software,Software bug,Technology,Computing,Software engineering]
---

I have prepared the following code which successfully inserts the retrieved author into the markdown file:

#!/bin/bash

# configuration
jekyll_post_dir="<loc>"

# this link provides an empty author field
link="somelink.net"


for file in "$jekyll_post_dir"/*; do

    # if the specified link is in the file (-F: match it literally, not as a regex)
    if grep -qF "$link" "$file"; then

        echo "Found Link"

        # extract the full URL from the "[Visit Link](...)" line
        # (stored separately so $link is not overwritten for the next files)
        post_link=$(grep -o -P '(?<=Link\]\().*(?=\))' "$file")

        # run the script that writes the author to author.txt
        python scrape_author.py "$post_link"

        author=$(cat author.txt)

        # replace "author:" with "author: $author"
        sed -i -e "s/author:/author: $author/g" "$file"

    fi

done

Here's where I'm struggling:

I think it is important to check whether the author field in the markdown file already contains a value, so that running the script again does not append a second author to the first.

So my question is: How do I check if the markdown field "author:" already contains any information?

Thank you!

-- Edit --

After some thought, I put together:

check_author=$(grep -m 1 'author:' "$file")
empty_author=9

if [ ${#check_author} -lt "$empty_author" ]; then
    # go ahead
    :   # no-op placeholder; bash rejects an if-body that contains only comments
fi

"author:" followed by an optional space is at most 8 characters, so it is less than 9.

${#check_author} is the length, in characters, of the line grep returned.

If the length is lower than 9, the code can go ahead.
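One weakness of the length test is that trailing blanks count too: "author:" followed by two spaces is already 9 characters, so it would be treated as filled. A minimal sketch of a different check, assuming bash: strip the "author:" prefix and all spaces/tabs from the first author line and test whether anything is left. The function name author_field_is_empty is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: succeeds (exit status 0) when the first line that
# starts with "author:" has nothing but blanks after the colon.
# Returns 1 when the field has a value or when there is no author line.
author_field_is_empty() {
    local file=$1 line value
    # First "author:" line; give up if the file has none
    line=$(grep -m 1 '^author:' "$file") || return 1
    # Drop the "author:" prefix, then delete spaces and tabs
    value=${line#author:}
    value=${value//[[:blank:]]/}
    [[ -z $value ]]
}

# Example: a post whose author field contains only trailing spaces
post=$(mktemp)
printf '%s\n' '---' 'layout: post' 'author:   ' '---' > "$post"
if author_field_is_empty "$post"; then
    echo "author is empty"
fi
```

In the loop, the function would replace the length comparison: if author_field_is_empty "$file"; then ... fi.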

However, tshiono's answer is much more elegant and probably less bug-prone, so that's what I'm using. Thanks also to j_b for the alternative solution.

Answer:

Would you please try:

sed -i -E "/^author:[[:blank:]]*$/ s/(author:)/\1 $author/" "$file"

It first checks if the author field is empty (the string author: is followed by nothing except for possible blank characters). If so, the substitution is performed.
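A small demonstration of why this is safe to re-run, using a throwaway file and an example author value. It assumes GNU sed (on macOS/BSD sed, -i needs an explicit suffix argument such as -i ''). After the first run the line no longer matches ^author:[[:blank:]]*$, so the second run changes nothing.

```bash
#!/bin/bash
# Throwaway copy of a post with an empty author field
post=$(mktemp)
printf '%s\n' '---' 'layout: post' 'author:' 'tags: [bash]' '---' > "$post"

author="Joe Bloggs"   # example value; normally read from author.txt

# Run the answer's command twice, as if the script were executed again
for run in 1 2; do
    sed -i -E "/^author:[[:blank:]]*$/ s/(author:)/\1 $author/" "$post"
done

# Prints: author: Joe Bloggs
grep '^author:' "$post"
```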

Alternatively, you can put a condition such as:

if grep -qE '^author:[[:blank:]]*$' "$file"; then
    ...
fi

in the outer loop, so that files whose author is already set skip the scraping and sed steps entirely.
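Putting that condition into the question's loop could look roughly like the sketch below. It assumes GNU grep (for -P) and GNU sed (for -i), and wraps the question's python scrape_author.py / author.txt step in a hypothetical fetch_author function so the loop can be exercised on its own; fill_missing_authors is likewise a made-up name.

```bash
#!/bin/bash
# Hypothetical wrapper around the question's scraper: scrape_author.py
# writes the author to author.txt, and this prints that value.
fetch_author() {
    python scrape_author.py "$1" && cat author.txt
}

# Hypothetical driver: fill the author only where the field is empty.
fill_missing_authors() {
    local post_dir=$1 link=$2 file url author
    for file in "$post_dir"/*; do
        # Skip files that do not mention the link (-F: literal match)
        grep -qF "$link" "$file" || continue
        # Skip files whose author field already has a value
        grep -qE '^author:[[:blank:]]*$' "$file" || continue

        # Extract the URL from the "[Visit Link](...)" line
        url=$(grep -m 1 -oP '(?<=Link\]\().*?(?=\))' "$file") || continue
        author=$(fetch_author "$url") || continue

        # Fill the empty field; the address keeps this a no-op if already set
        sed -i -E "/^author:[[:blank:]]*$/ s/(author:)/\1 $author/" "$file"
    done
}
```

It would be called as fill_missing_authors "<loc>" "somelink.net". Because both the grep guard and the sed address test for an empty field, running it twice leaves existing authors alone.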

As a side note, please make sure to enclose variables with double quotes.
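Quoting protects the value from the shell, but $author is also pasted into the sed replacement, where /, & and \ have special meaning (an author such as "AC/DC" would break the s/// command). A hedged sketch of escaping those characters first, assuming GNU sed and a single-line author; the helper name escape_sed_replacement is made up for this example.

```bash
#!/bin/bash
# Hypothetical helper: escape \, / and & so the value can be used on the
# replacement side of a sed s/// command that uses "/" as its delimiter.
escape_sed_replacement() {
    # "|" is the delimiter here so "/" can sit unescaped in the bracket;
    # "\\&" re-emits each matched character with a backslash in front.
    printf '%s\n' "$1" | sed -e 's|[\\/&]|\\&|g'
}

post=$(mktemp)
printf '%s\n' '---' 'author:' '---' > "$post"

author='AC/DC & Friends'   # example value that breaks the unescaped sed
safe_author=$(escape_sed_replacement "$author")

sed -i -E "/^author:[[:blank:]]*$/ s/(author:)/\1 $safe_author/" "$post"
grep '^author:' "$post"    # author: AC/DC & Friends
```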

Answer:

One option might be to search for the author: $author string in the target markdown file and only add the author text if that string is not found.

if [[ $(grep -c "author: $author" "$file") -eq 0 ]]; then
    sed -i -e "s/author:/author: $author/g" "$file"
fi

or perhaps:

grep -c "author: $author" "$file" || sed -i -e "s/author:/author: $author/g" "$file"

The grep -c command counts the lines that contain the string, so you can act only when the count is zero; a non-zero count means the string already exists in the markdown. The || version relies on grep's exit status instead: grep exits with status 1 when no line matches, so sed runs only in that case.
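One caveat worth checking: this test only looks for that exact author string. A sketch, assuming GNU sed and a post whose field was already filled with a different (made-up) name, shows the two values being joined when the scraper later returns another author.

```bash
#!/bin/bash
# A post whose author field already holds a different name
post=$(mktemp)
printf '%s\n' '---' 'author: Jane Doe' '---' > "$post"

author="Joe Bloggs"   # example value from a later scrape

# The answer's check: count lines containing this exact author
if [[ $(grep -c "author: $author" "$post") -eq 0 ]]; then
    sed -i -e "s/author:/author: $author/g" "$post"
fi

# Prints: author: Joe Bloggs Jane Doe  (two values joined)
grep '^author:' "$post"
```

Testing whether the field is empty, rather than whether it equals one particular name, avoids this case.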
