How to compose two calls to Regex::replace_all?

Time:05-10

Once the regex and the replacement are fixed, Regex::replace_all effectively has the signature fn(text: &str) -> Cow<str>. How would two such calls be composed, f(g(x)), into a function with the same signature?

Here's some code I'm trying to write. This has the two calls separated out into two functions, but I couldn't get it working in one function either. Here's my lib.rs in a fresh Cargo project:

#![allow(dead_code)]

//! Plaintext and HTML manipulation.

use lazy_static::lazy_static;
use regex::Regex;
use std::borrow::Cow;

lazy_static! {
    static ref DOUBLE_QUOTED_TEXT: Regex = Regex::new(r#""(?P<content>[^"]+)""#).unwrap();
    static ref SINGLE_QUOTE:       Regex = Regex::new(r"'").unwrap();
}


fn add_typography(text: &str) -> Cow<str> {
    add_double_quotes(&add_single_quotes(text)) // Error! "returns a value referencing data owned by the current function"
}

fn add_double_quotes(text: &str) -> Cow<str> {
    DOUBLE_QUOTED_TEXT.replace_all(text, "“$content”")
}

fn add_single_quotes(text: &str) -> Cow<str> {
    SINGLE_QUOTE.replace_all(text, "’")
}


#[cfg(test)]
mod tests {
    use crate::{add_typography};

    #[test]
    fn converts_to_double_quotes() {
        assert_eq!(add_typography(r#""Hello""#), "“Hello”");
    }

    #[test]
    fn converts_a_single_quote() {
        assert_eq!(add_typography("Today's Menu"), "Today’s Menu");
    }
}

Maybe I need to match manually on the intermediate Cow enum? I've tried lots of combinations of .into, .as_ref, etc.

CodePudding user response:

A Cow contains maybe-owned data.

We can infer from what the replace_all function does that it returns borrowed data only if no substitutions happened; otherwise it has to return new, owned data.
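To make the borrowed-versus-owned distinction concrete, here is a minimal sketch that uses only the standard library. The function curl_apostrophes is a hypothetical stand-in for a replace_all wrapper, not part of the regex crate; it only mimics the pattern of borrowing the input when nothing matches and allocating when something does.

```rust
use std::borrow::Cow;

// Hypothetical std-only stand-in for a `Regex::replace_all` wrapper:
// borrows the input when there is nothing to replace, allocates otherwise.
fn curl_apostrophes(text: &str) -> Cow<'_, str> {
    if text.contains('\'') {
        Cow::Owned(text.replace('\'', "’"))
    } else {
        Cow::Borrowed(text)
    }
}

// Small helper so the variant can be inspected without a full `match`.
fn is_borrowed(cow: &Cow<'_, str>) -> bool {
    matches!(cow, Cow::Borrowed(_))
}
```

Calling curl_apostrophes("Menu") should give a Cow::Borrowed that points at the input, while curl_apostrophes("Today's Menu") should give a Cow::Owned holding a freshly allocated String.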

The problem arises when the inner call makes a substitution but the outer one does not. In that case, the outer call simply passes its input through as a Cow::Borrowed, but that input borrows from the Cow::Owned returned by the inner call, which is a temporary local to add_typography. The function would therefore return a Cow::Borrowed that borrows from the temporary, and that's obviously not memory-safe.

Basically, this function will only ever return borrowed data when no substitutions were made by either call. What we need is a helper that can propagate owned-ness through the call layers whenever the returned Cow is itself owned.

We can construct a .map() extension method on top of Cow that does exactly this:

use std::borrow::{Borrow, Cow};

trait CowMapExt<'a, B>
    where B: 'a + ToOwned + ?Sized
{
    fn map<F>(self, f: F) -> Self
        where F: for <'b> FnOnce(&'b B) -> Cow<'b, B>;
}

impl<'a, B> CowMapExt<'a, B> for Cow<'a, B>
    where B: 'a + ToOwned + ?Sized
{
    fn map<F>(self, f: F) -> Self
        where F: for <'b> FnOnce(&'b B) -> Cow<'b, B>
    {
        match self {
            Cow::Borrowed(v) => f(v),
            Cow::Owned(v) => Cow::Owned(f(v.borrow()).into_owned()),
        }
    }
}

Now your call site can stay nice and clean:

fn add_typography(text: &str) -> Cow<str> {
    add_single_quotes(text).map(add_double_quotes)
}
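If you want to check the four cases (neither call substitutes, only the inner one, only the outer one, both) without pulling in the regex crate, something like the following sketch should do. The trait is repeated so the snippet compiles on its own; replace_if_found, curl_apostrophes and add_ellipses are hypothetical std-only stand-ins for the regex-based functions in the question.

```rust
use std::borrow::{Borrow, Cow};

// Same extension trait as in the answer, repeated so this compiles alone.
trait CowMapExt<'a, B>
where
    B: 'a + ToOwned + ?Sized,
{
    fn map<F>(self, f: F) -> Self
    where
        F: for<'b> FnOnce(&'b B) -> Cow<'b, B>;
}

impl<'a, B> CowMapExt<'a, B> for Cow<'a, B>
where
    B: 'a + ToOwned + ?Sized,
{
    fn map<F>(self, f: F) -> Self
    where
        F: for<'b> FnOnce(&'b B) -> Cow<'b, B>,
    {
        match self {
            // Still borrowing from the original input: pass it straight on.
            Cow::Borrowed(v) => f(v),
            // Already owned: the result must be owned too, since it cannot
            // borrow from `v`, which is dropped at the end of this arm.
            Cow::Owned(v) => Cow::Owned(f(v.borrow()).into_owned()),
        }
    }
}

// Hypothetical stand-in for `replace_all`: borrow if nothing matches.
fn replace_if_found<'a>(text: &'a str, from: &str, to: &str) -> Cow<'a, str> {
    if text.contains(from) {
        Cow::Owned(text.replace(from, to))
    } else {
        Cow::Borrowed(text)
    }
}

fn curl_apostrophes(text: &str) -> Cow<'_, str> {
    replace_if_found(text, "'", "’")
}

fn add_ellipses(text: &str) -> Cow<'_, str> {
    replace_if_found(text, "...", "…")
}

fn add_typography(text: &str) -> Cow<'_, str> {
    curl_apostrophes(text).map(add_ellipses)
}
```

With these stand-ins, only input that contains neither an apostrophe nor three dots should come back as Cow::Borrowed; every other case comes back as Cow::Owned.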

CodePudding user response:

I came up with this solution to my question, but it's very verbose:

fn add_typography(input: &str) -> Cow<str> {
    match add_single_quotes(input) {
        Cow::Owned(output) => add_double_quotes(&output).into_owned().into(),
        _                  => add_double_quotes(input),
    }
}

This definitely works, but there seems to be a lot of boilerplate here that could maybe be avoided.

My codebase will need three, four, or more composed calls to replace_all(). So it seems like the pattern above doesn't scale nicely for that.
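One possible way to scale the match-based pattern to any number of passes is to fold over a slice of function pointers, applying the same Borrowed/Owned case split at each step. This is only a sketch using the standard library: Pass, apply_all, replace_if_found and the three passes are hypothetical names standing in for the regex-based functions, which would slot in unchanged as long as they have the fn(&str) -> Cow<str> shape.

```rust
use std::borrow::Cow;

// Hypothetical alias for one replacement pass.
type Pass = for<'a> fn(&'a str) -> Cow<'a, str>;

// Applies every pass in order. The result borrows from `text` only if
// no pass made a substitution; otherwise it is owned.
fn apply_all<'a>(text: &'a str, passes: &[Pass]) -> Cow<'a, str> {
    let mut current: Cow<'a, str> = Cow::Borrowed(text);
    for pass in passes {
        current = match current {
            // Nothing has changed so far: the pass may borrow from `text`.
            Cow::Borrowed(s) => pass(s),
            // An earlier pass allocated: keep the result owned.
            Cow::Owned(s) => Cow::Owned(pass(&s).into_owned()),
        };
    }
    current
}

// Hypothetical stand-in for `replace_all`: borrow if nothing matches.
fn replace_if_found<'a>(text: &'a str, from: &str, to: &str) -> Cow<'a, str> {
    if text.contains(from) {
        Cow::Owned(text.replace(from, to))
    } else {
        Cow::Borrowed(text)
    }
}

fn curl_apostrophes(text: &str) -> Cow<'_, str> {
    replace_if_found(text, "'", "’")
}

fn add_ellipses(text: &str) -> Cow<'_, str> {
    replace_if_found(text, "...", "…")
}

fn add_copyright(text: &str) -> Cow<'_, str> {
    replace_if_found(text, "(c)", "©")
}

fn add_typography(text: &str) -> Cow<'_, str> {
    apply_all(text, &[curl_apostrophes, add_ellipses, add_copyright])
}
```

Adding a fourth or fifth pass then only means extending the slice. The trade-off is the same as in the two-call version: once any pass has allocated, each later pass that finds nothing to replace still copies the string via into_owned.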
