I don't understand how to write a regex that removes all of the following patterns (with something like `.replace(pattern, "")`):
- two or more spaces outside the text (leading or trailing) are removed
- two or more spaces inside the text are reduced to one (e.g. " text other " -> "text other")
- one or more spaces are removed before and after characters such as \n, \r\n and \t
- \r\n is replaced with \n
I tried `|\\n |\t \\r\n`, but obviously this doesn't fully work.
These assertions can be used to check that it works:
assert_eq!(not_useful_space(" "), "");
assert_eq!(not_useful_space(" a l l lower "), "a l l lower");
assert_eq!(not_useful_space(" i need\n new lines\n\n many times "), "i need\nnew lines\n\nmany times");
assert_eq!(not_useful_space(" i need \n new lines \n\n many times "), "i need\nnew lines\n\nmany times");
assert_eq!(not_useful_space(" i need \r\n new lines\r\nmany times "), "i need\nnew lines\nmany times");
assert_eq!(not_useful_space(" i need \t new lines\t \t many times "), "i need new lines many times");
assert_eq!(not_useful_space(" à la "), "à la");
Answer:
You can do this with a single regex with the MULTILINE flag enabled:
(?m)[ \t]*\r[ \t]*|^[ \t]+|[ \t]+$|([ \t]){2,}
Replace each match with the `$1` replacement string.
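To see what the `$1` replacement does for the last alternative, here is a std-only sketch that reproduces `([ \t]){2,}` replaced with `$1` by hand. `collapse_blank_runs` is a hypothetical helper, not part of the answer. It relies on the fact that a repeated capture group only keeps its final iteration, so each run of spaces/tabs shrinks to its last character.

```rust
/// Hypothetical helper (not from the answer) that mimics `([ \t]){2,}`
/// replaced with `$1`: every run of spaces/tabs becomes its last character.
/// A run of length 1 is left unchanged, which is the same thing, so no
/// special case is needed.
fn collapse_blank_runs(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    // Last space/tab of the run currently being scanned, if any.
    let mut last_blank: Option<char> = None;
    for c in input.chars() {
        if c == ' ' || c == '\t' {
            // Inside a run: remember only the most recent blank, just as a
            // repeated capture group only keeps its final iteration.
            last_blank = Some(c);
        } else {
            // Run ended: emit its last character (the `$1` value), then `c`.
            if let Some(b) = last_blank.take() {
                out.push(b);
            }
            out.push(c);
        }
    }
    // Flush a run that reaches the end of the input.
    if let Some(b) = last_blank {
        out.push(b);
    }
    out
}
```

For example, `"need \t new"` becomes `"need new"` and `"a \t\tb"` becomes `"a\tb"`: the kept character is whichever blank came last. In the full regex, runs at a line edge never reach this alternative, because `^[ \t]+` and `[ \t]+$` come earlier in the alternation and are tried first at the same position.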
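Rust code:
// Requires the `regex` and `once_cell` crates.
use once_cell::sync::Lazy;
use regex::Regex;

pub fn magic(input: &str) -> String {
    // Compile the regex once and reuse it on every call.
    static REGEX: Lazy<Regex> = Lazy::new(|| {
        Regex::new(r"(?m)[ \t]*\r[ \t]*|^[ \t]+|[ \t]+$|([ \t]){2,}").unwrap()
    });
    // `$1` is empty when group 1 did not take part (first three alternatives)
    // and is the last space/tab of the run for the fourth alternative.
    REGEX.replace_all(input, "$1").into_owned()
}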
#[cfg(test)]
fn magic_data() -> std::collections::HashMap<&'static str, &'static str> {
    std::collections::HashMap::from([
        (" ", ""),
        (" a l l lower ", "a l l lower"),
        (
            " i need\n new lines\n\n many times ",
            "i need\nnew lines\n\nmany times",
        ),
        (
            " i need \n new lines \n\n many times ",
            "i need\nnew lines\n\nmany times",
        ),
        (
            " i need \r\n new lines\r\nmany times ",
            "i need\nnew lines\nmany times",
        ),
        (
            " i need \t new lines\t \t many times ",
            "i need new lines many times",
        ),
        (" à la ", "à la"),
    ])
}

#[test]
fn test() {
    for (k, v) in magic_data() {
        assert_eq!(magic(k), v);
    }
}
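JavaScript demo (in JavaScript's multiline mode `$` also matches before `\r`, so a bare `\r` alternative is enough here):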
function assert_eq(lhs, rhs) {
  console.log(lhs === rhs);
}

function not_useful_space(str) {
  // Trim blanks at line edges, drop \r, collapse inner runs to their last blank.
  return str.replace(/^[ \t]+|[ \t]+$|\r|([ \t]){2,}/mg, '$1');
}
assert_eq(not_useful_space(" "), "");
assert_eq(not_useful_space(" a l l lower "), "a l l lower");
assert_eq(not_useful_space(" i need\n new lines\n\n many times "), "i need\nnew lines\n\nmany times");
assert_eq(not_useful_space(" i need \n new lines \n\n many times "), "i need\nnew lines\n\nmany times");
assert_eq(not_useful_space(" i need \r\n new lines\r\nmany times "), "i need\nnew lines\nmany times");
assert_eq(not_useful_space(" i need \t new lines\t \t many times "), "i need new lines many times");
assert_eq(not_useful_space(" à la "), "à la");
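RegEx breakdown:
- (?m): enable MULTILINE mode so ^ and $ also match at the start and end of every line
- [ \t]*\r[ \t]*: match \r surrounded by optional spaces/tabs on both sides
- |: OR
- ^: start of a line
- [ \t]+: match 1+ space or tab characters
- |: OR
- [ \t]+: match 1+ space or tab characters
- $: end of a line
- |: OR
- ([ \t]){2,}: match 2+ space or tab characters, capturing the last one in group 1
- $1: the replacement; it puts a single space/tab back for the last alternative and is empty for the others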
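The three alternatives that are replaced with nothing (`[ \t]*\r[ \t]*`, `^[ \t]+` and `[ \t]+$` under `(?m)`) amount to trimming spaces/tabs at both edges of every line and dropping the `\r` of a `\r\n` ending. Here is a std-only sketch of just that part. `strip_line_edges` is a hypothetical name, and the sketch assumes `\r` only appears as part of `\r\n`; the regex also removes a lone `\r` in the middle of a line, which this sketch does not.

```rust
/// Space and tab: the two characters matched by the regex's `[ \t]` class.
fn is_blank(c: char) -> bool {
    c == ' ' || c == '\t'
}

/// Hypothetical std-only approximation of the `[ \t]*\r[ \t]*`, `^[ \t]+`
/// and `[ \t]+$` alternatives under `(?m)`.
/// Assumes `\r` only occurs as part of a `\r\n` line ending.
fn strip_line_edges(input: &str) -> String {
    input
        // With `(?m)`, `^` and `$` match right after and right before each `\n`.
        .split('\n')
        .map(|line| {
            // Blanks after a trailing `\r` (right half of `[ \t]*\r[ \t]*`).
            let line = line.trim_end_matches(is_blank);
            // The `\r` of a `\r\n` line ending.
            let line = line.strip_suffix('\r').unwrap_or(line);
            // `^[ \t]+` and `[ \t]+$`, plus the left half of `[ \t]*\r[ \t]*`.
            line.trim_matches(is_blank)
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Interior runs survive: `" a  b \t"` becomes `"a  b"`, with the double space intact. Collapsing those is the job of the `([ \t]){2,}` alternative.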
Answer:
If you're interested, here's a non-regex version:
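fn not_useful_space(text: &str) -> String {
    // `lines()` splits on `\n` and strips a trailing `\r`, so `\r\n` is handled too.
    text.lines()
        .map(|line| {
            // Drop edge whitespace, then rejoin the words with single spaces.
            line.trim()
                .split_ascii_whitespace()
                .collect::<Vec<_>>()
                .join(" ")
        })
        .collect::<Vec<_>>()
        .join("\n")
}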
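If the intermediate `Vec`s bother you, the same behaviour can be sketched as a single pass over the characters. This is not part of the answer above, and `not_useful_space_single_pass` is a hypothetical name. It treats only space, tab and `\r` as blanks, whereas `trim()`/`split_ascii_whitespace()` also handle other whitespace. Unlike the `lines()` version, it also keeps a trailing `\n` that `lines()` would drop.

```rust
/// Hypothetical single-pass variant of the `lines()`-based version:
/// treats ' ', '\t' and '\r' as blanks, removes them at line edges and turns
/// every run of them between two words into a single space.
fn not_useful_space_single_pass(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    // True once the current line has produced at least one non-blank char.
    let mut line_has_text = false;
    // True if blanks were seen after the last non-blank char on this line.
    let mut pending_blank = false;
    for c in text.chars() {
        match c {
            ' ' | '\t' | '\r' => {
                // Leading blanks are ignored; later ones are deferred until we
                // know whether more text follows on the same line.
                if line_has_text {
                    pending_blank = true;
                }
            }
            '\n' => {
                // Trailing blanks are dropped simply by never emitting them.
                out.push('\n');
                line_has_text = false;
                pending_blank = false;
            }
            _ => {
                // A deferred run of blanks between two words becomes one space.
                if pending_blank {
                    out.push(' ');
                    pending_blank = false;
                }
                out.push(c);
                line_has_text = true;
            }
        }
    }
    out
}
```

It passes the same assertions as the question, and `"a\n"` stays `"a\n"`.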