PHP Regular Expression to remove empty tags

Time:12-10

I'm attempting to build a regular expression to remove empty tags that may or may not have white space between them.

So far I'm using this:

 $pattern = '/<p>\s*<\/p>/im';
 $cleaned_html = preg_replace($pattern, "", $unclean_html);

The contents of $unclean_html can be seen here:

<!DOCTYPE html>
<!-- Generated by PHPWord -->
<html>
<head>
<meta charset="UTF-8" />
<title>PHPWord</title>
<meta name="author" content="Dustin Chandler" />
<style>
* {font-family: Arial; font-size: 12pt;}
a.NoteRef {text-decoration: none;}
hr {height: 1px; padding: 0; margin: 1em 0; border: 0; border-top: 1px solid #CCC;}
table {border: 1px solid black; border-spacing: 0px; width : 100%;}
td {border: 1px solid black;}
</style>
</head>
<body>
<p style="margin-top: 0; margin-bottom: 0;"><span style="font-weight: bold;">Cutline: North Carolina is the second-most at-risk state in the nation for farmland loss, according to a study by American Farmland Trust.</span></p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;"><span style="font-weight: bold;">Head shot c</span><span style="font-weight: bold;">utline: N.C. Agriculture Commissioner Steve Troxler is working on ways to preserve N.C. farmland.</span></p>
<p> </p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;"><span style="font-weight: bold;">Digging into solutions: Troxler highlights strategies</span><span style="font-weight: bold;"> </span><span style="font-weight: bold;">for</span><span style="font-weight: bold;"> </span><span style="font-weight: bold;">N.C. farmland</span></p>
<p> </p>
<p style="text-align: left; margin-top: 0pt; margin-bottom: 0pt;">Agricultual education</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">Agriculture may be North Carolina’s top industry, but the state is losing farmland – fast.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">In fact, North Carolina had the second highest rate of farmland loss in the country in 2020, according to a report from the American Farmland Trust.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">Evan Davis, director of the Agricultural Development and Farmland Preservation Trust Fund, joined N.C. Agriculture Commissioner Steve Troxler and other officials from the Department of Agriculture & Consumer Services in the third professional development seminar of the fall, speaking to students and faculty about the importance of farmland preservation and conservation measures to curb future loss.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">According to the report, Davis said, “732,000 acres of ag land were converted to non-ag uses between 2001 and 2016. More than 571,000 acres were converted to scattered, large-lot housing developments. North Carolina led the nation in this kind of development.”</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">Unfortunately, much of this land was categorized by the agency as “nationally significant land,” the best land for long-term production of food and fiber.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">“So, not only are we losing farmland, but we’re losing our most productive land,” Davis said,</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">Upon entering office in 2005, Troxler said he and his attorney spent several weeks looking for answers on how to preserve N.C. farmland.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">“I grew up in the little town of Brown Summit,” said Troxler. “When I was farming, I saw encroachment start to happen and I saw farms and forests start disappearing. When I went into (this) office, I looked around the state and saw the same thing all over North Carolina.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">“We found out that the states that were doing the most to preserve farmland were the states that had already lost the majority of their farmland,” said Troxler. “We certainly don’t want to get to that point in North Carolina.”</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">A recent report by American Farmland Trust predicted the rates of national farmland loss by the year 2040 under current development trends, runaway sprawl and “better-built cities” (compact and dense development). According to the report, North Carolina would lose farmland at the second-highest rate in the nation in each scenario.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">“If you look at the current development trends, we’re projected to lose almost 1.2 million acres by 2040,” said Davis. “In the worst scenario we would lose more acres, 1.6 million, than the entire state of Delaware.”</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">To mitigate this, Davis said it’s important to look at legal tools that would aid in preservation such as agricultural conservation easement.  Easements restrict residential, commercial, and industrial development to ensure the land remains in agricultural, horticultural, or forestry production. The most common, a perpetual conservation easement, is often used in partnership with USDA, the military and ADFP Trust Fund.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">Land -development issues making sure that the open space and natural resources of private farmland cleansing water runoff, wildlife habitats, etc. – were preserved during zoning while not decreasing the property value as well as ensuring the landowner’s private property rights were not infringed.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">“We set up a system where we pay people for the development rights on the piece of property that they own,” Troxler explained. “They can participate in escalating real estate prices, but at the same time, make sure that the land is always farmland.”</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">Assistant Commissioner Alexander “Sandy” Stewart, Ph.D., used the history of his farm in Moore County, a 186-acre non-contiguous property owned by his family since 1775, to illustrate his personal stake in preservation.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">“Farmland preservation, for my farm, is important, but it’s also important to what my neighbors do with their place, up the hill and around in the community,” said Stewart.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">Studies show that both agricultural land and industrial land are net contributors to a county, Stewart explained. </p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">“The cost of a county providing trash, water, sewer, fire prevention, police or other services to the land costs the county less than what the land generates in tax base. However, when you get to residential land, the cost of community services is usually the opposite. It normally costs the county more per acre to provide those services because they’re such heavy users of the services. Tax rates are higher, but they use the services at such a higher rate.”</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">Stewart explained that while those in agriculture and general landowners aren’t categorically opposed to development plans, a proper arrangement between landowners and county/municipal government.</p>
<p> </p>
<p style="margin-top: 0; margin-bottom: 0;">“I think that everybody wants good neighbors,” said Stewart. “They just want to see a smart plan.”</p>
<p> </p>
<p> </p>
</body>
</html>

It doesn't match anything in the unclean HTML, even though several online regular expression testers show it matching the empty paragraph tags.

I've also tried

$clean_html = str_replace("<p> </p>", "", $unclean_html);

That still doesn't replace anything either.

CodePudding user response:

First of all, you have to use the backslash \ as the escape character.

Next, be careful with backslash escapes in single-quoted versus double-quoted strings, because PHP interprets more escape sequences inside double quotes. Single-quoted strings are the safer choice for regular expression patterns.
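
To illustrate the quoting point, here is a minimal sketch of how PHP treats the same backslash sequences in the two string types. The variable names are made up for illustration only.

```php
<?php

// Single-quoted: only \\ and \' are interpreted by PHP, so the regex
// escapes reach the PCRE engine untouched.
$pattern = '/<p>\s*<\/p>/';
var_dump(preg_match($pattern, '<p>  </p>')); // int(1)

// Double-quoted: PHP interprets sequences such as \n, \t, \x41 and \$
// before the regex engine ever sees the string.
$double_hex = "\x41";   // one byte: "A"
$single_hex = '\x41';   // four bytes: backslash, x, 4, 1

var_dump(strlen($double_hex)); // int(1)
var_dump(strlen($single_hex)); // int(4)

// \s is not a PHP escape sequence, so it happens to survive in both
// forms, but relying on that is fragile.
var_dump('\s' === "\s"); // bool(true)
```

Keeping patterns in single quotes means there is only one layer of escaping to reason about.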

You have to use preg_replace (not str_replace) for regular expressions.

Also, if the forward slash / is part of your pattern (e.g. in </p>), consider using a different pattern delimiter, such as #:

<?php

  $unclean_html = '<p>tag</p>;  empty<p> </p>tag;  linefeed<p>
  </p>tag; <p>other tag</p>';

  // Single-quoted pattern, # as delimiter; the u modifier treats pattern and subject as UTF-8.
  $clean_html = preg_replace('#<p>[\s\x{00A0}]*</p>#iu', '', $unclean_html);

  print($clean_html); // <p>tag</p>;  emptytag;  linefeedtag; <p>other tag</p>
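
As a quick check of the delimiter advice, the sketch below runs the same pattern once with / and once with # as the delimiter. The sample string is made up for illustration.

```php
<?php

$html = 'a<p> </p>b';

// With / as the delimiter, the slash in </p> must be escaped.
$with_slash = preg_replace('/<p>\s*<\/p>/i', '', $html);

// With # as the delimiter, </p> can be written as-is.
$with_hash = preg_replace('#<p>\s*</p>#i', '', $html);

var_dump($with_slash === $with_hash); // bool(true)
print($with_hash); // ab
```

Both forms behave the same; the # version is simply easier to read when the pattern contains closing tags.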

Update: in the example you provided, there is a non-breaking space character (U+00A0) between <p> and </p>.

By default, \s in a preg pattern does not match this character, so you need to add it to the character class explicitly. Use the code point \x{00A0} together with the u modifier if your text is encoded in UTF-8.
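
The non-breaking space is presumably also why the str_replace attempt in the question found nothing: the needle contains an ordinary space, while the document contains U+00A0. A minimal sketch for checking this, assuming PHP 7 or later (for the "\u{...}" escape) and a small made-up sample in place of the real document:

```php
<?php

// "\u{00A0}" is the non-breaking space; in UTF-8 it is the bytes C2 A0.
$nbsp = "\u{00A0}";
var_dump(bin2hex($nbsp)); // string(4) "c2a0"

// Hypothetical sample: one NBSP paragraph and one plain-space paragraph.
$unclean_html = "<p>text</p>\n<p>" . $nbsp . "</p>\n<p> </p>\n";

// A needle with an ordinary space leaves the NBSP paragraph in place...
$plain = str_replace('<p> </p>', '', $unclean_html);
var_dump(strpos($plain, $nbsp) !== false); // bool(true)

// ...while a needle containing the real character removes it.
$fixed = str_replace('<p>' . $nbsp . '</p>', '', $unclean_html);
var_dump(strpos($fixed, $nbsp) !== false); // bool(false)

// The pattern from the answer handles both kinds of empty paragraph at once.
$clean_html = preg_replace('#<p>[\s\x{00A0}]*</p>#iu', '', $unclean_html);
var_dump(trim($clean_html)); // string(11) "<p>text</p>"
```

Dumping a suspicious fragment with bin2hex is a quick way to see which whitespace bytes are really present before writing the pattern.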
