Regular expression for a range of Chinese chars and selected groups of chars

I'm trying to get all Chinese sentences from strings, including additional groups of chars like [NAME] and [PLACE].

I have this string:

<DisplayName>凡人战争</DisplayName>
<Desc>[NAME]赶到[PLACE]，发现战火正燃，此地百姓饱受战争之苦。</Desc>
<Display>劝停战争</Display>  
<OKResult><![CDATA[me:AddMsg(XT("[NAME]以仙法摄走两军首领，一番劝戒，迫使他们停止了战争 ...

and I want to find:

凡人战争
[NAME]赶到[PLACE]，发现战火正燃，此地百姓饱受战争之苦
[NAME]以仙法摄走两军首领，一番劝戒，迫使他们停止了战争，消弭了这场祸事
此举手段温和，虽无人知晓，但却顺应天道，[NAME]获得了一些功德

I know the regex for Chinese chars is [\u4e00-\u9fff\uFF0C] (CJK Unified Ideographs plus the fullwidth comma ，), and for the groups it is (\u005BNAME\u005D) and (\u005BPLACE\u005D), but how do I combine them?

I tried this in Python:

Array_of_words = re.findall(r'[\u4e00-\u9fff\uFF0C(\u005BNAME\u005D)(\u005BPLACE\u005D)]+', text)

But it also matches single letters and brackets, like this:

['N', 'N', '凡人战争', 'N', '[NAME]赶到[PLACE]，发现战火正燃，此地百姓饱受战争之苦', '劝停战争', '[C', 'A', 'A[', 'A', 'M', '(', '(', '[NAME]以仙法摄走两军首领，一番劝戒，迫使他们停止了战争，消弭了这场祸事', '此举手段温和，虽无人知晓，但却顺应天道，[NAME]获得了一些功德', '))', 'A', 'P', '(', '(', '))', '()', ']]']
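The stray matches happen because everything between [ and ] is a set of single characters: the parentheses, the brackets and the letters N, A, M, E, P, L, C are just extra members of that set, not grouped words. A minimal sketch on a made-up sample string (not the original input) shows the effect:

```python
import re

# The pattern from the question. Inside [...] the ( ) do NOT form groups;
# the class is just the CJK range plus single chars: ( [ N A M E ] ) P L C
bad = r'[\u4e00-\u9fff\uFF0C(\u005BNAME\u005D)(\u005BPLACE\u005D)]+'

sample = 'CDATA[NAME]赶到'  # made-up sample, not the original input
print(re.findall(bad, sample))  # ['C', 'A', 'A[NAME]赶到']
```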

Answer:

You can use

re.findall(r'(?:\[(?:PLACE|NAME)]|[\u4e00-\u9fff\uFF0C])+', text)
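A quick check of that pattern on the first three lines of the sample. The fourth line is cut off in the question, so it is left out here:

```python
import re

# Either a [PLACE]/[NAME] token or one Chinese char / fullwidth comma, repeated
pattern = r'(?:\[(?:PLACE|NAME)]|[\u4e00-\u9fff\uFF0C])+'

text = ('<DisplayName>凡人战争</DisplayName>\n'
        '<Desc>[NAME]赶到[PLACE]，发现战火正燃，此地百姓饱受战争之苦。</Desc>\n'
        '<Display>劝停战争</Display>')

print(re.findall(pattern, text))
# ['凡人战争', '[NAME]赶到[PLACE]，发现战火正燃，此地百姓饱受战争之苦', '劝停战争']
```

The full stop 。 (U+3002) is outside the \u4e00-\u9fff range, so each match ends just before it.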

Details

(?: - start of a non-capturing group:
- \[(?:PLACE|NAME)] - a literal [, then either PLACE or NAME, and then ]
- | - or
- [\u4e00-\u9fff\uFF0C] - your Chinese char pattern
)+ - end of the group; + matches it one or more times.
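If you want to keep the breakdown next to the pattern, the same regex can be written with re.VERBOSE, where whitespace and # comments outside character classes are ignored. This is just a layout choice; it matches exactly what the compact version matches:

```python
import re

PATTERN = re.compile(r"""
    (?:                       # start of a non-capturing group
        \[(?:PLACE|NAME)]     # literal [ then PLACE or NAME then ]
      |                       # or
        [\u4e00-\u9fff\uFF0C] # one CJK ideograph or the fullwidth comma
    )+                        # end of the group, one or more times
""", re.VERBOSE)

print(PATTERN.findall('x[NAME]赶到[PLACE]，y'))  # ['[NAME]赶到[PLACE]，']
```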

To match any uppercase ASCII letters inside square brackets, replace \[(?:PLACE|NAME)] with \[[A-Z]+].
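A sketch comparing the two options. The generic \[[A-Z]+] form accepts any bracketed uppercase token, while an explicit list only accepts the names you pass. The helper placeholder_pattern and the [ITEM] token are hypothetical, made up for illustration:

```python
import re

CJK = r'[\u4e00-\u9fff\uFF0C]'  # CJK ideographs plus fullwidth comma, as in the question

# Any run of uppercase ASCII letters in square brackets counts as a placeholder
generic = re.compile(r'(?:\[[A-Z]+]|' + CJK + r')+')

def placeholder_pattern(names):
    """Hypothetical helper: only the given placeholder names are accepted."""
    alts = '|'.join(re.escape(n) for n in names)
    return re.compile(r'(?:\[(?:' + alts + r')]|' + CJK + r')+')

sample = '[NAME]获得了[ITEM]功德'  # [ITEM] is a made-up placeholder
print(generic.findall(sample))                                # ['[NAME]获得了[ITEM]功德']
print(placeholder_pattern(['NAME', 'PLACE']).findall(sample))  # ['[NAME]获得了', '功德']
```

With the explicit list, an unknown token such as [ITEM] splits the sentence in two, which may or may not be what you want.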