Regex expression minimum length of the word

Time:08-24

I am looking for an expression that removes numbers when the word is longer than 8 characters.

For example:

"Python300" -> "Python"

"Python37" -> "Python37"

I tried the expression ^(?=.*[a-zA-Z0-9]{8,})(?=.*[0-9]).*$ but it selects the whole string.

Thank you!!
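
For context on why that pattern selects everything: both lookaheads are zero-width checks, so the only part that consumes text is the final `.*`, which spans the whole line. Also, `{8,}` means "8 or more", so an 8-character word such as Python37 passes the check too. A small sketch to illustrate (the sample strings and variable names are only for demonstration):

```python
import re

# The pattern from the question, unchanged
pattern = re.compile(r'^(?=.*[a-zA-Z0-9]{8,})(?=.*[0-9]).*$')

for s in ['Python300', 'Python37', 'Python']:
    m = pattern.match(s)
    # When both lookaheads succeed, the match is the entire string;
    # otherwise there is no match at all
    print(s, '->', m.group(0) if m else None)
```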

CodePudding user response:

Can't it be a simple if?

import re
max_length = 9
s = 'Python300'
s = s if len(s) < max_length else re.sub(r'[0-9]+', '', s)

CodePudding user response:

You can match using this regex to remove all trailing digits from words with length greater than 8:

\b(?=\w{9,})(\w+?)\d+\b

and replace using:

r'\1'


RegEx Explanation:

  • \b: Word boundary
  • (?=\w{9,}): Make sure word has 9 or more characters
  • (\w+?): Match 1+ word chars in capture group #1 (lazy match)
  • \d+: Match 1+ trailing digits
  • \b: Word boundary

Code:

import re

arr = ['Python300', 'Python37']

for s in arr:
    print(re.sub(r'\b(?=\w{9,})(\w+?)\d+\b', r'\1', s))

Output:

Python
Python37
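
Note that this pattern only strips a run of digits at the end of a word; digits at the start or in the middle are left alone, and the length check applies to each run of `\w` characters separately. A small sketch to illustrate, using made-up sample strings:

```python
import re

# Same pattern as above: words of 9+ characters that end in digits
pattern = re.compile(r'\b(?=\w{9,})(\w+?)\d+\b')

samples = [
    'Python300 and Python37',  # only the 9-character word is changed
    'abc123def456',            # only the trailing digit run is removed
    '1234Spark',               # leading digits: the pattern does not match
]

results = [pattern.sub(r'\1', s) for s in samples]
for before, after in zip(samples, results):
    print(before, '->', after)
```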

CodePudding user response:

I tried the regex, but it didn't work.

I've put the code in PySpark so that it can be reproduced.

Thanks anyway.

import pandas as pd
from pyspark.sql import functions as sf

# `spark` is assumed to be an existing SparkSession
a = ['python37', 'python300', '19Covid', '1234Spark', 'spark-2-python']
b = ['python37', 'python', '19Covid', 'Spark', 'spark--python']

impacto = pd.DataFrame(list(zip(a, b)), columns=['input', 'expected'])
spark.createDataFrame(impacto) \
    .withColumn("result", sf.regexp_replace(sf.col("input"), r"\b(?=\w{9,})(\w+?)\d+\b", r'\1')) \
    .show()
+--------------+-------------+--------------+
|         input|     expected|        result|
+--------------+-------------+--------------+
|      python37|     python37|      python37|
|     python300|       python|             1|
|       19Covid|      19Covid|       19Covid|
|     1234Spark|        Spark|     1234Spark|
|spark-2-python|spark--python|spark-2-python|
+--------------+-------------+--------------+
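
Two things probably explain the result column. First, Spark's regexp_replace uses Java regular expressions, where a group reference in the replacement is written `$1`; `\1` is most likely taken as a literal `1`, which matches the output for python300. Second, the expected column removes digits anywhere in a string longer than 8 characters, not just trailing digits, so the trailing-digit pattern would not cover 1234Spark or spark-2-python even with `$1`. A minimal plain-Python sketch of the expected column, assuming the rule is "strip all digits when the whole string has 9 or more characters" (the helper name `strip_digits_if_long` is made up for illustration):

```python
import re

def strip_digits_if_long(s, min_len=9):
    """Remove every digit from s when it has at least min_len characters."""
    return re.sub(r'\d+', '', s) if len(s) >= min_len else s

inputs = ['python37', 'python300', '19Covid', '1234Spark', 'spark-2-python']
expected = ['python37', 'python', '19Covid', 'Spark', 'spark--python']

results = [strip_digits_if_long(s) for s in inputs]
for before, after in zip(inputs, results):
    print(before, '->', after)
```

In PySpark the same rule could presumably be written with sf.when(sf.length(sf.col("input")) > 8, sf.regexp_replace(sf.col("input"), r"\d+", "")).otherwise(sf.col("input")), though that has not been run against the data above.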