Return certain character or word followed or preceded by space - Regex Python

Time:11-16

I am trying to select only the clothing sizes using a regular expression.

I am new to Python and I am trying to select the rows that contain these sizes, but my pattern also matches other words. I used a regular expression but could not get the desired result.

Code:

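import re
import numpy as np
import pandas as pd

df = pd.DataFrame({"description":["Silver","Red","GOLD","Black Leather","S","L","S","XL","XXL","Noir Matt"," 150x160L","140M"]})
# Keep the value if the pattern is found anywhere in it, otherwise replace it with NaN
df.description.apply(lambda x : x if re.findall(r"(?!\s \d )(S|M|X*L)(?!\s \d )",str(x)) else np.nan).unique()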

Output:

array(['Silver', nan, 'Black Leather', 'S', 'L', 'XL', 'XXL', 'Noir Matt',
       ' 150x160L', '140M'], dtype=object)

Expected:

array([ 'S', 'L', 'XL', 'XXL',nan], dtype=object)

CodePudding user response:

I think you need to use

import pandas as pd
df = pd.DataFrame({"description":["Silver","Red","GOLD","Black Leather","S","L","S","XL","XXL","Noir Matt"," 150x160L","140M"]})
df['description'][df['description'].str.match(r'^(?:S|M|X*L)$')].unique()
# => array(['S', 'L', 'XL', 'XXL'], dtype=object)

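With Series.str.match(r'^(?:S|M|X*L)$'), you keep only the rows of the description column whose entire value is S, M, or zero or more Xs followed by L. The ^ and $ anchors are what make this work. The original pattern is unanchored, and its lookaheads only rule out one specific whitespace-and-digit sequence, so re.findall still finds the S inside "Silver" or the L at the end of " 150x160L".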
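If you also want nan for the rows that are not sizes, as in the expected output from the question, here is a minimal sketch. It assumes pandas 1.1 or newer, where Series.str.fullmatch is available. The names old_pattern, is_size and sizes are only illustrative. The first part shows why the unanchored pattern from the question lets "Silver" through.

```python
import re

import pandas as pd

df = pd.DataFrame({"description": ["Silver", "Red", "GOLD", "Black Leather", "S", "L",
                                   "S", "XL", "XXL", "Noir Matt", " 150x160L", "140M"]})

# The pattern from the question is unanchored, so it finds the "S" inside "Silver".
old_pattern = r"(?!\s \d )(S|M|X*L)(?!\s \d )"
print(re.findall(old_pattern, "Silver"))  # ['S']

# fullmatch requires the whole string to match, so no ^ / $ anchors are needed.
is_size = df["description"].str.fullmatch(r"S|M|X*L")

# where() keeps the values where the mask is True and puts NaN everywhere else.
sizes = df["description"].where(is_size)

print(sizes.unique())
```

Compared with boolean indexing, where() keeps the original length and index of the column, so the result can be assigned back as a new column if needed.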
