extract with regex if/else statement

Time:12-18

Is it possible to write a function (def) that extracts specific values from one column (50 rows) using regex with an if/else statement?

I need to extract strings matching the following patterns with an if/else statement:

\d*\s*milliliter
\d*\s*liter
\d*\s*ounce
\d*\s*kilogram
\d*\s*fluid\s*ounce

It can return 'None' if no match is found.

My code below is currently very simple, with just the extract. I can't figure out how to write it as an if/else/return statement.

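def extract_data(df):

    # Raw string so the regex backslashes (\d, \s) are not treated as string escapes
    pattern = r'(\d*\s*milliliter|\d*\s*liter|\d*\s*ounce|\d*\s*kilogram|\d*\s*fluid\s*ounce)'

    # .str is a Series accessor, so select the "Unit" column first
    return df['Unit'].str.extract(pattern)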

The column name is "Unit"

CodePudding user response:

Example

import pandas as pd

data = {'Unit': {0: '\\d*\\s*milliliter', 1: '\\d*\\s*liter', 2: '\\d*\\s*ounce', 
                 3: '\\d*\\s*kilogram', 4: '\\d*\\s*fluid\\s*ounce', 5: '\\d*\\s*nanosecond'}}
df = pd.DataFrame(data)

df

Unit
0   \d*\s*milliliter
1   \d*\s*liter
2   \d*\s*ounce
3   \d*\s*kilogram
4   \d*\s*fluid\s*ounce
5   \d*\s*nanosecond

Code

str.extract doesn't need an if/else: it already returns NaN for rows where the pattern does not match.

pat = r'(\d*\s*milliliter|\d*\s*liter|\d*\s*ounce|\d*\s*kilogram|\d*\s*fluid\s*ounce)'
out = df['Unit'].str.extract(pat)

out

    0
0   milliliter
1   liter
2   ounce
3   kilogram
4   ounce
5   NaN
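
If you do want an explicit if/else that returns None rather than NaN, here is a minimal sketch using re.search on each value. The sample values ('500 milliliter' and so on), the helper name extract_unit and the UNIT_PATTERN constant are assumptions made for illustration; they are not from the question's data.

```python
import re

import pandas as pd

# Same units as in the question, with the shared \d*\s* prefix factored out.
UNIT_PATTERN = re.compile(r"\d*\s*(?:milliliter|liter|ounce|kilogram|fluid\s*ounce)")


def extract_unit(text):
    """Return the first quantity-plus-unit match in text, or None."""
    # Missing values (NaN, None) are not strings, so treat them as "no match".
    if not isinstance(text, str):
        return None
    match = UNIT_PATTERN.search(text)
    if match:
        # strip() drops whitespace picked up by \s* when no digits precede the unit
        return match.group(0).strip()
    else:
        return None


# Hypothetical sample data; the real "Unit" column may look different.
df = pd.DataFrame({"Unit": ["500 milliliter", "2 liter", "12 fluid ounce",
                            "3 kilogram", "8 ounce", "10 nanosecond"]})

results = [extract_unit(value) for value in df["Unit"]]
print(results)
# ['500 milliliter', '2 liter', '12 fluid ounce', '3 kilogram', '8 ounce', None]
```

The list can be assigned back with df['Extracted'] = results. Depending on the pandas version and dtype, the None may then be displayed as NaN, so check for misses with isna() rather than comparing to None.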