Is it possible to write a function (def) that extracts specific values from one column (50 rows) using regex with an if/else statement?
I need to extract strings matching the following patterns, using an if/else statement:
\d*\s*milliliter
\d*\s*liter
\d*\s*ounce
\d*\s*kilogram
\d*\s*fluid\s*ounce
It can return None if no match is found.
My code below is currently very simple, with just the extract, but I can't figure out how to write it with an if/else/return statement.
def extract_data(df):
    # raw string so the regex backslashes are not treated as string escapes
    pattern = r'(\d*\s*milliliter|\d*\s*liter|\d*\s*ounce|\d*\s*kilogram|\d*\s*fluid\s*ounce)'
    return df.str.extract(pattern)
The column name is "Unit"
CodePudding user response:
Example
import pandas as pd

data = {'Unit': {0: '\\d*\\s*milliliter', 1: '\\d*\\s*liter', 2: '\\d*\\s*ounce',
                 3: '\\d*\\s*kilogram', 4: '\\d*\\s*fluid\\s*ounce', 5: '\\d*\\s*nanosecond'}}
df = pd.DataFrame(data)
df
                  Unit
0     \d*\s*milliliter
1          \d*\s*liter
2          \d*\s*ounce
3       \d*\s*kilogram
4  \d*\s*fluid\s*ounce
5     \d*\s*nanosecond
Code
You don't need if/else here: str.extract returns NaN when there is no match.
pat = r'(\d*\s*milliliter|\d*\s*liter|\d*\s*ounce|\d*\s*kilogram|\d*\s*fluid\s*ounce)'
out = df['Unit'].str.extract(pat)
out
            0
0  milliliter
1       liter
2       ounce
3    kilogram
4       ounce
5         NaN
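If you do want an explicit if/else that returns None, as asked in the question, one possible sketch is a plain function built on re.search and applied to the column. The names UNIT_PATTERN and extract_unit, the 'Extracted' column, and the sample values are made up for illustration; they are not from the question's data.

```python
import re
import pandas as pd

# Hypothetical compiled pattern: optional digits, optional whitespace, then a unit.
UNIT_PATTERN = re.compile(
    r'\d*\s*(?:milliliter|fluid\s*ounce|kilogram|liter|ounce)'
)

def extract_unit(value):
    """Return the first matching quantity/unit substring, or None."""
    # Missing cells (None/NaN) are not strings, so there is nothing to search.
    if not isinstance(value, str):
        return None
    match = UNIT_PATTERN.search(value)
    if match:
        # strip() drops any leading whitespace picked up by \s*
        return match.group(0).strip()
    else:
        return None

# Assumed sample values, only to show the call.
df = pd.DataFrame({'Unit': ['250 milliliter', '2 liter', '12 fluid ounce',
                            '5 kilogram', '30 nanosecond', None]})
df['Extracted'] = df['Unit'].apply(extract_unit)
```

For 50 rows the speed difference from str.extract should not matter. The main difference is that this version returns None and strips surrounding whitespace, while str.extract returns NaN and keeps the match as is.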