Home > OS >  Extracting number from in pandas dataframe column with 2 conditions
Extracting number from in pandas dataframe column with 2 conditions

Time:09-16

i have this dataframe with 1 column in python and i need to extract out the 2 numbers that come after a "#" character or the string "door". Example of this would be

String_column
#12-123,456
mom101, door 101, pop10

i only want the 2 numbers that come after the # sign or the word door. how would i go about doing this. This is what i currently have but i think this only takes in the numbers that come after the # key

import pandas as pd

df = pd.read_csv(data.csv)
df['qwerty'] = df.string_column.str.extract(
     r'(?<=#)(\d )', expand=False
).fillna(0).astype(int)

CodePudding user response:

You can use df.loc combined with apply which will get all the indexes that are true.

Here is a simple example

In [5]: df=  pd.DataFrame({'String_column':['not useful', 'door useful1', '! useful 2', 'not useful']})                                         

In [6]: df                                                                                                                                      
Out[6]: 
  String_column
0    not useful
1  door useful1
2    ! useful 2
3    not useful

Now using our function

In [7]: df.loc[df['String_column'].apply(lambda x: True if x.startswith('!') or x.startswith('door') else False)]                               
Out[7]: 
  String_column
1  door useful1
2    ! useful 2

We used startswith to match all our conditions to get the useful values that starts with '!' or 'door'.

CodePudding user response:

IIUC, you can use a non capturing group to list your different options (# or door\s*):

df['num'] = (df['String_column'].str.extract(r'(?:#|door\s*)(\d )', expand=False)
             .fillna(0).astype(int)
            )

output:

             String_column  num
0              #12-123,456   12
1  mom101, door 101, pop10  101

regex demo

  • Related