Extract rows from pandas dataframe on single column value using Regex

Time:08-21

I have a pandas DataFrame and would like to extract all the rows that have a regex match based on a single column's string value.

My example:

import pandas as pd

data = {'Year': [2016, 2016, 2017, 2018, 2018, 2019, 2020],
        'Code': ['a5', 'b3', 'c3', 'd7', 'e8', 'f5', 'g1'],
        'Port Nationality': ['UK-England', 'UK-Scotland', 'Germany', 'Ireland', 'France', 'UK-Wales', 'UK-England']}

df = pd.DataFrame(data)

df

Output:

    Year    Code    Port Nationality
0   2016    a5      UK-England
1   2016    b3      UK-Scotland
2   2017    c3      Germany
3   2018    d7      Ireland
4   2018    e8      France
5   2019    f5      UK-Wales
6   2020    g1      UK-England

In this example, I would like to extract all the rows where the Port Nationality column matches the regex 'UK-'. That is, I would like a new dataframe that looks like this:

New dataframe:

    Year    Code    Port Nationality
0   2016    a5      UK-England
1   2016    b3      UK-Scotland
5   2019    f5      UK-Wales
6   2020    g1      UK-England

Thanks in advance.

CodePudding user response:

Did you try startswith?

df = df.loc[df["Port Nationality"].str.startswith('UK-', na=False)]

Or, using contains with a regex anchored to the start of the string:

df = df.loc[df["Port Nationality"].str.contains('^UK-', regex=True, na=False)]
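
For reference, here is a minimal self-contained sketch that runs both approaches against the data from the question, using the 'UK-' pattern. The names by_prefix and by_regex are only illustrative. It writes to new variables rather than overwriting df, so the original frame stays available.

```python
import pandas as pd

data = {'Year': [2016, 2016, 2017, 2018, 2018, 2019, 2020],
        'Code': ['a5', 'b3', 'c3', 'd7', 'e8', 'f5', 'g1'],
        'Port Nationality': ['UK-England', 'UK-Scotland', 'Germany', 'Ireland',
                             'France', 'UK-Wales', 'UK-England']}
df = pd.DataFrame(data)

# Literal prefix test; no regex is involved here.
by_prefix = df.loc[df['Port Nationality'].str.startswith('UK-', na=False)]

# Regex test; '^' anchors the pattern to the start of each string.
by_regex = df.loc[df['Port Nationality'].str.contains(r'^UK-', regex=True, na=False)]

print(by_regex)
```

Both selections keep the original index labels (0, 1, 5, 6). If you want a fresh 0..n-1 index instead, you can chain .reset_index(drop=True).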

CodePudding user response:

You can use str.contains to do the same:

df[df['Port Nationality'].str.contains('UK', regex=True)]

This gives you the expected output:

   Year Code Port Nationality
0  2016   a5       UK-England
1  2016   b3      UK-Scotland
5  2019   f5         UK-Wales
6  2020   g1       UK-England
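
One caveat worth checking on your own data: an unanchored pattern such as 'UK' matches anywhere in the string, and a missing value in the column produces a missing entry in the mask unless you pass na=False. The sketch below uses hypothetical values ('Non-UK' and None are not in the question's data) to show the difference. str.match only matches at the start of each string, so it behaves like an anchored pattern.

```python
import pandas as pd

# Hypothetical values for illustration only; they are not from the question.
ports = pd.Series(['UK-England', 'Non-UK', None, 'Germany', 'UK-Wales'])

# Unanchored: 'UK' may appear anywhere, so 'Non-UK' is selected too.
unanchored = ports.str.contains('UK', regex=True, na=False)

# str.match anchors at the start of the string, so only the 'UK-' prefix matches.
anchored = ports.str.match(r'UK-', na=False)

print(ports[unanchored].tolist())
print(ports[anchored].tolist())
```

With the data in the question both patterns select the same rows, so this only matters if other values can contain 'UK' in the middle.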