I have a pandas DataFrame and would like to extract all the rows that have a regex match based on a single column's string value.
My example:
import pandas as pd

data = {'Year': [2016, 2016, 2017, 2018, 2018, 2019, 2020],
        'Code': ['a5', 'b3', 'c3', 'd7', 'e8', 'f5', 'g1'],
        'Port Nationality': ['UK-England', 'UK-Scotland', 'Germany', 'Ireland', 'France', 'UK-Wales', 'UK-England']}
df = pd.DataFrame(data)
df
Output:
   Year Code Port Nationality
0  2016   a5       UK-England
1  2016   b3      UK-Scotland
2  2017   c3          Germany
3  2018   d7          Ireland
4  2018   e8           France
5  2019   f5         UK-Wales
6  2020   g1       UK-England
In this example, I would like to extract all the rows where the column Port Nationality matches the regex 'UK-'. That is, I would like a new DataFrame that looks like this:
New dataframe:
   Year Code Port Nationality
0  2016   a5       UK-England
1  2016   b3      UK-Scotland
5  2019   f5         UK-Wales
6  2020   g1       UK-England
Thanks in advance.
CodePudding user response:
Did you try startswith? It matches a literal prefix rather than a regex, which is enough for 'UK-':
df = df.loc[df["Port Nationality"].str.startswith('UK-', na=False)]
Or use contains, which matches the pattern anywhere in the string (here, rows containing 'England'):
df = df.loc[df["Port Nationality"].str.contains('England', regex=True, na=False)]
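If the goal is specifically the 'UK-' prefix from the question, one possible sketch is to anchor the regex at the start of the string. This rebuilds the question's DataFrame so it runs on its own; the variable names anchored and matched are only illustrative.

```python
import pandas as pd

data = {'Year': [2016, 2016, 2017, 2018, 2018, 2019, 2020],
        'Code': ['a5', 'b3', 'c3', 'd7', 'e8', 'f5', 'g1'],
        'Port Nationality': ['UK-England', 'UK-Scotland', 'Germany', 'Ireland',
                             'France', 'UK-Wales', 'UK-England']}
df = pd.DataFrame(data)

# '^' anchors the pattern to the start of each string,
# so only values beginning with 'UK-' are kept.
anchored = df[df['Port Nationality'].str.contains(r'^UK-', regex=True, na=False)]

# str.match only tests the beginning of each string,
# so the same filter works without the explicit '^'.
matched = df[df['Port Nationality'].str.match(r'UK-', na=False)]

print(anchored)
```

Both filters keep the original index labels (0, 1, 5, 6); call reset_index(drop=True) on the result if a fresh 0-based index is preferred.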
CodePudding user response:
You can use str.contains to do the same:
df[df['Port Nationality'].str.contains('UK', regex=True)]
This gives the expected output:
   Year Code Port Nationality
0  2016   a5       UK-England
1  2016   b3      UK-Scotland
5  2019   f5         UK-Wales
6  2020   g1       UK-England
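If the column might contain missing values, it is probably safer to pass na=False so that those rows count as non-matches and the mask stays purely boolean. A minimal sketch, using a small made-up frame (df_missing and its values are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical data: the second entry is missing.
df_missing = pd.DataFrame(
    {'Port Nationality': ['UK-England', None, 'Germany', 'UK-Wales']}
)

# na=False treats missing values as "no match",
# so the mask contains only True/False.
mask = df_missing['Port Nationality'].str.contains('UK', regex=True, na=False)
uk_rows = df_missing[mask]

print(uk_rows)
```

The example in the question has no missing values, so the extra argument makes no difference there.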