Pulling a specific word from a dataframe string column and storing it in a new column in Python

I have a Python dataframe column Name whose elements always contain a first name, a last name, and the word "Over" or "Under".

For example: Name = [Michael Johnson Over, Michael Johnson Under, John Smith Over, John Smith Under]

I'm trying to create a new column Name2 that extracts either "Over" or "Under" from Name.

So for the example above Name2 = [Over, Under, Over, Under]

I've tried different variations of .split and findall but can't figure out how to get a new column that contains just Over or Under. Please help!

CodePudding user response：

.str is an accessor on pd.Series that exposes string methods such as .contains. You can fill a new column using boolean indexing, where the condition is whether the row in "Name" contains the keyword "Over" or "Under".

import pandas as pd
df = pd.DataFrame(
    {
        "Name": [
            "Michael Johnson Over",
            "Michael Johnson Under",
            "John Smith Over",
            "John Smith Under"
        ],
    }
)

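# Start with an empty column, then fill it row by row from a boolean mask.
# Using .loc with (mask, column) avoids chained assignment, which raises
# SettingWithCopyWarning and may not modify df at all.
df["Name2"] = None
df.loc[df["Name"].str.contains("Over"), "Name2"] = "Over"
df.loc[df["Name"].str.contains("Under"), "Name2"] = "Under"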
print(df)

Output

    Name                    Name2
0   Michael Johnson Over    Over
1   Michael Johnson Under   Under
2   John Smith Over         Over
3   John Smith Under        Under
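
If you'd rather do it in one step, .str.extract with a regular expression should give the same result. This is only a sketch: it assumes the keyword appears as a whole word and is capitalized exactly as in your example, and the pattern is my own choice rather than anything required by pandas.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": [
            "Michael Johnson Over",
            "Michael Johnson Under",
            "John Smith Over",
            "John Smith Under",
        ],
    }
)

# Capture the whole word "Over" or "Under".
# expand=False returns a Series because there is a single capture group.
# Rows that contain neither word end up as NaN.
df["Name2"] = df["Name"].str.extract(r"\b(Over|Under)\b", expand=False)
print(df)
```

If the casing can vary, passing flags=re.IGNORECASE (after import re) to .str.extract is one way to handle it.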

CodePudding user response：

You can use pandas' .str.rsplit to split the string starting from the end, with the n parameter limiting the number of splits to one. Passing expand=True returns the pieces as separate columns, which is what lets you assign them to two new columns at once.

df[['First_Last','Name2']] = df['Name'].str.rsplit(' ', n=1, expand=True)

Output

                    Name       First_Last  Name2
0   Michael Johnson Over  Michael Johnson   Over
1  Michael Johnson Under  Michael Johnson  Under
2        John Smith Over       John Smith   Over
3       John Smith Under       John Smith  Under
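
If you don't need the First_Last helper column, a variation is to leave out expand=True and take the last element of each split with .str[-1]. This is a minimal sketch, assuming the keyword is always the final space-separated word in Name.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": [
            "Michael Johnson Over",
            "Michael Johnson Under",
            "John Smith Over",
            "John Smith Under",
        ],
    }
)

# Without expand=True, rsplit returns a list per row,
# e.g. ["Michael Johnson", "Over"]; .str[-1] picks the last item.
df["Name2"] = df["Name"].str.rsplit(" ", n=1).str[-1]
print(df)
```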