I'm trying to extract words which contain letter only into a new column, and any word which contains a number into a different column.
Desired Output:
query words_only contains_number
0 Nike Air Max 97 Nike Air Max 97
1 Adidas NMD-R1 Adidas NMD-R1
2 Nike Air Max 270 Nike Air Max 270
What I've Tried:
I've seen the answer here which gets me some of the way there, but it's not exactly what I need.
How to extract words containing only letters from a text in python?
Minimum Reproducible Example:
# Import pandas library
import pandas as pd
# initialize list elements
data = ["Nike Air Max 97", "Adidas NMD R1", "Nike Air Max 270"]
# Create the pandas DataFrame with column name is provided explicitly
df = pd.DataFrame(data, columns=['query'])
# print dataframe.
print(df)
CodePudding user response:
You can use a regex with str.extractall
to extract the words with and without digits separately, then groupby.agg
to join them separately:
df[['words_only', 'contains_number']] = (df['query']
.str.extractall(r'(\S*\d\S*)|([^\s\d] )') # order is important
.groupby(level=0).agg(lambda s: ' '.join(s.dropna()))
.loc[:, ::-1] # invert 2 columns
)
Output:
query words_only contains_number
0 Nike Air Max 97 Nike Air Max 97
1 Adidas NMD-R1 Adidas NMD-R1
2 Nike Air Max 270 Nike Air Max 270