Remove words and keep only digits-CodePudding

I have a list which looks like this:

Column 1	Column2
Seaworld	Ser Not123, 2345,hrt 45
Holds	DER4, 54NOT, DTR

From Colum B, I want to remove that are purely words like Ser, DTR and keep only alphanumeric like Not123 and numeric like 2345 and save the filtered result with column A and Column B to excel.

Output:

Column 1	Column2
Seaworld	Not123, 2345,45
Holds	DER4, 54NOT,

I think this can be done using this:

def filter_list(lst):
    return [word for word in lst if re.search(r'^\w $', word)]

CodePudding user response：

In your function you have the same iterating variable as the source variable. So change it something like this:

def filter_list(lst):
    return list(filter(None, [word if re.search(r'^\w $', word) else None for word in lst]))

CodePudding user response：

You can just check if the string contains a number or not. This doesn't require the re library.

import pandas as pd

def remove_purely_words(x):
    if any(i.isdigit() for i in x):
        return x
    return None

df = pd.DataFrame({'Column2': ['Ser', 'DTR', 'Not123', '2345', 'Some text']})
df = df[df['Column2'].apply(remove_purely_words).notnull()]
print(df['Column2'].tolist())


df.to_excel("output.xlsx", index=False)

Output:

['Not123', '2345']

Code snippet: https://onlinegdb.com/JHWP4ZpOJ

CodePudding user response：

IIUC, assuming you're working with a pandas dataframe, use pandas.Series.str.findall :

df["Column B"] = df["Column B"].str.findall("(\w*\d \w*)").agg(", ".join)

Then, if needed, you can use pandas.DataFrame.to_excel to make an Excel spreadsheet :

df.to_excel("output.xlsx", index=False)

Output :

print(df)

   Column A             Column B
0  Seaworld  Not123, 2345, hrt45
1     Holds             DER4, 54