I have a list which looks like this:
Column 1 | Column2 |
---|---|
Seaworld | Ser Not123, 2345,hrt 45 |
Holds | DER4, 54NOT, DTR |
From Colum B, I want to remove that are purely words like Ser, DTR and keep only alphanumeric like Not123 and numeric like 2345 and save the filtered result with column A and Column B to excel.
Output:
Column 1 | Column2 |
---|---|
Seaworld | Not123, 2345,45 |
Holds | DER4, 54NOT, |
I think this can be done using this:
def filter_list(lst):
return [word for word in lst if re.search(r'^\w $', word)]
CodePudding user response:
In your function you have the same iterating variable as the source variable. So change it something like this:
def filter_list(lst):
return list(filter(None, [word if re.search(r'^\w $', word) else None for word in lst]))
CodePudding user response:
You can just check if the string contains a number or not. This doesn't require the re
library.
import pandas as pd
def remove_purely_words(x):
if any(i.isdigit() for i in x):
return x
return None
df = pd.DataFrame({'Column2': ['Ser', 'DTR', 'Not123', '2345', 'Some text']})
df = df[df['Column2'].apply(remove_purely_words).notnull()]
print(df['Column2'].tolist())
df.to_excel("output.xlsx", index=False)
Output:
['Not123', '2345']
Code snippet: https://onlinegdb.com/JHWP4ZpOJ
CodePudding user response:
IIUC, assuming you're working with a pandas dataframe, use pandas.Series.str.findall
:
df["Column B"] = df["Column B"].str.findall("(\w*\d \w*)").agg(", ".join)
Then, if needed, you can use pandas.DataFrame.to_excel
to make an Excel spreadsheet :
df.to_excel("output.xlsx", index=False)
Output :
print(df)
Column A Column B
0 Seaworld Not123, 2345, hrt45
1 Holds DER4, 54