I want to remove stopwords from a column that contains text in my Excel file. I use this Python code, but it does not work.
import string
stopwords = ["i", "me", "my", "myself", "we", "our", "ours", "ourselves", "you", "your", "yours", "yourself", "yourselves", "he", "him", "his", "himself", "she", "her", "hers", "herself", "it", "its", "itself", "they", "them","their", "theirs", "themselves", "what", "which", "who", "whom", "this", "that", "these", "those", "am", "is", "are", "was", "were", "be", "been", "being", "have", "has", "had", "having", "do", "does", "did", "doing","a", "an", "the", "and", "but", "if", "or", "because", "as", "until", "while", "of", "at", "by", "for", "with", "about", "against", "between", "into", "through", "during", "before", "after", "above", "below", "to","from", "up", "down", "in", "out", "on", "off", "over", "under", "again", "further", "then", "once", "here", "there", "when", "where", "why", "how", "all", "any", "both", "each", "few", "more", "most", "other","some", "such", "no", "nor", "not", "only", "own", "same", "so", "than", "too", "very", "s", "t", "can", "will", "just", "don", "should", "now"]
def clean(text):
    words = text.split(' ')
    return [''.join([c for c in word if c not in string.punctuation]) for word in words if word not in stopwords and word not in string.punctuation]
import pandas as pd
df = pd.read_excel(df.csv)
df['text'] = df['text'].apply(clean)
I get this error: 'DataFrame' object has no attribute 'csv'
CodePudding user response:
Here is a method I found using pandas:
import pandas as pd
stopwords = ["LIST","OF","STOP","WORDS"]
df = pd.read_csv('FILE_NAME.csv')
# Assign through df.loc so the change is written to df itself, not to a temporary copy
for i in df.index:
    for word in stopwords:
        if word in df.loc[i, 'COLUMN_NAME']:
            # str.replace removes every occurrence of the substring, even inside longer words
            df.loc[i, 'COLUMN_NAME'] = df.loc[i, 'COLUMN_NAME'].replace(word, "")
If I were you, I would then save this DataFrame to a new CSV file using the following code:
df.to_csv('NEW_FILE_NAME.csv', index=False)
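One thing to watch with substring replacement is that a stopword such as "the" is also removed from inside longer words such as "theory". If that matters, here is a minimal sketch of whole-word removal. It assumes a small in-memory DataFrame with a 'text' column standing in for the real file and a shortened stopword list; remove_stopwords is just an illustrative helper name, not something from pandas.

```python
import string
import pandas as pd

# Shortened list for illustration; use the full list in practice.
stopwords = {"i", "the", "is", "a", "of"}

def remove_stopwords(text):
    """Return text with stopwords and surrounding punctuation removed (whole words only)."""
    kept = []
    for word in text.split():
        # Strip punctuation first so that "the," still matches "the".
        bare = word.strip(string.punctuation)
        if bare and bare.lower() not in stopwords:
            kept.append(bare)
    return " ".join(kept)

# Hypothetical sample data standing in for the real file.
df = pd.DataFrame({"text": ["The cat is on a mat.", "I like the theory of sets"]})
df["text"] = df["text"].apply(remove_stopwords)
print(df["text"].tolist())  # ['cat on mat', 'like theory sets']
```

Using a set for the stopwords makes each lookup fast, and comparing whole words keeps "theory" intact even though it contains "the".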
CodePudding user response:
df = pd.read_excel(df.csv)
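Without quotes, Python reads df.csv as the attribute csv of an object named df, which is why you get that error. Replace the text in parentheses (df.csv) with a string containing the file name and its extension: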
df = pd.read_excel('df.xlsx')
read_excel should be used on an .xlsx file; for a .csv file, use read_csv instead.
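If you are not sure which kind of file you will get, a rough sketch like the following picks the reader from the file extension. load_table is a hypothetical helper name, and the set of extensions handled here is my assumption, so adjust it to your files.

```python
from pathlib import Path
import pandas as pd

def load_table(path):
    """Read a spreadsheet or CSV file into a DataFrame based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix in {".xlsx", ".xls"}:
        # Reading .xlsx files typically requires the openpyxl package.
        return pd.read_excel(path)
    if suffix == ".csv":
        return pd.read_csv(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")
```

You would then call it with the file name as a quoted string, for example df = load_table('df.xlsx').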