I'm working on a large (7 GB) dataset where I need to use BERT for text classification. To cut down processing time, I'm using a random dataset I found on Kaggle as a stand-in, so I can test a function I wrote (for later use on the original dataset) that cleans the text by removing punctuation, lemmatizing words, and so on. When I clean every text in the "message to examine" column with pandas' .apply, it works fine, but when I assign the result to a new dataframe or back to the same dataframe, all the rows become empty, with no values. Does anyone know how I can fix this?
I tried using a lambda function inside apply:
newtext['message to examine'] = newtext['message to examine'].apply(lambda x : clean_text(x))
I also tried copying the dataframe and storing the result in a new one:
newdataframe = pd.DataFrame(df['message to examine'].apply(clean_text)).copy()
CodePudding user response:
Usually this happens when the function passed to apply, which in this case is your clean_text, is missing a return statement. A Python function with no return statement returns None, so apply fills every row with None.
As a side-note, you can simply do .apply(clean_text) without the lambda function.
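Since the body of clean_text isn't shown in the question, here is a minimal sketch of what the problem and the fix could look like. The function bodies, the sample data, and the "clean message" column name are assumptions for illustration only; the cleaning here just lowercases and strips punctuation, and your real function would also lemmatize.

```python
import string

import pandas as pd

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text_broken(text):
    # Hypothetical broken version: the cleaning is done, but the result
    # is never returned, so the function implicitly returns None.
    cleaned = text.lower().translate(PUNCT_TABLE)


def clean_text(text):
    # Hypothetical fixed version: same cleaning, with an explicit return.
    cleaned = text.lower().translate(PUNCT_TABLE)
    return cleaned


# Made-up sample data standing in for the Kaggle dataset.
df = pd.DataFrame({"message to examine": ["Hello, World!", "Is this SPAM?"]})

# Every row comes back as None because nothing is returned.
broken = df["message to examine"].apply(clean_text_broken)

# Passing the function directly works; no lambda is needed.
fixed = df["message to examine"].apply(clean_text)
df["clean message"] = fixed

print(broken.isna().all())
print(df)
```

If the first print shows True for your own function, the missing return is very likely the cause.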