import pandas as pd

twitter_url = 'https://raw.githubusercontent.com/Explore-AI/Public-Data/master/Data/twitter_nov_2019.csv'
twitter_df = pd.read_csv(twitter_url)
Write a function which splits the sentences in a dataframe's column into a list of the separate words. The created lists should be placed in a column named 'Split Tweets' in the original dataframe. This is also known as tokenization.
Function Specifications:
It should take a pandas dataframe as input. The dataframe should contain a column named 'Tweets'. The function should split the sentences in the 'Tweets' column into a list of separate words, and place the result into a new column named 'Split Tweets'. The resulting words must all be lowercase! The function should modify the input dataframe directly. The function should return the modified dataframe.
My code isn't giving the desired result, and I'm not sure what the reason is.
def word_splitter(df):
    # your code here
    df['Split_Tweets'] = df['Tweets'].str.split(' ',expand=True)
    return df
CodePudding user response:
As JNevill pointed out, you don't need expand=True: it makes str.split return a DataFrame with one column per word instead of a single column of lists. You also need to lowercase your column:
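def word_splitter(df):
    # Lowercase each tweet, then split it on spaces into a list of words.
    # The specification asks for the column name 'Split Tweets' (with a space).
    df['Split Tweets'] = df['Tweets'].str.lower().str.split(' ')
    return df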
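To check the behaviour without downloading the CSV, here is a minimal sketch on a small made-up dataframe. The name sample_df and its two tweets are assumptions for illustration only, not part of the exercise data, and the function uses the column name 'Split Tweets' given in the specification. The last line shows what expand=True produces, for comparison.

```python
import pandas as pd

# Hypothetical sample data standing in for the real Twitter CSV.
sample_df = pd.DataFrame({'Tweets': ['Hello World', 'Load Shedding AGAIN today']})

def word_splitter(df):
    # Lowercase each tweet, then split on single spaces into a list of words.
    df['Split Tweets'] = df['Tweets'].str.lower().str.split(' ')
    return df

result = word_splitter(sample_df)
print(result['Split Tweets'].tolist())

# For comparison: expand=True gives a DataFrame with one column per word,
# not a single column of lists.
expanded = sample_df['Tweets'].str.split(' ', expand=True)
print(type(expanded), expanded.shape)
```

Because the function assigns to df directly, result and sample_df are the same object, which matches the requirement that the input dataframe be modified in place.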