I was trying to run the code below from PyCharm and from Jupyter Notebook. In Jupyter the error didn't occur, while in PyCharm it did. Can someone help me figure out the issue?
Below is a sample of the data in news_collection.csv:
created_at,text
5/13/2021 3:27:55 PM,"Srilanka team is well prepared for the worldCup 2021"
5/13/2021 3:27:55 PM,"They will be missing Lasith Malinga for sure"
Below is the code that raises the error in PyCharm:
import pandas as pd


def aggregated():
    tweets = pd.read_csv(r'news_collection.csv')
    df = pd.DataFrame(tweets, columns=['created_at', 'text'])
    # Parse the timestamps so the rows can be grouped by calendar day
    df['created_at'] = pd.to_datetime(df['created_at'])
    df['text'] = df['text'].apply(lambda x: str(x))
    pd.set_option('display.max_colwidth', 0)
    # Join the unique texts of each day into one newline-separated string
    df = df.groupby(pd.Grouper(key='created_at', freq='1D')).agg(lambda x: '\n'.join(set(x)))
    return df


if __name__ == '__main__':
    print(aggregated())
    aggregated().to_csv(r'preprocessed_tweets_aggregated.csv', index=True, header=True)
CodePudding user response:
Just closing this for other people with the same problem. It was a Pandas version issue. See comments.
CodePudding user response:
The error was caused by a pandas version mismatch between the two environments. I was running the same code in Jupyter with pandas 1.1.5, while PyCharm was using pandas 1.3.0, where the code did not work.
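A quick way to confirm this kind of mismatch is to print the interpreter path and the installed pandas version in both environments. This is only a minimal sketch: `installed_version` is a hypothetical helper name, and it assumes Python 3.8 or newer so that `importlib.metadata` is available.

```python
import sys
from importlib import metadata  # standard library, Python 3.8+


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Run this once in PyCharm and once in Jupyter, then compare the output.
print("interpreter:", sys.executable)
print("pandas:", installed_version("pandas"))
```

If the two runs print different interpreter paths, PyCharm and Jupyter are using separate environments, each with its own copy of pandas.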
To change a package version in PyCharm, follow the steps below (in my case I had to downgrade pandas to 1.1.5).
Step 01 - Go to your project in PyCharm and open the Python Interpreter settings (usually File -> Settings -> Project -> Python Interpreter)
Step 02 - You will be taken to the "Python Interpreter" tab -> select the package you want to change (pandas in my case) -> double-click the version -> tick the "Specify version" check box -> enter the version you want to upgrade or downgrade to -> select "Install Package"
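The same version pin can also be applied without the PyCharm dialog by calling pip through the interpreter the project uses. The sketch below assumes pip is installed in that interpreter. `pin_command` and `install_pinned` are hypothetical helper names, not part of pip or PyCharm.

```python
import subprocess
import sys


def pin_command(package, version):
    """Build the pip command that installs exactly `version` of `package`."""
    return [sys.executable, "-m", "pip", "install", f"{package}=={version}"]


def install_pinned(package, version):
    """Run pip in the current interpreter's environment; raises on failure."""
    subprocess.check_call(pin_command(package, version))


# Show the command without running it.
print(" ".join(pin_command("pandas", "1.1.5")))
```

Calling `install_pinned("pandas", "1.1.5")` from the PyCharm project's interpreter would perform the downgrade described above. Using `sys.executable` keeps the install in the same environment that runs the script.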