df_scraped = pd.read_csv("labeled_tweets.csv")
df_public = pd.read_csv("public_data_labeled.csv")

df_scraped.drop_duplicates(inplace = True)
df_scraped.drop('id', axis = 'columns', inplace = True)

df_public.drop_duplicates(inplace = True)
I don't know what the drop_duplicates and drop lines above are supposed to do. Can somebody help me out?
CodePudding user response:
df.drop_duplicates(inplace=True)
removes duplicate rows from the dataframe. The inplace = True
parameter applies the change to the dataframe itself instead of returning a modified copy.
.drop('id', axis = 'columns', inplace = True)
removes the 'id' column.
Pandas documentation:
.drop_duplicates
.drop
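
To make the effect of inplace = True concrete, here is a minimal sketch. It uses a small made-up dataframe instead of the CSV files from the question, so the column names "text" and "label" and all the values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data standing in for labeled_tweets.csv
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "text": ["a", "b", "b", "c"],
    "label": [0, 1, 1, 0],
})

# Without inplace, a new dataframe is returned and df is left unchanged
deduped = df.drop_duplicates()
rows_before = len(df)
rows_copy = len(deduped)

# With inplace=True, df itself is modified and the call returns None
result = df.drop_duplicates(inplace=True)
rows_after = len(df)

# Remove the 'id' column from df, again in place
df.drop("id", axis="columns", inplace=True)
remaining_columns = list(df.columns)

print(rows_before, rows_copy, rows_after, remaining_columns)
```

Because the inplace call returns None, writing df = df.drop_duplicates(inplace=True) would replace the dataframe with None. Use either the inplace form or the assignment form, not both.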
CodePudding user response:
Those lines are data pre-processing lines (or data cleaning).
- The first line removes duplicate rows from df_scraped dataframe.
- The second line removes the 'id' column.
- The third line removes duplicate rows from df_public dataframe.
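
One detail that may be worth checking in your own data: in the question, duplicates are removed while the 'id' column is still present, so two rows that differ only in 'id' are not treated as duplicates. The sketch below shows this on invented data (the "tweet" and "label" columns and their values are assumptions, not taken from the real files) and uses the non-inplace form so each step is visible.

```python
import pandas as pd

# Hypothetical stand-in for df_scraped
scraped = pd.DataFrame({
    "id": [10, 11, 11, 12],
    "tweet": ["hello", "spam", "spam", "hello"],
    "label": [0, 1, 1, 0],
})

# Same order as in the question: deduplicate first, then drop 'id'
step1 = scraped.drop_duplicates()   # only the repeated id=11 row is removed
step2 = step1.drop(columns="id")    # 'id' is gone, the row count is unchanged
cleaned_rows = len(step2)

# Opposite order: once 'id' is dropped, the two "hello" rows become duplicates
alt = scraped.drop(columns="id").drop_duplicates()
alt_rows = len(alt)

print(cleaned_rows, alt_rows)
```

Which order is right depends on whether rows with the same content but different ids should count as duplicates in your dataset.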