What are the highlighted lines supposed to do?

import pandas as pd

df_scraped = pd.read_csv("labeled_tweets.csv")
df_public = pd.read_csv("public_data_labeled.csv")



> df_scraped.drop_duplicates(inplace = True)
> df_scraped.drop('id', axis = 'columns', inplace = True)
>     
> df_public.drop_duplicates(inplace = True)

LINK TO ORIGINAL CODE

I don't know what the highlighted lines above are supposed to do. Can somebody help me out?

CodePudding user response：

df.drop_duplicates(inplace=True) removes duplicate rows from the dataframe, keeping the first occurrence of each by default. The inplace=True parameter makes the change directly on the existing dataframe, so the method returns None instead of a modified copy.

.drop('id', axis = 'columns', inplace = True) removes the 'id' column.
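To see both calls in action, here is a minimal sketch on a small in-memory dataframe. The data and the column names other than 'id' are made up for illustration, since the contents of labeled_tweets.csv are not shown in the question.

```python
import pandas as pd

# Hypothetical stand-in for labeled_tweets.csv; the columns are assumptions.
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "text": ["good", "bad", "bad", "good"],
    "label": [1, 0, 0, 1],
})

# The rows at index 1 and 2 match in every column, so the later one is removed.
result = df.drop_duplicates(inplace=True)
print(result)    # None: with inplace=True nothing is returned
print(len(df))   # 3

# axis="columns" says that 'id' is a column label rather than a row label.
df.drop("id", axis="columns", inplace=True)
print(list(df.columns))   # ['text', 'label']
```

Without inplace=True you would write df = df.drop_duplicates() and reassign the result yourself; the effect on df is the same.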

Pandas documentation:
.drop_duplicates
.drop

CodePudding user response：

Those lines are data pre-processing lines (or data cleaning).

The first line removes duplicate rows from the df_scraped dataframe.
The second line removes the 'id' column from df_scraped.
The third line removes duplicate rows from the df_public dataframe.
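One thing that may be worth knowing about the order of those lines: the duplicates in df_scraped are removed while the 'id' column is still there, so two rows that differ only in 'id' are not treated as duplicates. Whether that matters depends on the actual data, which is not shown here. A small sketch with made-up rows (the column names other than 'id' are assumptions):

```python
import pandas as pd

# Hypothetical data: the first two rows differ only in 'id'.
tweets = pd.DataFrame({
    "id": [10, 11, 12],
    "text": ["same tweet", "same tweet", "other tweet"],
    "label": [1, 1, 0],
})

# Same order as in the question: deduplicate first, then drop 'id'.
as_in_question = tweets.drop_duplicates().drop("id", axis="columns")

# Alternative order: drop 'id' first, then deduplicate.
id_dropped_first = tweets.drop("id", axis="columns").drop_duplicates()

print(len(as_in_question))     # 3
print(len(id_dropped_first))   # 2
```

This version chains the calls without inplace=True, so the original tweets dataframe is left untouched.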