What are those highlighted lines supposed to do?

Time:03-10

import pandas as pd

df_scraped = pd.read_csv("labeled_tweets.csv")
df_public = pd.read_csv("public_data_labeled.csv")

# Highlighted lines:
df_scraped.drop_duplicates(inplace=True)
df_scraped.drop('id', axis='columns', inplace=True)

df_public.drop_duplicates(inplace=True)


I don't know what the lines above are supposed to do. Can somebody help me out?

Answer:

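df.drop_duplicates(inplace=True) removes duplicate rows from the dataframe. By default, a row counts as a duplicate only if every column matches an earlier row, and the first occurrence is kept. The inplace=True parameter makes the change to the dataframe itself instead of returning a modified copy; the call then returns None.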
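A minimal sketch of the difference between the in-place and copy-returning forms, using a small hypothetical dataframe instead of the CSV files from the question:

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 2 are identical in every column.
df = pd.DataFrame({"text": ["hi", "bye", "hi"], "label": [1, 0, 1]})

# Without inplace, drop_duplicates returns a new dataframe and df is unchanged.
deduped = df.drop_duplicates()
print(len(df), len(deduped))  # 3 2

# With inplace=True, df itself is modified and the call returns None.
result = df.drop_duplicates(inplace=True)
print(result)  # None
print(df.index.tolist())  # [0, 1]  (the first occurrence of "hi" is kept)
```

Because the in-place call returns None, writing df = df.drop_duplicates(inplace=True) would replace the dataframe with None; use either the in-place form or the assignment form, not both.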

df.drop('id', axis='columns', inplace=True) removes the 'id' column, again modifying the dataframe in place.
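For reference, a short sketch (with hypothetical toy data) showing that axis='columns', axis=1 and the columns= keyword all select the same column to drop:

```python
import pandas as pd

# Hypothetical toy data with an 'id' column.
df = pd.DataFrame({"id": [101, 102], "text": ["hi", "bye"], "label": [1, 0]})

# Equivalent ways to drop the 'id' column; each returns a new dataframe
# because inplace is not set.
a = df.drop('id', axis='columns')
b = df.drop('id', axis=1)
c = df.drop(columns='id')
print(list(a.columns))  # ['text', 'label']

# The form used in the question modifies df directly and returns None.
df.drop('id', axis='columns', inplace=True)
print(list(df.columns))  # ['text', 'label']
```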

Pandas documentation:
.drop_duplicates
.drop

Answer:

Those lines are data pre-processing (data cleaning) steps.

  1. The first line removes duplicate rows from the df_scraped dataframe.
  2. The second line removes the 'id' column from df_scraped.
  3. The third line removes duplicate rows from the df_public dataframe.
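One consequence of the order of the first two steps: since duplicates are compared across all columns, rows that differ only in their 'id' are not removed when deduplication runs before the 'id' column is dropped. The sketch below uses hypothetical in-memory data in place of labeled_tweets.csv; the question does not say which behavior the original code intended.

```python
import pandas as pd

# Hypothetical stand-in for labeled_tweets.csv: both rows share text and label
# but have different ids.
df_scraped = pd.DataFrame({"id": [1, 2], "text": ["hi", "hi"], "label": [1, 1]})

# Same order as the question: deduplicate first, then drop 'id'.
# The ids differ, so the rows are not exact duplicates and both survive.
same_order = df_scraped.drop_duplicates().drop('id', axis='columns')
print(len(same_order))  # 2

# Reversed order: once 'id' is gone the rows are identical and one is removed.
reversed_order = df_scraped.drop('id', axis='columns').drop_duplicates()
print(len(reversed_order))  # 1

# Alternatively, compare only selected columns while keeping 'id'.
by_subset = df_scraped.drop_duplicates(subset=['text', 'label'])
print(len(by_subset))  # 1
```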