Pandas DataFrame: does sample() reset indexes?

Time:05-24

Consider a pandas DataFrame final_df with 142457 rows, indexed from 0 to 142456:

0
1 
2
3
4
...
142452
142453
142454
142455
142456

I create a new DataFrame, data_test_for_all_models, by sampling 10% of its rows:

data_test_for_all_models = final_df.copy().sample(frac=0.1, random_state=786)
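To check what sample() does to the index, here is a minimal sketch on a hypothetical 20-row stand-in for final_df (the column name value and its contents are assumptions, chosen so that label k holds 100 + k). With frac=0.1 it draws 2 rows, and each keeps the label it had in the source frame.

```python
import pandas as pd

# Hypothetical 20-row stand-in for final_df; label k holds value 100 + k.
final_df = pd.DataFrame({"value": range(100, 120)})

# Same call as in the question: 10% of 20 rows -> 2 rows.
data_test_for_all_models = final_df.copy().sample(frac=0.1, random_state=786)

# The sampled rows carry their original labels, not a fresh 0..n-1 index.
print(data_test_for_all_models.index.tolist())

# Every sampled row still pairs label k with value 100 + k.
labels = data_test_for_all_models.index.to_numpy()
values = data_test_for_all_models["value"].to_numpy()
print((values == labels + 100).all())  # True
```

If sample() renumbered its result, the labels would be 0 and 1 and the label/value pairing would break; it does not.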

A few of the sampled indexes:

2235
118727
23291

Now I drop from final_df the rows whose indexes appear in data_test_for_all_models:

final_df = final_df.drop(data_test_for_all_models.index)
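A sketch of the drop step on the same kind of hypothetical 20-row frame (label k holds value 100 + k, an assumption for illustration) shows that drop() removes exactly the sampled labels and leaves every other label untouched, so the index afterwards has gaps instead of being renumbered.

```python
import pandas as pd

# Hypothetical 20-row stand-in for final_df; label k holds value 100 + k.
final_df = pd.DataFrame({"value": range(100, 120)})
data_test_for_all_models = final_df.sample(frac=0.1, random_state=786)

# Drop the sampled rows by label.
final_df = final_df.drop(data_test_for_all_models.index)
print(len(final_df))  # 18

# No sampled label is left in final_df.
overlap = final_df.index.intersection(data_test_for_all_models.index)
print(overlap.empty)  # True

# Remaining rows keep their labels: label k still holds value 100 + k.
print((final_df["value"].to_numpy() == final_df.index.to_numpy() + 100).all())  # True
```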

If I then check one of those indexes in final_df:

final_df.iloc[2235]

it still returns a row, even though that index should have been dropped.

I suspect the index is being reset somewhere, but which function does it: drop() or sample()?

Thanks.

Answer:

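You are using .iloc, which is integer-position based: final_df.iloc[2235] returns the row at position 2235, not the row whose index label is 2235. Neither sample() nor drop() resets the index; after the drop, the labels simply no longer line up with the positions.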

To look a row up by its label, use .loc:

final_df.loc[2235]

Since the row labelled 2235 was dropped, this raises a KeyError.
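Both lookups can be reproduced on a small scale. This is a sketch assuming a hypothetical 10-row frame in which labels 2 and 5 are dropped by hand in place of the sampled rows: .iloc returns whatever row now sits at a given position, while .loc raises KeyError for a dropped label. Testing membership in final_df.index is one way to check a label before looking it up.

```python
import pandas as pd

# Hypothetical 10-row stand-in for final_df; label k holds value 100 + k.
final_df = pd.DataFrame({"value": range(100, 110)})

# Drop labels 2 and 5 by hand, standing in for the sampled test rows.
final_df = final_df.drop([2, 5])
print(final_df.index.tolist())  # [0, 1, 3, 4, 6, 7, 8, 9]

# .iloc is positional: position 2 now holds the row labelled 3.
row_at_position_2 = final_df.iloc[2]
print(row_at_position_2.name, row_at_position_2["value"])  # 3 103

# .loc is label-based: label 2 no longer exists, so the lookup fails.
try:
    final_df.loc[2]
    label_2_found = True
except KeyError:
    label_2_found = False
print(label_2_found)  # False

# Checking membership first avoids the exception.
print(2 in final_df.index)  # False
```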
