I'm using python pandas with Postgresql,and I have a table named stock:
open high low close volume datetime
383.97 384.22 383.66 384.08 1298649 2022-12-16 14:25:00
383.59 384.065 383.45 383.98 991327 2022-12-16 14:20:00
383.59 384.065 383.45 383.98 991327 2022-12-16 14:20:00
383.59 384.065 383.45 383.98 991327 2022-12-16 14:20:00
383.64 384.2099 383.54 383.61 1439271 2022-12-16 14:15:00
How can I remove the rows that have duplicated datetime ,and only keep 1 row of it,only keep the latest row of it.
The output should be:
open high low close volume datetime
383.97 384.22 383.66 384.08 1298649 2022-12-16 14:25:00
383.59 384.065 383.45 383.98 991327 2022-12-16 14:20:00
383.64 384.2099 383.54 383.61 1439271 2022-12-16 14:15:00
Something like:
delete from stock where datetime duplicated > 1
CodePudding user response:
Use drop_duplicates()
from pandas
:
# Remove duplicates and keep only the last row
df = df.drop_duplicates(subset='datetime', keep='last')