Home > Software engineering >  Handling duplicates in a Pandas dataframe
Handling duplicates in a Pandas dataframe

Time:12-31

I'm pulling cryptocurrency data from binance API and storing it in an database, its working, but how do I work around duplicates in the database? If I run the for loop a second time, it raises

ValueError: Table "'BTCUSDT' already exists."

How do stop it from creating a new table and just inserting a value if it's not already there using a timestamp as the key?

engine = create_engine('sqlite:///Cryptoprices3.db')

def getminutedata(symbol, lookback):
    frame = pd.DataFrame(client.get_historical_klines(symbol,'1m',lookback   ' days ago UTC' ))

    frame = frame.iloc[:,:6]
    frame.columns = ['Time','Open','High','Low','Close','Volume']
    frame[['Open','High','Low','Close','Volume']] = frame[['Open','High','Low','Close','Volume']].astype(float)
    frame.Time = pd.to_datetime(frame.Time, unit='ms')
    return frame


for coin in coins:
    getminutedata(coin,'2').to_sql(coin, engine, index=False)

CodePudding user response:

Have you tried the pandas "drop_duplicates" method?

frame.drop_duplicates() should take take of this.

CodePudding user response:

You can pass if_exists="append" to to_sql, which will add your new table as rows below the existing table.

Source: pandas docs for to_sql

  • Related