Most efficient method of retrieving the last row of any size dataframe

I have a continually growing dataframe and periodically I want to retrieve the last row.

# dbdf.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6652 entries, 2022-10-23 17:15:00-04:00 to 2022-10-28 08:06:00-04:00
Freq: T
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   open    6592 non-null   float64
 1   high    6592 non-null   float64
 2   low     6592 non-null   float64
 3   close   6592 non-null   float64
dtypes: float64(4)
memory usage: 259.8 KB

The dataframe doesn't occupy a very large memory footprint, but even so, I'd like to understand the most efficient method of retrieving the last row, so that I can then call .to_dict() on that row.

I can certainly do something naive like:

# Convert every row to a dict, then keep only the last one
bars = dbdf.to_dict(orient="records")
print(bars[-1])

In this particular case that would likely be fine given the small size of the dataframe, but if the dataframe were orders of magnitude larger in rows and memory footprint, is there a better way to achieve the same thing, one that could be considered best practice regardless of the dataframe's size?
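To illustrate why I'm asking, a rough sketch of what the naive path costs on a much larger frame (the frame below is synthetic; the size, index, and column names are made up purely for illustration):

import time

import numpy as np
import pandas as pd

# Hypothetical OHLC frame mirroring the shape of the real one, just much larger
n = 1_000_000
idx = pd.date_range("2022-01-01", periods=n, freq="min", tz="US/Eastern")
big = pd.DataFrame(
    np.random.rand(n, 4), index=idx, columns=["open", "high", "low", "close"]
)

# The naive path materialises a dict for every row and keeps only the last one,
# so its cost grows with the number of rows
start = time.perf_counter()
last = big.to_dict(orient="records")[-1]
print(f"full to_dict(): {time.perf_counter() - start:.3f}s")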

CodePudding user response:

First select the last row with DataFrame.iloc, then convert it to a dictionary with Series.to_dict:

d = df.iloc[-1].to_dict()
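For illustration, a minimal sketch (the column names mirror the frame in the question; the index and values are made up). Note that df.iloc[-1] returns a Series, so to_dict() maps each column name to its value in the last row; the timestamp index itself is not included:

import pandas as pd

df = pd.DataFrame(
    {"open": [1.0, 2.0], "high": [1.5, 2.5], "low": [0.5, 1.5], "close": [1.2, 2.2]},
    index=pd.to_datetime(["2022-10-28 08:05", "2022-10-28 08:06"]),
)

d = df.iloc[-1].to_dict()
print(d)  # {'open': 2.0, 'high': 2.5, 'low': 1.5, 'close': 2.2}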

CodePudding user response:

There are 2 ways:

  1. Use the tail function

The tail function returns the last rows of the DataFrame; passing 1 returns only the last row of df.

df.tail(1)
  2. Use the iloc function

iloc is an index-based selection technique, which means we pass an integer position to select a specific row or column; -1 selects the last row. Either selection can then be converted to a dict, as shown in the sketch after this list.

df.iloc[-1]
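The only difference when converting to a dict is that df.tail(1) gives back a one-row DataFrame while df.iloc[-1] gives back a Series (the toy frame below is made up for illustration):

import pandas as pd

df = pd.DataFrame({"open": [1.0, 2.0], "close": [1.2, 2.2]})

# tail(1) -> one-row DataFrame, so use orient="records" and take the first element;
# iloc[-1] -> Series, so to_dict() already gives a flat mapping
last_from_tail = df.tail(1).to_dict(orient="records")[0]
last_from_iloc = df.iloc[-1].to_dict()
assert last_from_tail == last_from_iloc  # both: {'open': 2.0, 'close': 2.2}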