Home > Mobile >  Searching for duplicate values in rows of pandas dataframes Python
Searching for duplicate values in rows of pandas dataframes Python

Time:02-26

I want to add a piece of function down to the pandas dataframe below where it shows rows that have the open,high,low,close values to be the exact same. The very last row in the dataframe is True for this case. I want to also write a piece of code that shows the number of columns that have the same exact column value, if you look at the 3rd and fourth row the low column has the values 37350 repeated twice consecutively. So I want to make a function that states the maximum number of consecutive duplicates and the indexes of the rows of where it starts and ends.

import pandas as pd
import numpy as np
import time
import datetime

A =[[1645661520000, 37352.0, 37376.5, 37352.0, 37376.0, 15.56119087], 
[1645661580000, 37376.0, 37414.0, 37376.0, 37414.0, 49.38248589], 
[1645661640000, 37414.0, 37414.0, 37350.0, 37350.0, 45.70306699], 
[1645661700000, 37350.0, 37374.0, 37350.0, 37373.5, 14.4306948], 
[1645661760000, 37373.5, 37388.0, 37373.5, 37388.0, 3.59340947], 
[1645661820000, 37388.0, 37388.0, 37388.0, 37388.0, 21.45525727]]

column_names = ["Unix","Open", "High","Low", "Close", "Volume"]
df = pd.DataFrame(A, columns=column_names)
#Dates = Local_timezone(df["Unix"].to_numpy()/1000)
df.insert(1,"Date", pd.to_datetime(df["Unix"].to_numpy()/1000,unit='s'))

Expected output

Rows with all duplicate values: 6 # 1645661820000, 37388.0, 37388.0, 37388.0, 37388.0, 21.45525727

CodePudding user response:

We can use nunique

cond = df[["Open", "High","Low", "Close"]].apply(pd.Series.nunique,1).eq(1)
Out[344]: 
0    False
1    False
2    False
3    False
4    False
5     True
dtype: bool
#row = df['cond']
  • Related