Is there a faster way to iterate over a Pandas DataFrame row pairwise (each row against every row) to do some calculations? My code below is not fast enough, and I wonder if there is a Pandas workaround for this.
I started with `iterrows`, then found `itertuples` to be faster, but it is still not fast enough.
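```python
def pairwisecalculate(df):
    sim = []
    # Compare every row with every row (including itself): O(n^2) Python-level loops.
    for row_1 in df.itertuples():
        for row_2 in df.itertuples():
            matches = 0.
            # Note: itertuples() puts the Index at position 0, so this loop compares
            # the Index plus the first len(df.columns) - 1 columns (not the last one).
            for i, c in enumerate(df.columns):
                if row_1[i] == row_2[i]:
                    matches += 1
            sim.append(matches / (len(df.columns) - 1))
    return sim
```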
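A vectorized alternative with NumPy broadcasting can replace all three Python loops with a single array comparison. This is a sketch under stated assumptions: the goal is, for every ordered pair of rows, the fraction of columns whose values are equal, divided by the number of columns compared (not `len(df.columns) - 1`). The function name `pairwise_match_fraction` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def pairwise_match_fraction(df: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: return an (n, n) array whose entry [i, j] is the
    fraction of columns in which row i and row j hold equal values."""
    values = df.to_numpy()  # shape (n_rows, n_cols)
    # Broadcasting (n, 1, c) against (1, n, c) compares every row with every
    # row in one vectorized step, giving an (n, n, c) boolean array.
    equal = values[:, None, :] == values[None, :, :]
    # Mean over the column axis = number of matching columns / n_cols.
    return equal.mean(axis=2)


# Hypothetical example data.
df = pd.DataFrame({"a": [1, 1, 2], "b": [5, 6, 6], "c": [0, 0, 0]})
sim = pairwise_match_fraction(df)
print(sim)
# Flatten row_1-major, the same order in which the nested loops append values.
print(sim.ravel().tolist())
```

The intermediate boolean array has n × n × c elements, so memory grows quadratically with the number of rows. For large frames, one option is to compare one block of rows at a time (for example `values[start:stop, None, :] == values[None, :, :]`) and concatenate the results. If a column should not count toward the score, drop it before calling the function.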
Answer:
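If you only need to combine each row with the row directly above it (a rolling window of two consecutive rows, not every pair of rows), you can try:

```python
df.rolling(2).sum() / (len(df.columns) - 1)
```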
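For comparison, here is a minimal illustration with made-up numeric data of what the `rolling` call computes. The window covers two consecutive rows, so each output row depends only on that row and the one above it, and the result has one row per input row rather than one value per pair of rows.

```python
import pandas as pd

# Hypothetical numeric data with three rows.
df = pd.DataFrame({"a": [1, 2, 4], "b": [10, 20, 40]})

# Each output row is the sum of that row and the row directly above it.
# The first row has no predecessor, so it becomes NaN
# (min_periods defaults to the window size for an integer window).
rolled = df.rolling(2).sum()
print(rolled)
```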
Answer:
If you are aiming for speed, you can also try Polars (https://www.pola.rs/) or another Apache Arrow implementation (https://arrow.apache.org/docs/python/pandas.html).