Iterating pandas dataframe row pairwise

Time:03-22

Is there a faster way to iterate over the rows of a pandas DataFrame pairwise to do some calculations? My code below is not fast enough, and I wonder if there is a pandas workaround for this.

I started with iterrows, then found that itertuples is faster, but it is still not fast enough.


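def pairwisecalculate(df):
    sim = []
    # itertuples() yields namedtuples whose element 0 is the row Index,
    # followed by one element per column.
    for row_1 in df.itertuples():
        for row_2 in df.itertuples():
            matches = 0.0  # avoid shadowing the built-in sum()
            # i runs over 0..len(df.columns)-1, so this compares the Index
            # plus every column except the last one.
            for i, c in enumerate(df.columns):
                if row_1[i] == row_2[i]:
                    matches += 1
            sim.append(matches / (len(df.columns) - 1))
    return sim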
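The triple loop above runs the comparison in pure Python for every pair of rows. A common way to speed this up is to let NumPy broadcasting compare all rows against all rows at once. The sketch below is one way to do that; `pairwise_match_fraction` is a hypothetical helper name, and it assumes you want the fraction of matching columns over all columns (it divides by the number of columns, not `len(df.columns) - 1`, so adjust the slice or divisor if the first column is an ID that should be excluded).

```python
import numpy as np
import pandas as pd


def pairwise_match_fraction(df: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: return an (n, n) matrix whose [a, b] entry is
    the fraction of columns in which rows a and b hold equal values."""
    values = df.to_numpy()
    # Broadcasting (n, 1, k) against (1, n, k) compares every row with
    # every other row, column by column, giving an (n, n, k) boolean array.
    equal = values[:, None, :] == values[None, :, :]
    # Averaging over the column axis turns match counts into fractions.
    return equal.mean(axis=2)
```

Calling `pairwise_match_fraction(df).ravel().tolist()` gives a flat list in the same row-pair order as `sim`. The intermediate array holds n × n × k booleans, so for very large frames you may need to process the rows in chunks. As with `==` in the loop, NaN values never compare equal.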

User response:

You can try:

df.rolling(2).sum() / (len(df.columns) - 1)
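Note that `rolling(2)` only places each row in a window with the row directly above it, and `.sum()` adds the values in that window rather than testing them for equality, so it covers consecutive rows only. If comparing each row with its predecessor is enough, a minimal sketch of that narrower calculation compares the frame with a one-row shift of itself; `consecutive_match_fraction` is a hypothetical helper name.

```python
import pandas as pd


def consecutive_match_fraction(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: for each row, the fraction of columns whose
    value equals the value in the previous row. The first row gets NaN."""
    # df.shift() moves every column down one row, so each row lines up
    # with the row before it; eq() then compares element-wise.
    matches = df.eq(df.shift())
    fraction = matches.mean(axis=1)
    # The first row was compared against NaN padding, so mark it undefined.
    fraction.iloc[0] = float("nan")
    return fraction
```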

User response:

If you are aiming for speed, you can also try Polars (https://www.pola.rs/) or other Apache Arrow implementations (https://arrow.apache.org/docs/python/pandas.html).
