Home > front end >  How to speed up `for` loop on Pandas Dataframe
How to speed up `for` loop on Pandas Dataframe

Time:03-10

I have a Pandas Dataframe as shown below:

import pandas as pd
dist_temp = {'ID': {0: '110000', 1: '120000', 2: '130100', 3: '130200', 4: '130300'},
 '110000': {0: 0.0,
  1: 26359.35546663972,
  2: 7041.465835419961,
  3: 6748.88597016984,
  4: 4499.648549689056},
 '120000': {0: 25359.55546817345,
  1: 0.0,
  2: 2684.5583355637195,
  3: 16072.930316000879,
  4: 2573.60624992548},
 '130100': {0: 7203.651673447513,
  1: 2933.7008133762006,
  2: 0.0,
  3: 2278.794996954,
  4: 1022.7509126175601},
 '130200': {0: 7532.764214042125,
  1: 17598.85692679548,
  2: 2303.3769962313604,
  3: 0.0,
  4: 8356.33354580892},
 '130300': {0: 4123.111378129952,
  1: 2726.5461773558404,
  2: 1047.53770945992,
  3: 7946.337752637479,
  4: 0.0}}
df = pd.DataFrame(dist_temp)

enter image description here

Is a there a way to speed up the following routine:

df_1 = pd.DataFrame(columns = ['IDo', 'IdD', 'flux'])
for i in range(0,len(df.columns)-1):
    j =0
    for x in df.itertuples():
        df_1 = df_1.append({'IDo': df.columns[i 1], 'IdD': x[j 1],'flux': x[i 2]}, ignore_index = True)
    j = j 1
df_1['flux'] = df_1['flux'].fillna(0)
print (df_1)

enter image description here

Desired output is as shown above, where the title of the second column in this case (110000) comes in the output Dataframe as IDo in the rows (0-4) and the the first five rows (0-4) in the input Dataframe with value of 110000, 12000, 130100, 130200, 130300 comes in the output Dataframe as IdD (which is the second column in the output dataframe). And the third column in output Dataframe (flux) contributes to value corresponding to the intersecting points in the input Dataframe such as 0.000000 is the value on intersection when 110000 (title of column 1 input dataframe) and 110000 (number on second row), 26359.355467 is the value when 110000 (title of column 1 dataframe) and 120000 (number on second row) and so on.

For small Dataframe this method is not a problem. But for a 500 rows x 500 columns Dataframe it is taking enormous time.

Sorry for this naive question, I am new to Pandas.

Thankyou

CodePudding user response:

Pandas has lots of great options for reshaping a dataframe.

In this case, DataFrame.melt is handy:

df.melt(id_vars="ID", value_name="Flux", var_name="IDo")

And for the first 10 rows I get:


       ID     IDo          Flux
0  110000  110000      0.000000
1  120000  110000  26359.355467
2  130100  110000   7041.465835
3  130200  110000   6748.885970
4  130300  110000   4499.648550
5  110000  120000  25359.555468
6  120000  120000      0.000000
7  130100  120000   2684.558336
8  130200  120000  16072.930316
9  130300  120000   2573.606250
  • Related