How to replace a classic for loop with df.iterrows()?

I have a huge data frame.

I am using a for loop in the below sample code:

for i in range(1, len(df_A2C), 1):
    A2C_TT = df_A2C.loc[df_A2C['TO_ID'] == i].sort_values('DURATION_H').head(1)
    if A2C_TT.size > 0:
        print(A2C_TT)

This works fine, but I want to use df.iterrows() because I think it would let me automatically avoid the empty-frame issue.

I want to iterate over the TO_ID values and find the row with the minimum DURATION_H for each one.

How should I replace my classical i loop counter with df.iterrows()?

Sample Data:

FROM_ID  TO_ID  DURATION_H   DIST_KM
1        7      0.528555556  38.4398
2        26     0.512511111  37.38515
3        71     0.432452778  32.57571
4        83     0.599486111  39.26188
5        98     0.590516667  35.53107
6        108    1.077794444  76.79874
7        139    0.838972222  58.86963
8        146    1.185088889  76.39174
9        158    0.625872222  45.6373
10       208    0.500122222  31.85239
11       209    0.530916667  29.50249
12       221    0.945444444  62.69099
13       224    1.080883333  66.06291
14       240    0.734269444  48.1778
15       272    0.822875     57.5008
16       349    1.171163889  76.43536
17       350    1.080097222  71.16137
18       412    0.503583333  38.19685
19       416    1.144961111  74.35502

CodePudding user response：

As far as I understand your question, you want to group your data by TO_ID and, for each group, select the row where DURATION_H is smallest. Is that right?

df.loc[df.groupby('TO_ID').DURATION_H.idxmin()]
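
To see what that one-liner does, here is a minimal sketch on a toy frame. The column names come from the question, but the rows are made up for illustration (the posted sample has no repeated TO_ID, so it would not show the grouping).

```python
import pandas as pd

# Hypothetical toy data: column names follow the question, values are invented
df = pd.DataFrame({
    "FROM_ID": [1, 2, 3, 4, 5],
    "TO_ID": [7, 7, 26, 26, 71],
    "DURATION_H": [0.53, 0.43, 0.51, 0.60, 0.43],
    "DIST_KM": [38.4, 25.5, 37.4, 39.3, 32.6],
})

# For each TO_ID group, idxmin() gives the index label of the smallest DURATION_H
best_idx = df.groupby("TO_ID")["DURATION_H"].idxmin()

# .loc pulls the full rows for those labels: exactly one row per existing TO_ID
fastest = df.loc[best_idx]
print(fastest)
```

Because the groups are built from the TO_ID values that actually occur, there is never an empty frame to check for.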

CodePudding user response：

Here is one way to do it:

# Loop only over the TO_ID values that actually occur in the frame,
# instead of over every row (iterrows) or over range(len(df))
import numpy as np

for idx in np.unique(df['TO_ID']):
    A2C_TT = df.loc[df['TO_ID'] == idx].sort_values('DURATION_H').head(1)
    print(A2C_TT)

        FROM_ID  TO_ID  DURATION_H   DIST_KM
498660       39      7    0.434833  25.53808
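
If you want to keep an explicit loop, a possible variant is to iterate over the groupby object itself, which hands you each TO_ID together with its rows. This is only a sketch: the toy data and the names `rows` and `result` are assumptions, not part of the question.

```python
import pandas as pd

# Hypothetical toy data: column names follow the question, values are invented
df = pd.DataFrame({
    "FROM_ID": [1, 2, 3, 4, 5],
    "TO_ID": [7, 7, 26, 26, 71],
    "DURATION_H": [0.53, 0.43, 0.51, 0.60, 0.43],
})

rows = []
# groupby yields (key, sub-frame) pairs; every sub-frame has at least one row
for to_id, group in df.groupby("TO_ID"):
    # nsmallest(1, ...) keeps the single row with the lowest DURATION_H
    rows.append(group.nsmallest(1, "DURATION_H"))

# Stitch the per-group rows back into one frame
result = pd.concat(rows)
print(result)
```

This avoids filtering the whole frame once per TO_ID, which matters on a huge frame.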

Here is another way to do it:

df.loc[df['DURATION_H'].eq(df.groupby('TO_ID')['DURATION_H'].transform('min'))]

        FROM_ID  TO_ID  DURATION_H   DIST_KM
498660       39      7    0.434833  25.53808
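
One thing to be aware of: the transform-based filter keeps every row that ties for the minimum, while a sort followed by drop_duplicates keeps only one row per TO_ID. A small sketch with invented data (the tie on TO_ID 7 is deliberate, and the names `with_ties` and `first_only` are just for illustration):

```python
import pandas as pd

# Hypothetical toy data with a deliberate tie for TO_ID 7
df = pd.DataFrame({
    "FROM_ID": [1, 2, 3, 4],
    "TO_ID": [7, 7, 7, 26],
    "DURATION_H": [0.43, 0.43, 0.53, 0.51],
})

# Broadcast each group's minimum back onto its rows, then keep the matches
group_min = df.groupby("TO_ID")["DURATION_H"].transform("min")
with_ties = df.loc[df["DURATION_H"].eq(group_min)]

# Stable sort by duration, then keep the first row seen for each TO_ID
first_only = df.sort_values("DURATION_H", kind="stable").drop_duplicates("TO_ID")

print(with_ties)
print(first_only)
```

Which behaviour you want depends on whether tied rows are meaningful in your data.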