Get rid of iterrows in pandas loop

Time:03-15

I'm trying to avoid using iterrows() in pandas and achieve a more performant solution. This is the code I have, where I loop through a DataFrame and for each record I need to add three more:

import pandas as pd

fruit_data = pd.DataFrame({
    'fruit':  ['apple','orange','pear','orange'],
    'color':  ['red','orange','green','green'],
    'weight': [5,6,3,4]
})

array = []

for index, row in fruit_data.iterrows():

    row2 = { 'fruit_2': row['fruit'], 'sequence': 0}
    array.append(row2)
    
    for i in range(2):
        row2 = { 'fruit_2': row['fruit'], 'sequence': i + 1}
        array.append(row2)

print(array)

My real DataFrame has millions of records. Is there a way to optimize this code and NOT use iterrows() or for loops?

CodePudding user response:

You could use repeat to repeat each fruit 3 times; then groupby cumcount to assign sequence numbers; finally to_dict for the final output:

tmp = fruit_data['fruit'].repeat(3).reset_index(name='fruit_2')
tmp['sequence'] = tmp.groupby('index').cumcount()
out = tmp.drop(columns='index').to_dict('records')

Output:

[{'fruit_2': 'apple', 'sequence': 0},
 {'fruit_2': 'apple', 'sequence': 1},
 {'fruit_2': 'apple', 'sequence': 2},
 {'fruit_2': 'orange', 'sequence': 0},
 {'fruit_2': 'orange', 'sequence': 1},
 {'fruit_2': 'orange', 'sequence': 2},
 {'fruit_2': 'pear', 'sequence': 0},
 {'fruit_2': 'pear', 'sequence': 1},
 {'fruit_2': 'pear', 'sequence': 2},
 {'fruit_2': 'orange', 'sequence': 0},
 {'fruit_2': 'orange', 'sequence': 1},
 {'fruit_2': 'orange', 'sequence': 2}]
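
To check that this really matches the loop from the question, here is a minimal self-contained sketch that builds both results and compares them. It assumes the DataFrame has a unique, unnamed index (the default RangeIndex here); with duplicate index labels, groupby('index') would merge rows that should be numbered separately. The name `expected` is only for illustration.

```python
import pandas as pd

fruit_data = pd.DataFrame({
    'fruit':  ['apple', 'orange', 'pear', 'orange'],
    'color':  ['red', 'orange', 'green', 'green'],
    'weight': [5, 6, 3, 4],
})

# Reference result: the iterrows() loop from the question, written compactly.
expected = []
for _, row in fruit_data.iterrows():
    for seq in range(3):
        expected.append({'fruit_2': row['fruit'], 'sequence': seq})

# Vectorized version: repeat each fruit 3 times (the original index label is
# repeated too), then number the copies within each original row.
tmp = fruit_data['fruit'].repeat(3).reset_index(name='fruit_2')
tmp['sequence'] = tmp.groupby('index').cumcount()
out = tmp.drop(columns='index').to_dict('records')

# Both approaches should produce the same list of dicts.
print(out == expected)
```

If the index is not unique, calling fruit_data.reset_index(drop=True) first is one way to get back to the assumption above.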

CodePudding user response:

Try this out:

import numpy as np

array = (
    fruit_data['fruit']
    .repeat(3)
    .to_frame(name='fruit_2')
    .set_index(np.tile(np.arange(3), len(fruit_data['fruit'])))
    .reset_index()
    .rename({'index':'sequence'},axis=1)
    [['fruit_2', 'sequence']]
    .to_dict('records')
)

Output:

>>> array
[{'fruit_2': 'apple', 'sequence': 0},
 {'fruit_2': 'apple', 'sequence': 1},
 {'fruit_2': 'apple', 'sequence': 2},
 {'fruit_2': 'orange', 'sequence': 0},
 {'fruit_2': 'orange', 'sequence': 1},
 {'fruit_2': 'orange', 'sequence': 2},
 {'fruit_2': 'pear', 'sequence': 0},
 {'fruit_2': 'pear', 'sequence': 1},
 {'fruit_2': 'pear', 'sequence': 2},
 {'fruit_2': 'orange', 'sequence': 0},
 {'fruit_2': 'orange', 'sequence': 1},
 {'fruit_2': 'orange', 'sequence': 2}]
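
The same idea can also be sketched by building both columns directly with NumPy, which skips the set_index / reset_index / rename round trip. This is an alternative under the same assumptions, not the answer's exact method; `n_copies` and `expanded` are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd

fruit_data = pd.DataFrame({
    'fruit':  ['apple', 'orange', 'pear', 'orange'],
    'color':  ['red', 'orange', 'green', 'green'],
    'weight': [5, 6, 3, 4],
})

n_copies = 3  # hypothetical parameter: number of output rows per input row

expanded = pd.DataFrame({
    # np.repeat repeats each element in place: apple, apple, apple, orange, ...
    'fruit_2': np.repeat(fruit_data['fruit'].to_numpy(), n_copies),
    # np.tile repeats the whole sequence: 0, 1, 2, 0, 1, 2, ...
    'sequence': np.tile(np.arange(n_copies), len(fruit_data)),
})

array = expanded.to_dict('records')
```

With millions of rows, to_dict('records') creates one Python dict per row, so it may well dominate the runtime. If the downstream code can work with a DataFrame, keeping `expanded` as it is and skipping the conversion is probably the cheaper option.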