Python: looping through rows and then calculating doesn't work

Time:11-28

What I want to do is loop through each row. If the category is "HR contacts" and its number is 500 or less, keep all of it; otherwise, keep only 500 and put the remainder in 'rest #'. My code is:

cntByUserNm['keep #'] = np.nan
cntByUserNm['rest #'] = np.nan
for index, row in cntByUserNm.iterrows():
    print(row['Owner Name'], row['source'])
    if row['source'] == 'HR':
        if row['total number'] <= 500:
            row['keep #'] = row['total number']
            row['rest #'] = 0
        else:
            row['keep #'] = 500
            row['rest #'] = row['total number'] - 500

But this doesn't seem to work: all of the 'keep #' and 'rest #' values are still NaN. How can I fix this?

I also tried a version with .iloc:

for i in range(0, len(cntByUserNm)):
    print(cntByUserNm.iloc[i]['Owner Name'], cntByUserNm.iloc[i]['blizday source'])
    if cntByUserNm.iloc[i]['blizday source'] == mainCat:
        if cntByUserNm.iloc[i][befCnt] <= destiNum:
            cntByUserNm.iloc[i]['keep #'] = cntByUserNm.iloc[i][befCnt]
            cntByUserNm.iloc[i]['rest #'] = 0
        else:
            cntByUserNm.iloc[i]['keep #'] = destiNum
            cntByUserNm.iloc[i]['rest #'] = cntByUserNm.iloc[i][befCnt] - destiNum

Answer:

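You are updating a copy of the row instead of the DataFrame itself: iterrows() yields a copy of each row, so assigning to row changes only that copy. Use .loc with the index label that iterrows() returns to modify the DataFrame directly. Because .loc selects by label, this works whatever your index values are, as long as they are unique.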

for index, row in cntByUserNm.iterrows():
    print(row['Owner Name'], row['source'])
    if row['source'] == 'HR':
        if row['total number'] <= 500:
            cntByUserNm.loc[index, 'keep #'] = row['total number']
            cntByUserNm.loc[index, 'rest #'] = 0
        else:
            cntByUserNm.loc[index, 'keep #'] = 500
            cntByUserNm.loc[index, 'rest #'] = row['total number'] - 500
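To see why the original loop leaves NaN while the .loc version works, here is a small self-contained check. The toy data is made up for illustration; only the column names come from the question. With mixed column types like these, the row that iterrows() yields is a separate object, so writing to it never reaches the DataFrame.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration); column names follow the question.
df = pd.DataFrame({
    'Owner Name': ['a', 'b', 'c'],
    'source': ['HR', 'HR', 'Sales'],
    'total number': [300, 800, 900],
})
df['keep #'] = np.nan
df['rest #'] = np.nan

# Assigning to `row` only changes the copy that iterrows() yields.
for index, row in df.iterrows():
    row['keep #'] = 1
untouched = df['keep #'].isna().all()  # the DataFrame itself did not change

# Writing through .loc with the row label updates the DataFrame.
for index, row in df.iterrows():
    if row['source'] == 'HR':
        if row['total number'] <= 500:
            df.loc[index, 'keep #'] = row['total number']
            df.loc[index, 'rest #'] = 0
        else:
            df.loc[index, 'keep #'] = 500
            df.loc[index, 'rest #'] = row['total number'] - 500

print(untouched)
print(df[['keep #', 'rest #']])
```

The non-HR row keeps NaN in both columns, matching the question's loop, which only handles 'HR' rows.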

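If you would rather use .iloc, get the integer positions of the 'keep #' and 'rest #' columns with columns.get_loc. Because .iloc is purely positional, it also needs the row's integer position rather than its index label; enumerate() provides that, so this version works even if the index is not 0 to len(dataframe) - 1:

keep_idx = cntByUserNm.columns.get_loc('keep #')
rest_idx = cntByUserNm.columns.get_loc('rest #')
# .iloc is positional, so use enumerate() to get each row's integer position
for pos, (index, row) in enumerate(cntByUserNm.iterrows()):
    print(row['Owner Name'], row['source'])
    if row['source'] == 'HR':
        if row['total number'] <= 500:
            cntByUserNm.iloc[pos, keep_idx] = row['total number']
            cntByUserNm.iloc[pos, rest_idx] = 0
        else:
            cntByUserNm.iloc[pos, keep_idx] = 500
            cntByUserNm.iloc[pos, rest_idx] = row['total number'] - 500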
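A quick check on a toy DataFrame whose index labels are 10, 20 and 30 (made up for illustration) shows why the row position, not the label, has to go into .iloc. Passing the label 10 to .iloc would fail here, because the frame has only three rows.

```python
import numpy as np
import pandas as pd

# Toy data with a non-default index, made up for illustration.
df = pd.DataFrame(
    {'source': ['HR', 'HR', 'Sales'], 'total number': [300, 800, 900]},
    index=[10, 20, 30],
)
df['keep #'] = np.nan
df['rest #'] = np.nan

keep_idx = df.columns.get_loc('keep #')
rest_idx = df.columns.get_loc('rest #')

# enumerate() supplies the integer position (0, 1, 2) that .iloc needs;
# the labels from iterrows() (10, 20, 30) would be out of bounds for .iloc.
for pos, (_, row) in enumerate(df.iterrows()):
    if row['source'] == 'HR':
        if row['total number'] <= 500:
            df.iloc[pos, keep_idx] = row['total number']
            df.iloc[pos, rest_idx] = 0
        else:
            df.iloc[pos, keep_idx] = 500
            df.iloc[pos, rest_idx] = row['total number'] - 500

print(df)
```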

Answer:

In pandas, vectorized operations are much faster than looping over rows, so I suggest:

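import numpy as np

cntByUserNm['keep #'] = np.nan
cntByUserNm['rest #'] = np.nan
is_hr = cntByUserNm['source'] == 'HR'
small = cntByUserNm['total number'] <= 500
# HR rows at or under 500: keep everything
cntByUserNm.loc[is_hr & small, 'keep #'] = cntByUserNm.loc[is_hr & small, 'total number']
cntByUserNm.loc[is_hr & small, 'rest #'] = 0
# HR rows over 500: keep 500, the remainder goes to 'rest #'
# (non-HR rows stay NaN, as in the loop)
cntByUserNm.loc[is_hr & ~small, 'keep #'] = 500
cntByUserNm.loc[is_hr & ~small, 'rest #'] = cntByUserNm.loc[is_hr & ~small, 'total number'] - 500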
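The same cap-and-remainder split can also be written with Series.clip and Series.where, without a separate mask for each case. This is a sketch on made-up toy data; it assumes, as in the question, that non-HR rows should stay NaN.

```python
import pandas as pd

# Toy data (made up for illustration); column names follow the question.
df = pd.DataFrame({
    'source': ['HR', 'HR', 'Sales'],
    'total number': [300, 800, 900],
})

is_hr = df['source'] == 'HR'
# clip(upper=500) caps each total at 500; where(is_hr) puts NaN on non-HR rows.
df['keep #'] = df['total number'].clip(upper=500).where(is_hr)
# The remainder is whatever was not kept; NaN propagates for non-HR rows.
df['rest #'] = df['total number'] - df['keep #']

print(df)
```

Computing 'rest #' from 'keep #' keeps the two columns consistent by construction: for every HR row, keep + rest equals the original total.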
