What I want to do is loop through each row. If the category is "HR contacts" and its number is no more than 500, keep all of it. Otherwise keep only 500 of it. My code is:
cntByUserNm['keep #'] = np.nan
cntByUserNm['rest #'] = np.nan
for index, row in cntByUserNm.iterrows():
    print(row['Owner Name'], row['source'])
    if row['source'] == 'HR':
        if row['total number'] <= 500:
            row['keep #'] = row['total number']
            row['rest #'] = 0
        else:
            row['keep #'] = 500
            row['rest #'] = row['total number'] - 500
But this doesn't seem to work: all of the `keep #` and `rest #` values still remain `nan`. How can I fix this?
I also tried indexing with `.iloc`:
for i in range(0, len(cntByUserNm)):
    print(cntByUserNm.iloc[i]['Owner Name'], cntByUserNm.iloc[i]['blizday source'])
    if cntByUserNm.iloc[i]['blizday source'] == mainCat:
        if cntByUserNm.iloc[i][befCnt] <= destiNum:
            cntByUserNm.iloc[i]['keep #'] = cntByUserNm.iloc[i][befCnt]
            cntByUserNm.iloc[i]['rest #'] = 0
        else:
            cntByUserNm.iloc[i]['keep #'] = destiNum
            cntByUserNm.iloc[i]['rest #'] = cntByUserNm.iloc[i][befCnt] - destiNum
CodePudding user response:
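You are updating a copy of each row instead of the dataframe itself: the `row` that `iterrows()` yields here is a separate Series, so assigning to it has no effect on `cntByUserNm`. Use `.loc` with the index label that `iterrows()` returns to modify the dataframe directly. This works for any index, as long as its labels are unique: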
for index, row in cntByUserNm.iterrows():
    print(row['Owner Name'], row['source'])
    if row['source'] == 'HR':
        if row['total number'] <= 500:
            cntByUserNm.loc[index, 'keep #'] = row['total number']
            cntByUserNm.loc[index, 'rest #'] = 0
        else:
            cntByUserNm.loc[index, 'keep #'] = 500
            cntByUserNm.loc[index, 'rest #'] = row['total number'] - 500
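The difference between assigning to `row` and assigning through `.loc` can be checked on a small frame. This is only a sketch: the sample data and the name `df` are made up for illustration, and the index is deliberately not 0..n-1 to show that `.loc` works with the labels `iterrows()` yields.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame(
    {
        "Owner Name": ["Ann", "Bob", "Cy"],
        "source": ["HR", "HR", "Sales"],
        "total number": [300, 800, 900],
    },
    index=[10, 20, 30],  # deliberately not 0..n-1
)
df["keep #"] = np.nan
df["rest #"] = np.nan

# Assigning to the row yielded by iterrows() only changes a temporary Series.
for index, row in df.iterrows():
    row["keep #"] = 1
assert df["keep #"].isna().all()  # the DataFrame is untouched

# Assigning through .loc with the yielded label changes the DataFrame itself.
for index, row in df.iterrows():
    if row["source"] == "HR":
        keep = min(row["total number"], 500)
        df.loc[index, "keep #"] = keep
        df.loc[index, "rest #"] = row["total number"] - keep

print(df)
```

Rows whose `source` is not `'HR'` keep `NaN` in both new columns, matching the loop in the question.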
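Alternatively, you can write by position: get the integer locations of the `keep #` and `rest #` columns and use `.iloc`. Because `.iloc` expects a row position rather than an index label, take the position from `enumerate` so that this also works when the index is not continuous:
keep_idx = cntByUserNm.columns.get_loc('keep #')
rest_idx = cntByUserNm.columns.get_loc('rest #')
# pos is the row position; index is the row label yielded by iterrows()
for pos, (index, row) in enumerate(cntByUserNm.iterrows()):
    print(row['Owner Name'], row['source'])
    if row['source'] == 'HR':
        if row['total number'] <= 500:
            cntByUserNm.iloc[pos, keep_idx] = row['total number']
            cntByUserNm.iloc[pos, rest_idx] = 0
        else:
            cntByUserNm.iloc[pos, keep_idx] = 500
            cntByUserNm.iloc[pos, rest_idx] = row['total number'] - 500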
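A note on `.iloc`: it takes integer positions, while `iterrows()` yields index labels, and the two coincide only for a default 0..n-1 index. The sketch below uses a made-up two-row frame (the name `df` and its values are assumptions for illustration) to show what happens when a label is passed as a position, and how `enumerate` supplies the position instead.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels are not 0..n-1.
df = pd.DataFrame({"total number": [300, 800]}, index=[10, 20])
df["keep #"] = np.nan
keep_idx = df.columns.get_loc("keep #")

# Using a label as a position fails: there is no row at position 10.
try:
    df.iloc[10, keep_idx] = 0
    label_as_position_ok = True
except IndexError:
    label_as_position_ok = False

# enumerate() supplies the row position that .iloc expects.
for pos, (label, row) in enumerate(df.iterrows()):
    df.iloc[pos, keep_idx] = min(row["total number"], 500)

print(label_as_position_ok)
print(df["keep #"].tolist())
```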
CodePudding user response:
In pandas, vectorized operations on whole columns are faster than looping over rows, so I suggest:
cntByUserNm['keep #'] = np.nan
cntByUserNm['rest #'] = np.nan
# Restrict both branches to HR rows so that other sources stay NaN, as in the loop
is_hr = cntByUserNm['source'] == 'HR'
small = is_hr & (cntByUserNm['total number'] <= 500)
large = is_hr & (cntByUserNm['total number'] > 500)
cntByUserNm.loc[small, 'keep #'] = cntByUserNm.loc[small, 'total number']
cntByUserNm.loc[small, 'rest #'] = 0
cntByUserNm.loc[large, 'keep #'] = 500
cntByUserNm.loc[large, 'rest #'] = cntByUserNm.loc[large, 'total number'] - 500
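The same rule can also be expressed with `Series.clip` and `Series.where`. This is a sketch under assumptions rather than a drop-in replacement: the sample frame, the name `df`, and the constant `CAP` are made up for illustration, and rows whose source is not `'HR'` are assumed to stay `NaN`, as they do in the loop version.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame(
    {
        "Owner Name": ["Ann", "Bob", "Cy"],
        "source": ["HR", "HR", "Sales"],
        "total number": [300, 800, 900],
    }
)
CAP = 500  # the limit used in the question

is_hr = df["source"] == "HR"

# clip(upper=CAP) caps each value at CAP;
# where(is_hr) keeps the result for HR rows and leaves NaN elsewhere.
df["keep #"] = df["total number"].clip(upper=CAP).where(is_hr)

# Whatever exceeds the cap goes to "rest #"; non-HR rows stay NaN.
df["rest #"] = (df["total number"] - df["keep #"]).where(is_hr)

print(df)
```

Because `keep #` is computed from a single capped column, there is no need for separate "at most 500" and "more than 500" branches.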