Home > OS >  How to update column based on conditions and previous row is not equal to the same condition
How to update column based on conditions and previous row is not equal to the same condition

Time:01-16

How to identify Winner Week when the previous row is not equal to the current row. Week is classified as a "Winner" when the [Weekly_counts] is greater than the [Winner_Num] and the previous week is not a Winner.

Here a copy of the final data set:

    Year    ISOweeknum  Weekly_Counts   NumOfWeeks  Yearly_Count    WeeklyAverage   Winner_Num
0   2017    9   1561    44  12100   275 330
1   2017    10  1001    44  12100   275 330
2   2017    11  451     44  12100   275 330
3   2017    12  513     44  12100   275 330
4   2017    13  431     44  12100   275 330
... ... ... ... ... ... ... ...
232 2021    32  136     36  4212    117 140
233 2021    33  84      36  4212    117 140
234 2021    34  95      36  4212    117 140
235 2021    35  120     36  4212    117 140
236 2021    53  77      36  4212    117 140

I've tried using this code but not getting the results desired:

new_df3['Winner_Results'] = 0 
for i in range(len(new_df3)-1): 

if (new_df3['Weekly_Votes_Counts'].iloc[i] > new_df3['Winner_Num'].iloc[i]) & (new_df3['Weekly_Votes_Counts'].iloc[i 1] > new_df3['Winner_Num'].iloc[i 1]): 
     new_df3['Winner_Results'].iloc[i] = 'Not Winner'

    
else:
     new_df3['Winner_Results'].iloc[i] = 'Winner'



.py:6: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy new_df3['Winner_Results'].iloc[i] = 'Winner'


The Expected Result: 
[![Excel Example][1]][1]


  [1]: https://i.stack.imgur.com/BEcuX.png

CodePudding user response:

Here's a way to get the result in your question:

df['Counts_Gt_Num'] = df.Weekly_Counts > df.Winner_Num
df['Cumsum'] = df['Counts_Gt_Num'].cumsum()
df.loc[(~df['Counts_Gt_Num']) & df['Counts_Gt_Num'].shift(), 'Subtract'] = df['Cumsum']
df['Is_Winner'] = (df['Cumsum'] - df['Subtract'].ffill().fillna(0)).mod(2)
df['Winner_Results'] = np.where(df['Is_Winner'], 'Winner', 'Not Winner')
df = df.drop(columns=['Counts_Gt_Num', 'Cumsum', 'Subtract', 'Is_Winner'])

Output:

    Year  ISOweeknum  Weekly_Counts  NumOfWeeks  Yearly_Count  WeeklyAverage  Winner_Num Winner_Results
0   2017           9           1561          44         12100            275         330         Winner
1   2017          10           1001          44         12100            275         330     Not Winner
2   2017          11            451          44         12100            275         330         Winner
3   2017          12            513          44         12100            275         330     Not Winner
4   2017          13            431          44         12100            275         330         Winner
5   2017          14            371          44         12100            275         330     Not Winner
6   2017          15            361          44         12100            275         330         Winner
7   2017          16            336          44         12100            275         330     Not Winner
8   2017          17            332          44         12100            275         330         Winner
9   2017          18            124          44         12100            275         330     Not Winner
10  2017          19            142          44         12100            275         330     Not Winner
11  2017          20            290          44         12100            275         330     Not Winner
12  2017          21            116          44         12100            275         330     Not Winner
13  2017          22            142          44         12100            275         330     Not Winner
14  2017          23            132          44         12100            275         330     Not Winner
15  2017          24             69          44         12100            275         330     Not Winner
16  2017          25            124          44         12100            275         330     Not Winner
17  2017          26            136          44         12100            275         330     Not Winner
18  2017          27             63          44         12100            275         330     Not Winner

Explanation:

  • mark rows with counts > num using boolean in new column Counts_Gt_Num
  • put cumsum of the above in new column Cumsum
  • create new column Subtract by coping from Cumsum for rows where Counts_Gt_Num is False but was True in the preceding row, and forward fill using ffill() for rows with nan
  • create Is_Winner column selecting as winners the rows at an even offset (0, 2, 4 ...) within a streak of non-zero values in Cumsum - Subtract
  • create Winner_Results by assigning the desired win/no-win value based on Is_Winner
  • drop intermediate columns.
  • Related