Home > Enterprise >  how to add a column with a counter based on a condition
how to add a column with a counter based on a condition

Time:04-08

I have a dataset that looks like this:

cols = ['question_nummber', 'Answer', 'Avg_Score']
data=[['Q1',  'w1', 'N/A'],
['Q1','w2', 4.3],
['Q1','w3', 1.2],
['Q1','w4', 3.5],
['Q2','w5', 'N/A'],
['Q2','w6', 3.1],
['Q2','w7', 2.4],
['Q2','w8', 1.7],
['Q2','w9', 4.6],
['Q3','w10', 'N/A'],
['Q3','w11', 3.0]]

df = pd.DataFrame(data, columns = cols)

I want to add a column with the answers numbers to questions Q1 to Q3. Everytime the loop finds the string N/A column "Avg_Score", the counter has to be set to 1 back to 1. My desired output would be:

cols = ['question_nummber', 'answers-options', 'Answer', 'Avg_Score']
data=[['Q1', 'A1', 'w1', 'N/A'],
['Q1', 'A2','w2', 4.3],
['Q1','A3','w3', 1.2],
['Q1', 'A4','w4', 3.5],
['Q2','A1','w5', 'N/A'],
['Q2','A2','w6', 3.1],
['Q2','A3','w7', 2.4],
['Q2','A4','w8', 1.7],
['Q2','A5', 'w9', 4.6],
['Q3','A1','w10', 'N/A'],
['Q3','A2', 'w11', 3.0]]

df = pd.DataFrame(data, columns = cols)

I tried the below code, but it does not work because the counter is not set to 1 every time "N/A" is found. The counter continues. How can I obtain my desired output?

c=1
for x, row in df.iterrows():
  if df.loc[x, 'Avg_Score'] == 'N/A':
    df.loc[x,'question_alternative'] = 'Null'
  else:

    c=c 1
    df.loc[x,'question_alternative'] = 'A{}'.format(c)

CodePudding user response:

If each group starting by N/A values test missing values with Series.notna and create counter by GroupBy.cumsum:

df['Avg_Score'] = df['Avg_Score'].replace('N/A', np.nan)
df['new'] = 'A'   (df['Avg_Score'].notna()
                                  .groupby(df['question_nummber'])
                                  .cumsum()
                                  .add(1)
                                  .astype(str))

Alternative without missing values:

df['new'] = ('A'   df['Avg_Score'].eq('N/A')
                                  .groupby(df['question_nummber'])
                                  .cumsum()
                                  .add(1)
                                  .astype(str))

print (df)
   question_nummber answers-options Answer  Avg_Score new
0                Q1              A1     w1        NaN  A1
1                Q1              A2     w2        4.3  A2
2                Q1              A3     w3        1.2  A3
3                Q1              A4     w4        3.5  A4
4                Q2              A1     w5        NaN  A1
5                Q2              A2     w6        3.1  A2
6                Q2              A3     w7        2.4  A3
7                Q2              A4     w8        1.7  A4
8                Q2              A5     w9        4.6  A5
9                Q3              A1    w10        NaN  A1
10               Q3              A2    w11        3.0  A2

If need groups only by counter by column question_nummber (N/A is not encessary test) use GroupBy.cumcount for counter:

df['new'] = ('A'   df.groupby('question_nummber')
                     .cumcount()
                     .add(1)
                     .astype(str))
  • Related