I have a dataset that looks like this:
cols = ['question_nummber', 'Answer', 'Avg_Score']
data=[['Q1', 'w1', 'N/A'],
['Q1','w2', 4.3],
['Q1','w3', 1.2],
['Q1','w4', 3.5],
['Q2','w5', 'N/A'],
['Q2','w6', 3.1],
['Q2','w7', 2.4],
['Q2','w8', 1.7],
['Q2','w9', 4.6],
['Q3','w10', 'N/A'],
['Q3','w11', 3.0]]
df = pd.DataFrame(data, columns = cols)
I want to add a column with the answers numbers to questions Q1 to Q3. Everytime the loop finds the string N/A column "Avg_Score", the counter has to be set to 1 back to 1. My desired output would be:
cols = ['question_nummber', 'answers-options', 'Answer', 'Avg_Score']
data=[['Q1', 'A1', 'w1', 'N/A'],
['Q1', 'A2','w2', 4.3],
['Q1','A3','w3', 1.2],
['Q1', 'A4','w4', 3.5],
['Q2','A1','w5', 'N/A'],
['Q2','A2','w6', 3.1],
['Q2','A3','w7', 2.4],
['Q2','A4','w8', 1.7],
['Q2','A5', 'w9', 4.6],
['Q3','A1','w10', 'N/A'],
['Q3','A2', 'w11', 3.0]]
df = pd.DataFrame(data, columns = cols)
I tried the below code, but it does not work because the counter is not set to 1 every time "N/A" is found. The counter continues. How can I obtain my desired output?
c=1
for x, row in df.iterrows():
if df.loc[x, 'Avg_Score'] == 'N/A':
df.loc[x,'question_alternative'] = 'Null'
else:
c=c 1
df.loc[x,'question_alternative'] = 'A{}'.format(c)
CodePudding user response:
If each group starting by N/A
values test missing values with Series.notna
and create counter by GroupBy.cumsum
:
df['Avg_Score'] = df['Avg_Score'].replace('N/A', np.nan)
df['new'] = 'A' (df['Avg_Score'].notna()
.groupby(df['question_nummber'])
.cumsum()
.add(1)
.astype(str))
Alternative without missing values:
df['new'] = ('A' df['Avg_Score'].eq('N/A')
.groupby(df['question_nummber'])
.cumsum()
.add(1)
.astype(str))
print (df)
question_nummber answers-options Answer Avg_Score new
0 Q1 A1 w1 NaN A1
1 Q1 A2 w2 4.3 A2
2 Q1 A3 w3 1.2 A3
3 Q1 A4 w4 3.5 A4
4 Q2 A1 w5 NaN A1
5 Q2 A2 w6 3.1 A2
6 Q2 A3 w7 2.4 A3
7 Q2 A4 w8 1.7 A4
8 Q2 A5 w9 4.6 A5
9 Q3 A1 w10 NaN A1
10 Q3 A2 w11 3.0 A2
If need groups only by counter by column question_nummber
(N/A
is not encessary test) use GroupBy.cumcount
for counter:
df['new'] = ('A' df.groupby('question_nummber')
.cumcount()
.add(1)
.astype(str))