how to add levels to a new column in pandas dataframe based on a condition?-CodePudding

I have huge data frame that looks like this:

     Col1      Col2
0.   'w1'.     'N/A'
1.   'w2'       4.3
2.   'w3'.      1.2
4.   'w4'.      3.5
5.   'w5'      'N/A'
6.   'w6'.      3.1
7.   'w7'.      2.4
8.   'w8'.      1.7
9.   'w9'.      4.6
10.  'w10'.    'N/A'
11.  'w11'.     3.0

I have strings in the first column. In the sercond column I have floats and the 'N/A' string in some rows. The strings in col1 are answer options to a question, the floats in col2 are the answer avg. counts and the "N/A" string is the question itself. My intention is to create another column in this data frame named "Question_number" whose levels (Q1,Q2,Q3...) change every time a new 'N/A' row in col2 appears. Thus, my desired output is:

     Col1    Col2      Col3
0.   'Q1'     'w1'.    'N/A'
1.   'Q1'     'w2'      4.3
2.   'Q1'     'w3'.     1.2
4.   'Q1'     'w4'.     3.5
5.   'Q2'     'w5'     'N/A'
6.   'Q2'     'w6'.     3.1
7.   'Q2'.    'w7'.     2.4
8.   'Q2'     'w8'.     1.7
9.   'Q3'     'w9'.    'N/A'
10.  'Q3'     'w10'.    2.0
11.  'Q3'     'w11'.    3.0

Can anyone help me with this? I tried this:

df['question_number']=np.where(df['counts']=='N/A', "Q1", "Q2", 'Q3')

But that does not work and I have no idea on how to do this. Can anyone help, please?

CodePudding user response：

Update: @enke's comment is MUCH cleaner than my code. If you want to place the new column in the first position, simply add the column first.

# insert new column in first position
df.insert(0,'new','')
# Line from @enke's comment
df['new'] = 'Q' df['Col2'].eq('N/A').cumsum().astype(str)

My original post is below and you can ignore if ya wanna!

I inserted a new column in the first column position then loop through each row. I rename the column names just because I can. :D I did not include the quotes and periods in your original df. But the code below may still be useful.

import pandas as pd

cols = ['Answer', 'Avg_Score']
data=[['w1', 'N/A'],
['w2', 4.3],
['w3', 1.2],
['w4', 3.5],
['w5', 'N/A'],
['w6', 3.1],
['w7', 2.4],
['w8', 1.7],
['w9', 4.6],
['w10', 'N/A'],
['w11', 3.0]]

df = pd.DataFrame(data, columns = cols)
# insert new column before the Answer and the Avg_Score
df.insert(0,'Question','')

# start the question counter at 0
qnum = 0

# loop through each row
for index,row in df.iterrows():
    # if 'N/A' found increase the question counter
    # this assume first row will always have an 'N/A'
    if df.loc[index,'Avg_Score'] == 'N/A':
        qnum  = 1
    df.loc[index,'Question'] = 'Q{}'.format(qnum)

print(df)

Ouput:

   Question Answer Avg_Score
0        Q1     w1       N/A
1        Q1     w2       4.3
2        Q1     w3       1.2
3        Q1     w4       3.5
4        Q2     w5       N/A
5        Q2     w6       3.1
6        Q2     w7       2.4
7        Q2     w8       1.7
8        Q2     w9       4.6
9        Q3    w10       N/A
10       Q3    w11       3.0