Home > Software engineering >  Combine different length of groupby data into Dataframe and how to solve IndexError: list index out
Combine different length of groupby data into Dataframe and how to solve IndexError: list index out

Time:11-06

I would like to restructure groupby data into dataframe. Because the table has different length of groupbyed data, when I iterate those groupbyed data, IndexError: list index out of range comes up and I do not know how to ignore this index error.

Background of my data.

  • I groupbyed by "Worker_id" and "Project_id"
Question Answer Worker_ID Project_ID
DD AB X Y
DD AB X Y
DD AB X Y
BD BG K Y
BD BG K Y
KY GG J Y
KY GG J Y
KY GG J Y
KY GG J Y
RR FR X Q
HU RT K Q
HU RT K Q
HU RT K Q
YU GE J Q
YU GE J Q
XX FF K P
XX FF K P
XI UF J P
XI UF J P

(This table goes on much longer)

I would like to create dataframe like this↓↓

Question_1 Answer_1 Worker_ID_1 Question_2 Answer_2 Worker_ID_2 Question_3 Answer_3 Worker_ID_3 Project_ID
DD AB X BD BG K KY GG J Y
DD AB X BD BG K KY GG J Y
DD AB X Blank Blank Blank KY GG J Y
Blank Blank Blank Blank Blank Blank KY GG J Y
RR FR X HU RT K YU GE J Q
Blank Blank Blank HU RT K YU GE J Q
Blank Blank Blank HU RT K Blank Blank Blank Q
Blank Blank Blank XX FF K XI UF J P
Blank Blank Blank XX FF K XI UF J P

(This table goes on much longer)

  • Because each groupbyed data has different length, the index error comes up when I try with the longest list in my code.

My code↓↓

#groupby by "Worker_ID","Project_ID"
grouped_questions = {}
for x, y in df1.groupby(["Worker_ID","Project_ID"],as_index=True):
    grouped_questions[x] = y.reset_index(drop = True)

#create lists of keys
unique_list = []
for unique in list(grouped_questions):
    if unique not in unique_list:
        unique_list.append(unique)

question_1 = []
question_2 = []
question_3 = []

for qn in unique_list:
    worker = qn[2]
    project = qn[3]
    if worker == 'X':
        question_1.append(qn)
    elif worker == 'K':
        question_2.append(qn)
    elif worker == 'J':
        question_3.append(qn)

#combine into dataframe 

final_df1 = pd.DataFrame()
for index in range(1,max_length):
   
    one_question = grouped_questions[question_1[index]]
    two_question = grouped_questions[question_2[index]]
    three_question = grouped_questions[question_3[index]]
   merged_df1 = one_question.merge(two_question, how='outer', left_index=True, right_index=True, suffixes=["_1", "_2"]) 
    merged_df2 = three_question.merge(four_question, how='outer', left_index=True, right_index=True, suffixes=["_3", "_4"])  

    if final_df1.shape[0] == 0:
        final_df1 = merged_df5
    else:
        final_df1 = pd.concat([final_df1, merged_df3],ignore_index=True)
final_df1.reset_index(drop=True)
final_df1

CodePudding user response:

Here is what I have so far. As I said in the comments, there are more rows than in your desired output, but I don't know on what conditions I should join them.

# restructure the df with `pivot_table`
m1 = df.groupby('Project_ID').cumcount()   1
m2 = pd.factorize(df['Worker_ID'])[0]   1
out = (
    df
    .pivot_table(
        index=['Project_ID', m1], 
        columns=m2, 
        values=['Worker_ID', 'Question', 'Answer'], 
        aggfunc='first'
    )
)


# join the multiindex columns to single columns
out.columns = out.columns.map(lambda x: f"{x[0]}_{x[1]}")

# in case you need to automate the sorting of the columns, here is how to do it (otherwise you could just pass a list with the new order of the columns) 
# 1st sort by numbers, then always bring them in the right order (question, answer, worker)
pattern = '|'.join([f"({x})" for x in ['Question', 'Answer', 'Worker']])
def sort_key(x):
    return re.search(pattern,x).lastindex
new_order = sorted(out.columns.tolist(), key=lambda x: (int(x[-1]), sort_key(x.split('_')[0])))

out = out[new_order]
print(out)
             Question_1 Answer_1 Worker_ID_1 Question_2 Answer_2 Worker_ID_2 Question_3 Answer_3 Worker_ID_3
Project_ID                                                                                                  
P          1        NaN      NaN         NaN         XX       FF           K        NaN      NaN         NaN
           2        NaN      NaN         NaN         XX       FF           K        NaN      NaN         NaN
           3        NaN      NaN         NaN        NaN      NaN         NaN         XI       UF           J
           4        NaN      NaN         NaN        NaN      NaN         NaN         XI       UF           J
Q          1         RR       FR           X        NaN      NaN         NaN        NaN      NaN         NaN
           2        NaN      NaN         NaN         HU       RT           K        NaN      NaN         NaN
           3        NaN      NaN         NaN         HU       RT           K        NaN      NaN         NaN
           4        NaN      NaN         NaN         HU       RT           K        NaN      NaN         NaN
           5        NaN      NaN         NaN        NaN      NaN         NaN         YU       GE           J
           6        NaN      NaN         NaN        NaN      NaN         NaN         YU       GE           J
Y          1         DD       AB           X        NaN      NaN         NaN        NaN      NaN         NaN
           2         DD       AB           X        NaN      NaN         NaN        NaN      NaN         NaN
           3         DD       AB           X        NaN      NaN         NaN        NaN      NaN         NaN
           4        NaN      NaN         NaN         BD       BG           K        NaN      NaN         NaN
           5        NaN      NaN         NaN         BD       BG           K        NaN      NaN         NaN
           6        NaN      NaN         NaN        NaN      NaN         NaN         KY       GG           J
           7        NaN      NaN         NaN        NaN      NaN         NaN         KY       GG           J
           8        NaN      NaN         NaN        NaN      NaN         NaN         KY       GG           J
           9        NaN      NaN         NaN        NaN      NaN         NaN         KY       GG           J

  • Related