Combine different length of groupby data into Dataframe and how to solve IndexError: list index out-CodePudding

I would like to restructure groupby data into dataframe. Because the table has different length of groupbyed data, when I iterate those groupbyed data, IndexError: list index out of range comes up and I do not know how to ignore this index error.

Background of my data.

I groupbyed by "Worker_id" and "Project_id"

Question	Answer	Worker_ID	Project_ID
DD	AB	X	Y
DD	AB	X	Y
DD	AB	X	Y
BD	BG	K	Y
BD	BG	K	Y
KY	GG	J	Y
KY	GG	J	Y
KY	GG	J	Y
KY	GG	J	Y
RR	FR	X	Q
HU	RT	K	Q
HU	RT	K	Q
HU	RT	K	Q
YU	GE	J	Q
YU	GE	J	Q
XX	FF	K	P
XX	FF	K	P
XI	UF	J	P
XI	UF	J	P

(This table goes on much longer)

I would like to create dataframe like this↓↓

Question_1	Answer_1	Worker_ID_1	Question_2	Answer_2	Worker_ID_2	Question_3	Answer_3	Worker_ID_3	Project_ID
DD	AB	X	BD	BG	K	KY	GG	J	Y
DD	AB	X	BD	BG	K	KY	GG	J	Y
DD	AB	X	Blank	Blank	Blank	KY	GG	J	Y
Blank	Blank	Blank	Blank	Blank	Blank	KY	GG	J	Y
RR	FR	X	HU	RT	K	YU	GE	J	Q
Blank	Blank	Blank	HU	RT	K	YU	GE	J	Q
Blank	Blank	Blank	HU	RT	K	Blank	Blank	Blank	Q
Blank	Blank	Blank	XX	FF	K	XI	UF	J	P
Blank	Blank	Blank	XX	FF	K	XI	UF	J	P

(This table goes on much longer)

Because each groupbyed data has different length, the index error comes up when I try with the longest list in my code.

My code↓↓

#groupby by "Worker_ID","Project_ID"
grouped_questions = {}
for x, y in df1.groupby(["Worker_ID","Project_ID"],as_index=True):
    grouped_questions[x] = y.reset_index(drop = True)

#create lists of keys
unique_list = []
for unique in list(grouped_questions):
    if unique not in unique_list:
        unique_list.append(unique)

question_1 = []
question_2 = []
question_3 = []

for qn in unique_list:
    worker = qn[2]
    project = qn[3]
    if worker == 'X':
        question_1.append(qn)
    elif worker == 'K':
        question_2.append(qn)
    elif worker == 'J':
        question_3.append(qn)

#combine into dataframe 

final_df1 = pd.DataFrame()
for index in range(1,max_length):
   
    one_question = grouped_questions[question_1[index]]
    two_question = grouped_questions[question_2[index]]
    three_question = grouped_questions[question_3[index]]
　　 merged_df1 = one_question.merge(two_question, how='outer', left_index=True, right_index=True, suffixes=["_1", "_2"]) 
    merged_df2 = three_question.merge(four_question, how='outer', left_index=True, right_index=True, suffixes=["_3", "_4"])  

    if final_df1.shape[0] == 0:
        final_df1 = merged_df5
    else:
        final_df1 = pd.concat([final_df1, merged_df3],ignore_index=True)
final_df1.reset_index(drop=True)
final_df1

CodePudding user response：

Here is what I have so far. As I said in the comments, there are more rows than in your desired output, but I don't know on what conditions I should join them.

# restructure the df with `pivot_table`
m1 = df.groupby('Project_ID').cumcount()   1
m2 = pd.factorize(df['Worker_ID'])[0]   1
out = (
    df
    .pivot_table(
        index=['Project_ID', m1], 
        columns=m2, 
        values=['Worker_ID', 'Question', 'Answer'], 
        aggfunc='first'
    )
)


# join the multiindex columns to single columns
out.columns = out.columns.map(lambda x: f"{x[0]}_{x[1]}")

# in case you need to automate the sorting of the columns, here is how to do it (otherwise you could just pass a list with the new order of the columns) 
# 1st sort by numbers, then always bring them in the right order (question, answer, worker)
pattern = '|'.join([f"({x})" for x in ['Question', 'Answer', 'Worker']])
def sort_key(x):
    return re.search(pattern,x).lastindex
new_order = sorted(out.columns.tolist(), key=lambda x: (int(x[-1]), sort_key(x.split('_')[0])))

out = out[new_order]
print(out)

             Question_1 Answer_1 Worker_ID_1 Question_2 Answer_2 Worker_ID_2 Question_3 Answer_3 Worker_ID_3
Project_ID                                                                                                  
P          1        NaN      NaN         NaN         XX       FF           K        NaN      NaN         NaN
           2        NaN      NaN         NaN         XX       FF           K        NaN      NaN         NaN
           3        NaN      NaN         NaN        NaN      NaN         NaN         XI       UF           J
           4        NaN      NaN         NaN        NaN      NaN         NaN         XI       UF           J
Q          1         RR       FR           X        NaN      NaN         NaN        NaN      NaN         NaN
           2        NaN      NaN         NaN         HU       RT           K        NaN      NaN         NaN
           3        NaN      NaN         NaN         HU       RT           K        NaN      NaN         NaN
           4        NaN      NaN         NaN         HU       RT           K        NaN      NaN         NaN
           5        NaN      NaN         NaN        NaN      NaN         NaN         YU       GE           J
           6        NaN      NaN         NaN        NaN      NaN         NaN         YU       GE           J
Y          1         DD       AB           X        NaN      NaN         NaN        NaN      NaN         NaN
           2         DD       AB           X        NaN      NaN         NaN        NaN      NaN         NaN
           3         DD       AB           X        NaN      NaN         NaN        NaN      NaN         NaN
           4        NaN      NaN         NaN         BD       BG           K        NaN      NaN         NaN
           5        NaN      NaN         NaN         BD       BG           K        NaN      NaN         NaN
           6        NaN      NaN         NaN        NaN      NaN         NaN         KY       GG           J
           7        NaN      NaN         NaN        NaN      NaN         NaN         KY       GG           J
           8        NaN      NaN         NaN        NaN      NaN         NaN         KY       GG           J
           9        NaN      NaN         NaN        NaN      NaN         NaN         KY       GG           J

Question	Answer	Worker_ID	Project_ID
DD	AB	X	Y
DD	AB	X	Y
DD	AB	X	Y
BD	BG	K	Y
BD	BG	K	Y
KY	GG	J	Y
KY	GG	J	Y
KY	GG	J	Y
KY	GG	J	Y
RR	FR	X	Q
HU	RT	K	Q
HU	RT	K	Q
HU	RT	K	Q
YU	GE	J	Q
YU	GE	J	Q
XX	FF	K	P
XX	FF	K	P
XI	UF	J	P
XI	UF	J	P

Question	Answer	Worker_ID	Project_ID
DD	AB	X	Y
DD	AB	X	Y
DD	AB	X	Y
BD	BG	K	Y
BD	BG	K	Y
KY	GG	J	Y
KY	GG	J	Y
KY	GG	J	Y
KY	GG	J	Y
RR	FR	X	Q
HU	RT	K	Q
HU	RT	K	Q
HU	RT	K	Q
YU	GE	J	Q
YU	GE	J	Q
XX	FF	K	P
XX	FF	K	P
XI	UF	J	P
XI	UF	J	P

Question	Answer	Worker_ID	Project_ID
DD	AB	X	Y
DD	AB	X	Y
DD	AB	X	Y
BD	BG	K	Y
BD	BG	K	Y
KY	GG	J	Y
KY	GG	J	Y
KY	GG	J	Y
KY	GG	J	Y
RR	FR	X	Q
HU	RT	K	Q
HU	RT	K	Q
HU	RT	K	Q
YU	GE	J	Q
YU	GE	J	Q
XX	FF	K	P
XX	FF	K	P
XI	UF	J	P
XI	UF	J	P