UndefinedVariableError: name is not defined but for sample code it works

One dataframe looks like this (it stems from a bigger one that was sliced by workplace, therefore the square brackets):

company_grouper[0] =
group    workplace    dep    employee   answer    question
a        w1           t1     smith      True      q1
a        w1           t1     smith      False     q2
a        w1           t1     smith      True      q2
a        w1           t1     john       False     q1
a        w1           t2     joe        True      q2
b        w1           t1     don        True      q1
b        w1           t1     don        False     q2
b        w1           t2     sean       True      q3
c        w1           t2     sean       True      q5
c        w1           t3     liam       False     q5
c        w1           t1     al         True      q1

The workplace is always the same and the team doesn't matter. An employee can be in multiple groups and can answer the same question multiple times. I want to compute statistics and compare the groups two by two, because not all of them deal with the same questions. First:

import itertools
import pandas as pd
g_8 = company_grouper[0].groupby('group')['question'].apply(set)
rows = []
for a, b in itertools.combinations(g_8.index, 2):
    rows.append({'Group1': a,
                 'Group2': b,
                 'NumberQuestionsG1': len(g_8[a]),
                 'NumberQuestionsG2': len(g_8[b]),
                 'Q_G1_G2': len(list(set().union(g_8[a],g_8[b]))),
                 'AllQuestions': len(company_grouper[0].question.unique()),
                 'CommonQuestions': len(g_8[a] & g_8[b]),
                 'Ratio': len(g_8[a] & g_8[b]) / (len(company_grouper[0].question.unique())),
                 'Ratio_pair': len(g_8[a] & g_8[b]) / len(list(set().union(g_8[a],g_8[b])))})
output_g_8 = pd.DataFrame(rows)

The columns are unimportant for this post. The only thing that matters is that I take the groups two by two, without repetitions. The above code works.
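As a minimal sketch of the "two by two, without repetitions" pairing, here is what itertools.combinations yields for the three group labels in the sample data (the `groups` list is written out by hand for illustration):

```python
import itertools

# Group labels as they appear in the sample data.
groups = ['a', 'b', 'c']

# combinations() yields each unordered pair exactly once, in input order,
# so ('b', 'a') never appears alongside ('a', 'b').
pairs = list(itertools.combinations(groups, 2))
print(pairs)  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```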

The problem is when I am trying to compute the averages for each group within each pair:

d_groups = {'Group1':'group1','Group2':'group2'}
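result_8_partial = (company_grouper[0].merge(company_grouper[0], on='question', suffixes=('1','2'))
        .query('group1 != group2')
        # average only the answer columns; pandas 2.0+ raises on the remaining text columns
        .groupby(['group1','group2','question'], as_index=False)[['answer1','answer2']]
        .mean())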

statistic_8 = result_8_partial.merge(output_g_8[['Group1','Group2']].rename(columns=d_groups))
statistic_8_averages = statistic_8.groupby(
    ['group1', 'group2'], as_index=False
).agg(Average1=('answer1', 'mean'), Average2=('answer2', 'mean'))

I don't understand why everything I wrote here works on the sample data (see below) but fails when I run it on the real slice, company_grouper[8]. There I get UndefinedVariableError: name 'group1' is not defined.
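One way to narrow this down is to look at the column names right after the self-merge, since query can only see names that exist as columns (or as local variables). The sketch below is a guess at one possible cause, not a confirmed diagnosis: it assumes, hypothetically, that the real slice spells the column differently (here 'Group'), so the suffixes never produce 'group1' and 'group2'.

```python
import pandas as pd

# Small frame with the same column names as the sample data.
df = pd.DataFrame({'group': ['a', 'b', 'c'],
                   'answer': [True, False, True],
                   'question': ['q1', 'q1', 'q1']})

merged = df.merge(df, on='question', suffixes=('1', '2'))
print(list(merged.columns))  # includes 'group1' and 'group2'

# Hypothetical variant: the column is spelled 'Group' instead of 'group'.
other = df.rename(columns={'group': 'Group'})
merged_other = other.merge(other, on='question', suffixes=('1', '2'))
print(list(merged_other.columns))  # no 'group1' or 'group2' here

error_message = None
try:
    merged_other.query('group1 != group2')
except NameError as exc:  # UndefinedVariableError is a subclass of NameError
    error_message = str(exc)

print(error_message)
```

Printing list(merged.columns) for the failing slice before calling query shows whether 'group1' and 'group2' exist there at all.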

Here's the data to play around with:

company_grouper = pd.DataFrame({'group': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
                   'workplace': ['w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1'],
                   'team': ['t1', 't1', 't1', 't1', 't2', 't1', 't1', 't2', 't2', 't3', 't1'],
                   'employee': ['smith', 'smith', 'smith', 'john', 'joe', 'don', 'don', 'sean','sean', 'liam','al'],
                   'answer': [True, False, True, False, True, True, False, True, True, False, True],
                   'question': ['q1','q2','q2','q1','q2','q1','q2','q3','q5','q5','q1']})

EDIT: How I got to the company_grouper[0] dataframe:

df_big=
    group    workplace    dep    employee   answer    question
    a        w1           t1     smith      True      q1
    a        w1           t1     smith      False     q2
    a        w1           t1     smith      True      q2
    a        w1           t1     john       False     q1
    a        w1           t2     joe        True      q2
    b        w1           t1     don        True      q1
    b        w1           t1     don        False     q2
    b        w1           t2     sean       True      q3
    c        w1           t2     sean       True      q5
    c        w1           t3     liam       False     q5
    c        w1           t1     al         True      q1
    z        w2           t9     mary       True      q7
    z        w2           t9     mary       False     q8
    y        w2           t9     dan        False     q7
    y        w2           t8     ben        True      q9
    w        w3           t14    greg       False     q15

And then:

company_grouper = [g for _, g in df_big.groupby(['workplace'])]
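For illustration, a small sketch of what that list looks like and how an index maps to a workplace. The shortened `df_big` and the `by_workplace` dictionary are assumptions made for this example, and the grouping uses the plain column name so the keys are plain strings:

```python
import pandas as pd

# Shortened stand-in for df_big with the same workplaces.
df_big = pd.DataFrame({'group': ['a', 'b', 'z', 'y', 'w'],
                       'workplace': ['w1', 'w1', 'w2', 'w2', 'w3'],
                       'question': ['q1', 'q2', 'q7', 'q9', 'q15']})

# One frame per workplace; groupby sorts the keys, so position 0 is 'w1'.
company_grouper = [g for _, g in df_big.groupby('workplace')]
print(len(company_grouper))                                # 3
print(company_grouper[0]['workplace'].unique().tolist())   # ['w1']

# Alternative: key the frames by workplace instead of by position.
by_workplace = {name: g for name, g in df_big.groupby('workplace')}
print(by_workplace['w2']['group'].tolist())                # ['z', 'y']
```

With the dictionary, a slice is looked up by its workplace name, which avoids having to remember which position holds which workplace.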

User response:

I don't follow: you are trying to index company_grouper with 8, but it is a data frame?

Can you post your original company_grouper? I would expect it to be a dictionary or a list, not a data frame.

Be careful using the same name for different things.
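A short sketch of why the shared name matters, using hypothetical names `sample_df` and `frames_by_workplace` to keep the two objects apart. On a data frame, square brackets with 0 look for a column labelled 0, while on a list they return the first element:

```python
import pandas as pd

sample_df = pd.DataFrame({'group': ['a', 'b'], 'workplace': ['w1', 'w2']})

# Indexing the data frame with 0 searches for a column named 0 and fails.
lookup_failed = False
try:
    sample_df[0]
except KeyError:
    lookup_failed = True
print(lookup_failed)  # True

# The list of per-workplace frames gets its own name, so [0] means "first frame".
frames_by_workplace = [g for _, g in sample_df.groupby('workplace')]
first_frame = frames_by_workplace[0]
print(first_frame['workplace'].tolist())  # ['w1']
```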