ValueError: Length of values (1) does not match length of index (3)

I'm trying to run this piece of code:

df['ID'] = df.groupby(["Code","Number"]).apply(lambda x: x['O'].isin(x['D']) | x['D'].isin(x['O']) & (x['O'] != x['D'])).values

with the following input:

import pandas as pd

data1 = {"Code":["A","A","A"], "Number":[7,7,7],"O":["BR","AC","BR"],"D":["AC","LF","LF"]}
df = pd.DataFrame(data1)

I get the following error when the input data frame contains only one group (on Code and Number):

data = array([[ True,  True, False]])
index = Int64Index([0, 1, 2], dtype='int64')
ValueError: Length of values (1) does not match length of index (3)

If I use another input with multiple rows and groups, I don't get any errors. I don't understand what the problem is or how to fix it.

CodePudding user response:

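You get the error because there is a single group. In that case apply returns a DataFrame with one row per group (here 1 row and 3 columns) instead of a Series with one value per original row, so .values has length 1 while the index has length 3.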

Example (using a function for clarity):

import pandas as pd

def f(x):
    # note: & binds tighter than |, so this evaluates as A | (B & C)
    out = x['O'].isin(x['D']) | x['D'].isin(x['O']) & (x['O'] != x['D'])
    # print(out) # uncomment to see how the groups are handled
    return out

data1 ={"Code":["A","A","A"], "Number":[7,7,7],
        "O":["BR","AC","BR"],"D":["AC","LF","LF"]}
df1 = pd.DataFrame(data1)

df1.groupby(["Code","Number"]).apply(f)

                0     1      2
Code Number                   
A    7       True  True  False

Now let's add another group:

data2 = {"Code":list('AAABBB'), "Number":[7,7,7,8,8,8],
         "O":["BR","AC","BR","BR","AC","BR"],"D":["AC","LF","LF","BR","AC","LF"]}
df2 = pd.DataFrame(data2)

df2.groupby(["Code","Number"]).apply(f)

Code  Number   
A     7       0     True
              1     True
              2    False
B     8       3     True
              4     True
              5     True
dtype: bool

You can "fix" the first output with stack:

df1.groupby(["Code","Number"]).apply(f).stack()

Code  Number   
A     7       0     True
              1     True
              2    False
dtype: bool
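If you would rather not depend on which shape apply returns, one option is to build the mask per group yourself and concatenate the pieces. This is only a minimal sketch: flag_rows is a hypothetical helper name, not a pandas function, and it assumes the row labels of the data frame are unique so the assignment can align on the index.

```python
import pandas as pd

def f(x):
    # same condition as above; & binds tighter than |
    return x["O"].isin(x["D"]) | x["D"].isin(x["O"]) & (x["O"] != x["D"])

def flag_rows(df, keys=("Code", "Number")):
    # hypothetical helper: evaluate f on every group and glue the
    # per-group boolean Series back together; each piece keeps the
    # original row labels, so the result is always one value per row
    parts = [f(group) for _, group in df.groupby(list(keys))]
    return pd.concat(parts)

df1 = pd.DataFrame({"Code": ["A", "A", "A"], "Number": [7, 7, 7],
                    "O": ["BR", "AC", "BR"], "D": ["AC", "LF", "LF"]})
df2 = pd.DataFrame({"Code": list("AAABBB"), "Number": [7, 7, 7, 8, 8, 8],
                    "O": ["BR", "AC", "BR", "BR", "AC", "BR"],
                    "D": ["AC", "LF", "LF", "BR", "AC", "LF"]})

# assignment aligns on the index, so one group or many behaves the same
df1["ID"] = flag_rows(df1)
df2["ID"] = flag_rows(df2)
print(df1)
print(df2)
```

Because the assignment aligns on the row labels rather than on position, this also stays correct when the rows of a group are not next to each other in the input.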

CodePudding user response:

From what I can see, this line of code returns the following output:

df.groupby(["Code","Number"]).apply(lambda x: x['O'].isin(x['D']) | x['D'].isin(x['O']) & (x['O'] != x['D'])).values

[[ True  True False]]

This array has shape (1, 3): one row for the single group and one column per original row. When you assign it with df['Id'] = ..., pandas compares its length (1) with the length of the index (3) and raises the error. So all you need to do is convert it to a NumPy array and reshape it to one value per row, like this:

Id = df.groupby(["Code","Number"]).apply(lambda x: x['O'].isin(x['D']) | x['D'].isin(x['O']) & (x['O'] != x['D'])).values

df['Id'] = np.reshape(np.array(Id), len(df))  # requires: import numpy as np

I am not sure this will work with your full dataset, but at least it runs when there is a single group.
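To see where the numbers in the error message come from, here is a small sketch that starts from the array shown in the question's traceback instead of calling groupby. The variable names mask_2d and flat are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Code": ["A", "A", "A"], "Number": [7, 7, 7],
                   "O": ["BR", "AC", "BR"], "D": ["AC", "LF", "LF"]})

# the array reported in the question's traceback: 1 row, 3 columns
mask_2d = np.array([[True, True, False]])
print(mask_2d.shape)   # (1, 3)
print(len(mask_2d))    # 1, the "length of values" in the error
print(len(df.index))   # 3, the "length of index" in the error

# flattening gives one value per row, which can be assigned as a column
flat = mask_2d.ravel()
print(flat.shape)      # (3,)
df["ID"] = flat
print(df)
```

Keep in mind that flattening assigns by position, so it only gives the right answer when the values come back in the same order as the rows of the data frame.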