I'm trying to run this piece of code:
df['ID'] =df.groupby(["Code","Number"]).apply(lambda x: x['O'].isin(x['D']) | x['D'].isin(x['O']) & (x['O'] != x['D'])).values
with the following input:
data1 ={"Code":["A","A","A"], "Number":[7,7,7],"O":["BR","AC","BR"],"D":["AC","LF","LF"]}
df=pd.DataFrame(data1)
If the input DataFrame contains only one group (on Code and Number), I get the following error:
data = array([[ True, True, False]])
index = Int64Index([0, 1, 2], dtype='int64')
ValueError: Length of values (1) does not match length of index (3)
If I use another input with multiple rows and groups, I don't get any error. I don't understand what the problem is or how to fix it.
CodePudding user response:
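You get the error because there is a single group. In that case apply returns a one-row DataFrame (one column per original row) instead of a Series, so .values has shape (1, 3) and cannot be assigned to a column of length 3.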
Example (using a function for clarity):
def f(x):
    out = x['O'].isin(x['D']) | x['D'].isin(x['O']) & (x['O'] != x['D'])
    # print(out) # uncomment to see how the groups are handled
    return out
data1 ={"Code":["A","A","A"], "Number":[7,7,7],
"O":["BR","AC","BR"],"D":["AC","LF","LF"]}
df1 = pd.DataFrame(data1)
df1.groupby(["Code","Number"]).apply(f)
                0     1      2
Code Number
A    7       True  True  False
Now let's add another group:
data2 = {"Code":list('AAABBB'), "Number":[7,7,7,8,8,8],
"O":["BR","AC","BR","BR","AC","BR"],"D":["AC","LF","LF","BR","AC","LF"]}
df2 = pd.DataFrame(data2)
df2.groupby(["Code","Number"]).apply(f)
Code  Number
A     7       0     True
              1     True
              2    False
B     8       3     True
              4     True
              5     True
dtype: bool
You can "fix" the first output with stack:
df1.groupby(["Code","Number"]).apply(f).stack()
Code  Number
A     7       0     True
              1     True
              2    False
dtype: bool
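If you want something that behaves the same for one group or many, one option is to skip apply and build the result group by group. This is only a sketch: the helper name flag_rows is made up for illustration, and it assumes the DataFrame index is unique so the flags can be realigned to the original rows.

```python
import pandas as pd

def flag_rows(group):
    # Same expression as in the question. Note that & binds tighter
    # than |, so this evaluates as A | (B & C).
    return group["O"].isin(group["D"]) | group["D"].isin(group["O"]) & (group["O"] != group["D"])

def add_flags(df):
    # Evaluate each group separately, then concatenate the per-group
    # Series and realign them to the original row order.
    parts = [flag_rows(g) for _, g in df.groupby(["Code", "Number"])]
    return pd.concat(parts).reindex(df.index)

# Single group (the failing case from the question)
df1 = pd.DataFrame({"Code": ["A", "A", "A"], "Number": [7, 7, 7],
                    "O": ["BR", "AC", "BR"], "D": ["AC", "LF", "LF"]})
df1["ID"] = add_flags(df1)

# Two groups
df2 = pd.DataFrame({"Code": list("AAABBB"), "Number": [7, 7, 7, 8, 8, 8],
                    "O": ["BR", "AC", "BR", "BR", "AC", "BR"],
                    "D": ["AC", "LF", "LF", "BR", "AC", "LF"]})
df2["ID"] = add_flags(df2)
```

Because the result is always a Series indexed like the original rows, the assignment aligns on the index and does not depend on the shape apply happens to return.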
CodePudding user response:
From what I can see, this line of code returns the following output:
df.groupby(["Code","Number"]).apply(lambda x: x['O'].isin(x['D']) | x['D'].isin(x['O']) & (x['O'] != x['D'])).values
[[ True  True False]]
This array has shape (1, 3): a single row holding all three values. When you run df['Id'] = your_code, pandas expects one value per row, i.e. length 3, but it sees length 1, hence the error. So all you need to do is convert it to a NumPy array and reshape it so that there is one value per row, like this:
Id = df.groupby(["Code","Number"]).apply(lambda x: x['O'].isin(x['D']) | x['D'].isin(x['O']) & (x['O'] != x['D'])).values
df['Id'] = np.reshape(np.array(Id),(3,1))
I am not sure whether this will work on your full dataset, since the shape (3, 1) is hard-coded, but at least it runs when there is only one group.
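To avoid hard-coding the number of rows, the array can be flattened instead of reshaped. The sketch below does not rerun the groupby; it assumes the result has the shape (1, 3) and the values shown in the error message from the question, and the name wide is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Code": ["A", "A", "A"], "Number": [7, 7, 7],
                   "O": ["BR", "AC", "BR"], "D": ["AC", "LF", "LF"]})

# Stand-in for the single-group result: one row, three columns,
# copied from the error message in the question.
wide = np.array([[True, True, False]])

# ravel() flattens (1, 3) to (3,), whatever the number of rows is,
# so the length matches the DataFrame index.
df["ID"] = np.ravel(wide)
```

This still relies on the values being in the same order as the rows of df, which holds here because there is only one group.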