What is the ValueError message trying to tell me? What is wrong with the code below, or with my expectation that it should work?
The lengths of the sliced Series are the same on both sides of the assignment.
import pandas as pd
d = {'A':[1,2,3,4,5], 'B':[6,7,8,9,0], 'C':[7,8,4,2,0]}
df = pd.DataFrame(data=d)
df["D"] = 0
# shows that the lengths of all slices are the same: 4 4 4 4
print(len(df["D"][1:]), len(df["A"][1:]), len(df["B"][1:]), len(df["C"][0:-1]))
# v-- raises ValueError
df["D"][1:] = (df["A"][1:] + df["B"][1:] + df["C"][0:-1])
The code above outputs:
4 4 4 4
Traceback (most recent call last):
File "pandas_slicing_problem.py", line 6, in <module>
df["D"][1:] = (df["A"][1:] + df["B"][1:]) * df["C"][0:-1]
[ ... many other irrelevant lines ... ]
File "...python3.9/site-packages/pandas/core/indexers/utils.py", line 187, in check_setitem_lengths
raise ValueError(
ValueError: cannot set using a slice indexer with a different length than the value
Answering the question in the comment: the expected output is:
4 4 4 4
A B C D
0 1 6 7 0
1 2 7 8 16.0
2 3 8 4 19.0
3 4 9 2 17.0
4 5 0 0 7.0
and can be obtained using:
df["D"] = df["A"] + df["B"] + df["C"].shift(1)
df = df.fillna(0)
BUT ... this explains neither why the code above fails nor what the ValueError about different lengths is trying to tell me.
UPDATE considering the given answer:
As pointed out in an answer to my question by irahorecka the statement:
print( df["A"][1:] + df["B"][1:] + df["C"][0:-1] )
does not fail and gives:
0 NaN
1 17.0
2 15.0
3 15.0
4 NaN
OK ... this explains why the ValueError message mentions different lengths: the left side of the assignment has a length of 4 and the right side a length of 5.
Apparently my wrong expectation was that summing Series of the same length must give a result of that same length.
In other words, my question can be expressed as: how can summing Series of the same length give a Series of a different length?
P.S. I have checked similar questions here on Stack Overflow (for example "Trying to Understand ValueError: cannot set using a slice indexer with a different length than the value", which has no answer), but they address other issues, such as a multi-index or assigning a list, which is not the case in my question.
CodePudding user response:
Executing your statement df["A"][1:] + df["B"][1:] + df["C"][0:-1] gives:
0 NaN
1 17.0
2 15.0
3 15.0
4 NaN
dtype: float64
... which I'm not sure is what you're looking for.
Here's a way to slice each column and assign the summed values to "D":
import pandas as pd
d = {'A':[1,2,3,4,5], 'B':[6,7,8,9,0], 'C':[7,8,4,2,0]}
df = pd.DataFrame(data=d)
df["D"] = 0
df["D"][1:] = [sum(i) for i in zip(df["A"][1:], df["B"][1:], df["C"][0:-1])]
Which outputs:
A B C D
0 1 6 7 0
1 2 7 8 16
2 3 8 4 19
3 4 9 2 17
4 5 0 0 7
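A possible variant of the same idea, sketched under the assumption that positional (not label-based) addition is what you want: converting each slice with to_numpy() drops the index, so the values are added item by item. The variable name summed is only illustrative, and the single .loc call is used here to avoid the chained assignment df["D"][1:] = ..., which may trigger a chained-assignment warning.

```python
import pandas as pd

d = {'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 0], 'C': [7, 8, 4, 2, 0]}
df = pd.DataFrame(data=d)
df["D"] = 0

# to_numpy() returns plain arrays without an index,
# so the three operands are added strictly by position.
summed = (
    df["A"].to_numpy()[1:]
    + df["B"].to_numpy()[1:]
    + df["C"].to_numpy()[:-1]
)

# Assign to rows 1..4 of column "D" in one label-based step.
df.loc[df.index[1:], "D"] = summed

print(df["D"].tolist())  # [0, 16, 19, 17, 7]
```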
CodePudding user response:
What is the ValueError message trying to tell me? What is wrong with the code or my expectation that the code should work?
The ValueError message is telling you that the shapes on the left and the right side of the assignment statement do not match. You excluded the possibility that this could happen because your expectation was based on a wrong assumption.
The wrong assumption behind your confusion was that adding Series of the same length gives a Series of that same length. You checked the lengths of the Series on both sides of the assignment, but you were not aware that pandas does not sum Series item by item: it sums them by matching index values.
See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html which states:
Operations between Series (+, -, /, *, **) align values based on their associated index values – they need not be the same length. The result index will be the sorted union of the two indexes.
In other words, if two Series have no index values in common, the result of Series_1 + Series_2 has a length equal to the sum of the lengths of both Series, and no addition is performed on any of their elements: every value in the result is NaN.
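A minimal sketch of that extreme case, using two made-up Series (s1 and s2 are illustrative names, not taken from the question) whose indexes share no value:

```python
import pandas as pd

# Two Series with completely disjoint index labels.
s1 = pd.Series([1, 2, 3], index=[0, 1, 2])
s2 = pd.Series([10, 20], index=[5, 6])

total = s1 + s2

# The result index is the union of both indexes,
# and no label had a partner to be added to.
print(list(total.index))    # [0, 1, 2, 5, 6]
print(len(total))           # 5
print(total.isna().all())   # True
```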
After slicing with Series_A[1:], the index value 0 is no longer in the index of the resulting Series, and after slicing with Series_C[0:-1], the last index value is no longer in the index of that result. The addition therefore gives a Series with NaN at index value 0 and at the last index value, and the sum at all other index values, which both Series have in common:
     A + C == X
0    NaN  # because the index value is not common
1    A + C at index 1
2    A + C at index 2
3    A + C at index 3
4    NaN  # because the index value is not common
As you can see above, adding two pandas Series that each have a length of 4 gives a Series with a length of 5. This is how pandas works under the hood: arithmetic operations such as addition, subtraction, multiplication, division and exponentiation use the index values, not the absolute positions of the items.
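The same effect can be reproduced with the A and C columns from the question. This is only a sketch: the names left, right, aligned and positional are illustrative, and rebuilding the second operand on the first operand's index is just one possible way to force a position-by-position sum.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4, 5])
c = pd.Series([7, 8, 4, 2, 0])

left = a[1:]      # index labels 1, 2, 3, 4
right = c[0:-1]   # index labels 0, 1, 2, 3

# Label-aligned addition: labels 0 and 4 exist in only one operand.
aligned = left + right
print(len(left), len(right), len(aligned))  # 4 4 5
print(aligned.isna().tolist())              # [True, False, False, False, True]

# Giving both operands the same labels makes the sum positional.
positional = left + pd.Series(right.to_numpy(), index=left.index)
print(positional.tolist())                  # [9, 11, 8, 7]
```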
Pandas objects are less like plain two-dimensional arrays or spreadsheet tables and more like dictionaries: the index values act as keys that identify the rows, and the named columns hold the values looked up through those keys.
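A small sketch of that dictionary-like behaviour, using an illustrative Series named tail: after slicing, the labels keep their original values, so looking up by label (.loc) and by position (.iloc) give different items.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4, 5])
tail = a[1:]  # values 2, 3, 4, 5 with index labels 1, 2, 3, 4

# Label lookup behaves like a dictionary key lookup.
print(tail.loc[1])      # 2  (the item labelled 1)

# Positional lookup ignores the labels.
print(tail.iloc[1])     # 3  (the second item)

# The label 0 was sliced away, so it is no longer a valid key.
print(0 in tail.index)  # False
```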