I am not sure what is wrong with the following code snippet.
I have the following two versions of a function.
Version 1
def _check_array_lengths(self, data):
    for i, values in data.items():
        if i == 0:
            length = len(values)
        if length != len(values):
            raise ValueError('All values must be the same length')
When I run the tests, the function above fails with the message "ERROR tests/test_dataframe.py - UnboundLocalError: local variable 'length' referenced before assignment".
Version 2
def _check_array_lengths(self, data):
    for i, values in enumerate(data.values()):
        if i == 0:
            length = len(values)
        if length != len(values):
            raise ValueError('All values must be the same length')
The test for this version passes, and I don't understand why I don't get the same error message (mentioned above) here. How is "enumerate" causing this change in behavior?
Maybe it's something really silly, but I haven't been able to figure it out yet.
Here is my test function:
def test_array_length(self):
    with pytest.raises(ValueError):
        pdc.DataFrame({'a': np.array([1, 2]),
                       'b': np.array([1])})
Can you please help?
Answer:
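In the second version, the first value of i is guaranteed to be 0, because enumerate() counts from 0. So the condition if i == 0: is true on the first pass and length is set. The comparison if length != len(values): can then use the length variable. With your test data, length becomes 2 on the first pass, and the second pass compares it with 1 and raises the ValueError your test expects.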
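A quick way to confirm this is to print what enumerate() yields and run Version 2 on the same data as the test. This sketch assumes a standalone function with a hypothetical name (check_array_lengths_v2) instead of the method, and plain lists instead of NumPy arrays, since len() behaves the same for both here.

```python
def check_array_lengths_v2(data):
    """Version 2 from the question as a standalone function (hypothetical name)."""
    for i, values in enumerate(data.values()):
        if i == 0:
            # enumerate() starts at 0, so length is always bound on the first pass
            length = len(values)
        if length != len(values):
            raise ValueError('All values must be the same length')


data = {'a': [1, 2], 'b': [1]}

# enumerate() pairs an integer counter with each value: (0, [1, 2]) then (1, [1])
print(list(enumerate(data.values())))

try:
    check_array_lengths_v2(data)
except ValueError as exc:
    print('ValueError:', exc)
```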
In the first version, i iterates over the dictionary's keys (the column names), not a numeric sequence. The values of i will be 'a' and 'b'. The if i == 0: condition is never true, so length is never set, and you get an UnboundLocalError the first time the comparison if length != len(values): tries to read it.
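The sketch below reproduces the Version 1 failure with the test data and also shows one index-free way to write the check, by collecting the distinct lengths into a set. It assumes standalone functions with hypothetical names (check_array_lengths_v1, check_array_lengths) and plain lists instead of NumPy arrays. The set-based version is one alternative, not the only correct fix. Only the exception type is printed, because the exact UnboundLocalError message differs between Python versions.

```python
def check_array_lengths_v1(data):
    """Version 1 from the question as a standalone function (hypothetical name)."""
    for i, values in data.items():
        if i == 0:  # i is a key such as 'a' or 'b', never the integer 0
            length = len(values)
        if length != len(values):  # first pass: length was never assigned
            raise ValueError('All values must be the same length')


def check_array_lengths(data):
    """Hypothetical index-free alternative: compare the number of distinct lengths."""
    lengths = {len(values) for values in data.values()}
    if len(lengths) > 1:
        raise ValueError('All values must be the same length')


data = {'a': [1, 2], 'b': [1]}

# items() yields (key, value) pairs, so i takes the keys 'a' and 'b'
print(list(data.items()))  # [('a', [1, 2]), ('b', [1])]

try:
    check_array_lengths_v1(data)
except UnboundLocalError as exc:
    print('Version 1:', type(exc).__name__)

try:
    check_array_lengths(data)
except ValueError as exc:
    print('Alternative:', exc)
```

The set-based check does not need to single out the first item at all, so there is no variable that can be left unassigned. Note that it also accepts an empty dictionary, where there are no lengths to compare.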