I have some code that processes some input text by splitting it up:
text = get_data_from_internet() # or read it from a file, whatever
a, b, c = text.split('|')
Usually, this works fine, but occasionally I will get an error message that looks like
ValueError: not enough values to unpack (expected 3, got 1)
If I instead try to get a single result from the split, like so:
first = text.split()[0]
then similarly it seems to work sometimes, but other times I get
IndexError: list index out of range
What is going on? I assume it has something to do with the data, but how can I understand the problem and fix it?
This question is intended as a canonical for common debugging questions. It is meant to explain primarily what the error message means and specifically what about the input string causes the problem. Questions like this are usually not caused by a typo; they are asked by people who need something explained.
CodePudding user response:
The problem is that the result from .split does not have enough items in it. .split produces a list of strings, and the length of that list depends only on the input string, not on any surrounding code.
When you write a, b, c = text.split('|'), the .split method does not know that three values are expected; it returns one more item than there are | characters in the input, and the error then occurs when that list is unpacked. In this case, the error is ValueError, because there is something wrong with the value: the list doesn't have enough items.
When you write first = text.split()[0], there may be a problem with the index (the [0] in that code), causing an IndexError. The cause is the same: the list doesn't have enough items. (For an empty list, even [0] cannot be used as an index.) See also: Does "IndexError: list index out of range" when trying to access the N'th item mean that my list has less than N items?.
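As a rough illustration of the point above, the following sketch shows how the number of items follows the input alone. The sample strings are invented for the example and do not come from the question.

```python
# Sample inputs are invented for illustration only.
samples = ['x|y|z', 'x|y', 'no delimiter here', '']

for text in samples:
    parts = text.split('|')
    # With an explicit delimiter, the number of parts is always
    # (number of '|' characters in the text) + 1.
    print(repr(text), '->', len(parts), 'part(s):', parts)

# Unpacking into three names only succeeds for the first sample;
# every other sample raises ValueError at the unpacking step.
results = {}
for text in samples:
    try:
        a, b, c = text.split('|')
        results[text] = 'ok'
    except ValueError as exc:
        results[text] = str(exc)

print(results)
```

The call to .split itself never fails here; the exception comes from the assignment that tries to unpack its result.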
As a special note: when .split is called with a delimiter (like the first example), the resulting list will have at least one item in it. This happens even if the input is an empty string:
>>> ''.split(',')
['']
However, this is not the case when using .split without a delimiter (special-case behaviour to split on any sequence of whitespace):
>>> ''.split()
[]
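The same difference applies to lines that contain only whitespace, which is a common way for text.split()[0] to fail on a blank line in a file. A small check, using made-up blank lines:

```python
# Made-up examples of "blank" lines as they might appear in a file.
blank_lines = ['', '   ', '\n', '\t \n']

# Without a delimiter, whitespace-only input gives an empty list,
# so indexing with [0] would raise IndexError.
no_delimiter = [line.split() for line in blank_lines]

# With an explicit delimiter, the same inputs still give one item each.
with_delimiter = [line.split(',') for line in blank_lines]

print(no_delimiter)
print(with_delimiter)
```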
To solve the problem:
- Make sure you are using the right tools for the job. For example, if your input is a .csv file, you should not try to write the code yourself to split the data into cells. Instead, use the csv standard library module.
- Carefully check the input to figure out where the problem is occurring. Code like this is usually inside a loop that handles a line of text at a time; check what lines appear in the data to cause the problem. You can do this by, for example, using exception handling and logging. See https://ericlippert.com/2014/03/05/how-to-debug-small-programs/ for more general advice on debugging problems.
- Decide what should happen when the bad input appears. Often, it is appropriate to just skip a blank line in the input. Sometimes you can fill in dummy values for whatever is missing. Sometimes you should report your own exception, etc. It is up to you to think about the specific context of your code and decide on the right course of action.
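The second and third points can be combined in a loop like the one sketched below. This is only one possible policy (skip blank lines, log and skip malformed ones); the function name parse_lines, the sample data, and the choice of three fields are assumptions made for the example.

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger(__name__)

def parse_lines(lines, sep='|'):
    """Return (a, b, c) tuples for well-formed lines; log and skip the rest.

    Hypothetical helper: the policy of skipping bad lines is an assumption,
    not the only reasonable choice.
    """
    rows = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip('\n')
        if not line.strip():
            continue  # blank line: skip silently
        try:
            a, b, c = line.split(sep)
        except ValueError:
            # Report exactly which line caused the problem, then move on.
            logger.warning('line %d does not have 3 fields: %r', lineno, line)
            continue
        rows.append((a, b, c))
    return rows

# Invented sample data: two good lines, one blank, one malformed.
sample = ['a|b|c\n', '\n', 'only one field\n', 'd|e|f\n']
rows = parse_lines(sample)
print(rows)
```

Logging the line number together with repr(line) makes invisible problems, such as trailing whitespace or an unexpected delimiter, easier to spot.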
CodePudding user response:
Here are some techniques for solving the problem:
Avoiding IndexError
We can manually check that the output of .split is of the correct length:
out = 'a,b'.split(',')
# ['a', 'b']
# we want the third item (index 2)
n = 2
if len(out) > n:
    out = out[n]
else:
    out = None
# checking we have None
out is None
# True
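If this check is needed in several places, it could be wrapped in a small function. The name item_or_default is hypothetical, and the sketch only handles non-negative indexes.

```python
def item_or_default(items, n, default=None):
    """Return items[n] if that index exists, otherwise default.

    Hypothetical helper; intended for non-negative n only.
    """
    return items[n] if 0 <= n < len(items) else default

parts = 'a,b'.split(',')
second = item_or_default(parts, 1)
third = item_or_default(parts, 2)
fallback = item_or_default(parts, 2, default='missing')
print(second, third, fallback)
```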
If you specifically want the first item of a list, but the list might be empty, you can use this trick:
out = ''.split()
# []
# get the first item or None
next(iter(out), None)
# None
See Python idiom to return first item or None for details.
Avoiding ValueError
Since we don't know in advance how many items will be returned by split, here is a way to pad them with a default value (None, in this example) and ignore any extra values:
from itertools import chain, repeat
a, b, c, d, *_ = chain('A,B'.split(','), repeat(None, 4))
print(a, b, c, d)
# A B None None
We use itertools.repeat to create as many None values as might be needed, and itertools.chain to append them to the .split result.
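The same padding idea can be packaged as a function that always returns a fixed number of items. This is a sketch rather than a standard recipe: split_padded is a hypothetical name, and it uses itertools.islice to cut off the result instead of a starred target.

```python
from itertools import chain, islice, repeat

def split_padded(text, sep, count, fill=None):
    """Split text on sep and return a list of exactly count items.

    Hypothetical helper: missing items are replaced by fill,
    and extra items are dropped.
    """
    parts = text.split(sep)
    # repeat(fill) is an endless supply of fill values placed after the
    # real parts; islice stops after count items have been taken.
    return list(islice(chain(parts, repeat(fill)), count))

a, b, c = split_padded('A,B', ',', 3)
x, y = split_padded('A,B,C,D', ',', 2)
print(a, b, c)
print(x, y)
```

Because the result always has exactly count items, plain unpacking without a starred name is safe here.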
The extended unpacking technique absorbs any extra Nones into the _ variable (not a special name, just a convention). This also ignores extra values from the input. For example:
a, b, c, *_ = 'A,B,C,D,E'.split(',')
print(a, b, c)
# A B C
We can use a variation of that technique to get only the last three elements:
*_, x, y, z = 'A,B,C,D,E'.split(',')
print(x, y, z)
# C D E
Or a combination of both leading and trailing items (note that only one starred name is allowed in an assignment target):
a, b, *_, z = 'A,B,C,D,E'.split(',')
print(a, b, z)
# A B E
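Note that a starred name only absorbs extras; the other names still each need a value. A short check of both edge cases, with made-up inputs:

```python
# Exactly enough items: the starred name receives an empty list.
a, b, *rest, z = 'A,B,C'.split(',')
print(a, b, rest, z)

# Too few items for the non-starred names: ValueError is still raised.
try:
    a, b, *_, z = 'A,B'.split(',')
except ValueError as exc:
    message = str(exc)
    print(message)
```

So extended unpacking protects against too many items, but it needs to be combined with padding or a length check to protect against too few.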