encode a list of strings to integers

Time:03-09

I have a list of strings:

l = ["pear", "apple", "pear", "banana"]

and would like to encode each element as an integer, from 0 up to the number of unique elements minus one, in order of first appearance, so the result would be

[0,1,0,2]

I can only think of complicated solutions. Is there an easy one-liner for this?

CodePudding user response:

No, not a one-liner, but a very simple solution:

>>> idx = {}
>>> for x in l:
...     if x not in idx:
...         idx[x] = len(idx)
...
>>> idx
{'pear': 0, 'apple': 1, 'banana': 2}
>>> [idx[x] for x in l]
[0, 1, 0, 2]

A more compact way to build idx, which does essentially the same thing (although I think the version above is easier to read and understand):

>>> idx = {}
>>> for x in l:
...     idx[x] = idx.get(x, len(idx))
...
>>> idx
{'pear': 0, 'apple': 1, 'banana': 2}

CodePudding user response:

You can create a dictionary that maps each string to an id, then use list() and map() to look up the id for each string. You could condense the dictionary construction into one line using .index(), but it would likely be slower, since .index() performs a linear search from the start of the list for every element rather than building the mapping in a single pass.

data = ["pear", "apple", "pear", "banana"]

indices = {}
count = 0
for item in data:
    if item not in indices:
        indices[item] = count
        count += 1
        
result = list(map(lambda x: indices[x], data))
print(result)
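
For comparison, here is a minimal sketch of the .index() variant mentioned above. The deduplication step with dict.fromkeys() and the name unique are my own assumptions about how it could be written, not part of the original answer. It relies on dicts preserving insertion order (Python 3.7+).

```python
data = ["pear", "apple", "pear", "banana"]

# Drop duplicates while keeping first-appearance order
# (assumes Python 3.7+, where dicts preserve insertion order).
unique = list(dict.fromkeys(data))

# unique.index(x) scans the list from the start on every call,
# so this is quadratic in the worst case, unlike the dictionary lookup.
result = [unique.index(x) for x in data]
print(result)
```

Note that calling data.index(x) on the original list would not give consecutive ids: it returns the position of the first occurrence, which is 3 for "banana". That is why the sketch deduplicates first.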

CodePudding user response:

A one-line solution using numpy:

>>> import numpy as np
>>> l = ["pear", "apple", "pear", "banana"]

>>> [np.where(np.array(list(dict.fromkeys(l))) == e)[0][0] for e in l]
[0, 1, 0, 2]
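
The one-liner above depends on dict.fromkeys(l) to remove duplicates while keeping the order of first appearance. As a rough sketch, the same idea seems to work without numpy by pairing those unique keys with enumerate(). The names idx and codes are just illustrative, and this assumes Python 3.7+ for ordered dicts.

```python
l = ["pear", "apple", "pear", "banana"]

# Map each unique string to its position among the unique strings,
# in order of first appearance.
idx = {s: i for i, s in enumerate(dict.fromkeys(l))}

# Look up the id of every element of the original list.
codes = [idx[s] for s in l]
print(codes)
```

Each lookup here is a dictionary access, so it avoids rebuilding the array and searching it once per element.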