Most efficient way to create a binary matrix of users/purchases?

I have data with N users and K possible items. The data is a dictionary of the form data[user] = [item1, item2, ...]. I want to turn this dictionary into an N x K matrix where the (n,k) entry is 1 if user n has purchased item k and 0 otherwise. Below is sample data.

import random

random.seed(10)

# Users
N = list(range(10)) 

# Items represented by an integer
K = list(range(1000)) 

# I have a dict of {user: [item1, item2...itemK]} 
# where k differs by user
data = {x:random.sample(K, random.randint(1,50)) for x in N}


# Now I want to create an N x K matrix, where rows are users, columns are items, and the (n,k) entry
# is 1 if user n has item k in their list and 0 otherwise.

CodePudding user response：

If I understand your question correctly, you can convert each user's list of items to a set and then test every possible item for membership in that set.

Note: I lowered the number of items to 50 so the output fits on screen:

import random

random.seed(10)

# Users
N = list(range(10))

# Items represented by an integer
K = list(range(50))

# I have a dict of {user: [item1, item2...itemK]}
# where k differs by user
data = {x: random.sample(K, random.randint(1, 50)) for x in N}

# create matrix:
matrix = []
for v in data.values():
    v = set(v)
    matrix.append([int(i in v) for i in K])

# print matrix:
for row in matrix:
    print(*row)

Prints (each row is a different user):

1 0 1 0 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 1 1 0 1 1 0 0 1 1 1 0 0 1 1 1
1 1 1 0 1 0 0 1 1 0 1 0 1 1 0 1 0 0 0 1 1 0 0 1 0 0 1 1 1 1 1 0 1 0 1 1 0 1 1 1 0 0 0 0 0 0 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 0 1 1 1 0 1 0 0 1 0 1 1 1 1 0 1 1 1 1 0 0 0 0 1 0 1 1 1 1 0 0 1 1 1 0 1 0 1 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 0 1 1
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0
0 1 1 0 0 0 0 1 0 1 0 0 1 1 0 1 1 1 1 1 1 1 1 0 0 0 0 1 0 1 1 0 1 1 0 0 1 0 0 1 1 1 1 0 1 1 1 0 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1 1 1 0 0 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 0 1 1 0 1 0 1 1 1 0 1 0 1 1
0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0 0 1 0 0 1

CodePudding user response：

At the very least, any approach has to traverse each user in the dictionary and each item that user has, so the best you can do is a loop like this:

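# Assuming users are also represented by integers 0..len(N)-1
# N x K matrix initialised to 0; each row is built separately so rows do not share one list
mat = [[0] * len(K) for _ in range(len(N))]
for ui in data:
    for i in data[ui]:
        mat[ui][i] = 1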
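
A note on preallocating a list of lists in Python: repeating an inner list with `*` copies references, not rows, so every row ends up being the same object and a write to one row shows up in all of them. The sketch below is only an illustration (the names `aliased` and `independent` and the sizes are made up here); it contrasts that with a list comprehension, which builds a fresh row each time.

```python
n_users, n_items = 3, 4  # small illustrative sizes, not the question's data

# The outer * repeats a reference to the SAME inner list n_users times
aliased = [[0] * n_items] * n_users
aliased[0][1] = 1

# A comprehension evaluates [0] * n_items once per row, giving separate lists
independent = [[0] * n_items for _ in range(n_users)]
independent[0][1] = 1

print(aliased)      # every row shows the 1
print(independent)  # only row 0 shows the 1
```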

If a user can have repeated items, you can deduplicate first (assigning 1 twice is harmless, so this is optional):

mat = [[0] * len(K) for _ in range(len(N))]
for ui in data:
    for i in set(data[ui]):
        mat[ui][i] = 1
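
The loops above index the matrix directly with user and item values, which only works when both are integers starting at 0. As a rough sketch of how the same idea might look for arbitrary hashable users and items, the hypothetical helper `build_matrix` below (not from either answer) first maps each user and item to a row or column index. It assumes `items` lists every possible item.

```python
def build_matrix(data, items):
    """Return (users, matrix) where matrix[r][c] is 1 if users[r] has items[c]."""
    users = list(data)                               # row order = dict insertion order
    col = {item: c for c, item in enumerate(items)}  # item -> column index
    matrix = [[0] * len(items) for _ in users]       # one independent row per user
    for r, user in enumerate(users):
        for item in data[user]:
            matrix[r][col[item]] = 1                 # repeated items just set 1 again
    return users, matrix


# Tiny made-up example, not the question's data
sample = {"ann": ["pen", "ink"], "bob": ["ink", "ink"], "cy": []}
users, matrix = build_matrix(sample, ["ink", "pad", "pen"])
for user, row in zip(users, matrix):
    print(user, row)
```

After allocating the zeros, this does one assignment per purchase, so the work in the loop grows with the total number of purchases and not with N x K.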