Generating a Curse of Dimensionality Chart

Time:09-22

I'm trying to demonstrate the curse of dimensionality in Python. Here's what the curve should look like:

[image: expected curve]

I want to do this for dimensions 2 to 30 with a step size of 1, and for each dimension I want to generate 100 random data points. Can anyone tell me how to go about this? Here's my current code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import os
import math
from random import randint

Below, I generate random values for the variables x and y and then calculate the Euclidean distance using np.linalg.norm. After that, I store the value in a list.

x=np.array([randint(0, 101),randint(0, 101)])
y=np.array([randint(0, 101),randint(0, 101)])
ed_list = []
d=np.array([1,2])
ed = np.linalg.norm(x - y)
ed_list.append(ed)
ed_list
print(x)

Now I use a for loop to generate more random values, then store and plot the distances.

for i in range(2,50):
    xval = randint(0, 101)
    yval = randint(0, 101)
    x = np.append(x,xval)
    y = np.append(y,yval)
    d = np.append(d, i + 1)
    try:
        ed = np.linalg.norm(x - y)
        ed_list.append(ed)
        ed = 0
    except:
        ed = 0
    print(x, y)

plt.plot(ed_list)
plt.xlabel('Number of dimensions')
plt.ylabel('Euclidean Distance')
plt.show()

However, my curve looks like this because I'm just plotting the Euclidean distance and not calculating according to the formula in the above Curse of Dimensionality picture.

[image: my current plot]

So I have two questions. How can I add the formula that is given in the curse of dimensionality picture? Also, how do I do this for dimensions 2 to 30 with 100 random data points, as stated above? If anyone can help me with this, that would be great!

Answer:

All you are showing is the distance between two points as you add extra dimensions. Naturally, that will keep growing larger and larger. The curse of dimensionality says that, given a whole set of points, P, and a reference point, Q, if you compute the distance from Q to each of the points in P, the difference between the maximum distance and the minimum distance, measured relative to the minimum distance, goes to 0 as the number of dimensions grows. You need WAY more than 2 points to show that.
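As a quick check of the first point, here is a minimal sketch that averages the distance between two random points over many trials. The uniform [0, 100) coordinates, the trial count, and the name `mean_pair_distance` are my own choices, not something from the question. The average grows roughly like the square root of the number of dimensions, which is the ever-rising curve you are seeing.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability

def mean_pair_distance(n_dims, trials=2000):
    """Average Euclidean distance between two uniform random points in [0, 100)^n_dims."""
    x = rng.uniform(0, 100, size=(trials, n_dims))
    y = rng.uniform(0, 100, size=(trials, n_dims))
    # One distance per trial (row), then the mean over all trials.
    return np.linalg.norm(x - y, axis=1).mean()

for n in (2, 8, 32):
    print(n, mean_pair_distance(n))
```

Quadrupling the number of dimensions should roughly double the printed distance.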

For each dimension, generate a bunch of points (I picked 10000), pick another random point, compute the distance to each, grab the min and max from those distances, and plot the result.

import numpy as np
import matplotlib.pyplot as plt
import os
import math

deltas = []
for N in range(2,50):
    # Generate 10000 random points in N dimensions.
    P = [np.random.randint(-100, 100, N) for _ in range(10000)]
    Q = np.random.randint(-100,100,N)
    diffs = [np.linalg.norm(p-Q) for p in P]
    mxd = max(diffs)
    mnd = min(diffs)
    delta = math.log10(mxd-mnd)/mnd
    deltas.append( delta )

plt.plot(range(2,50),deltas)
plt.xlabel('Number of dimensions')
plt.ylabel('log10(max - min) / min')
plt.show()

Result: [image: resulting plot]
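If the formula in the picture is the usual relative contrast, log10((d_max - d_min) / d_min), then the parentheses differ from the `math.log10(mxd-mnd)/mnd` above. I'm assuming that form, since the picture isn't reproduced here. Below is a vectorized sketch for dimensions 2 to 30 with 100 points each, as asked in the question. `relative_contrast` is a name I made up, and I use uniform floats instead of integers so that the minimum distance is practically never exactly 0.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability

def relative_contrast(n_dims, n_points=100):
    """log10((d_max - d_min) / d_min) for distances from a random Q to a random set P."""
    P = rng.uniform(-100, 100, size=(n_points, n_dims))  # one point per row
    Q = rng.uniform(-100, 100, size=n_dims)               # reference point
    dists = np.linalg.norm(P - Q, axis=1)                 # distance from Q to every row of P
    return np.log10((dists.max() - dists.min()) / dists.min())

dims = np.arange(2, 31)  # 2, 3, ..., 30
contrast = [relative_contrast(n) for n in dims]
```

Plot it with `plt.plot(dims, contrast)` as in the code above. With only 100 points per dimension the curve is noisy, so averaging several runs per dimension should smooth it out.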
