Shuffle dataframe values on condition that no element appears in its original position (derangement)

Time:12-03

Python 3.10/Pandas 1.1.3

Given this code:

import pandas as pd
 
data = {'a': ['AA','BB','CC','DD', 'EE', 'FF', 'GG'],
        'b': [11, 22, 33, 44, 55, 66, 77],
        }
 
df = pd.DataFrame(data, columns=['a','b'])

print(df)

which produces:

    a   b
0  AA  11
1  BB  22
2  CC  33
3  DD  44
4  EE  55
5  FF  66
6  GG  77

How can I shuffle the values of column b so that, in the resulting dataframe, no b value is still paired with its original a value?

CodePudding user response:

Use the following function to find a way to remap your column:

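import numpy as np

def derange(x):
  res = x
  # Retry until no element sits in its original position.
  # Note: this loop never terminates when len(x) == 1.
  while np.any(res == x):
    res = np.random.permutation(x)
  return res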

Then just apply it to any column:

df['b'] = derange(df['b'])

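The method is to generate random permutations until one has no element in its original position. The expected number of attempts is n!/!n, where !n is the number of derangements of n elements, and this converges to e (about 2.718) very quickly. The simpler estimate (n/(n-1))^n, which treats the positions as independent, tends to the same limit.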

Note that for n=1 the expectation is infinite, which makes sense: a single-element list cannot be deranged, so the loop never terminates.
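To make the convergence concrete, here is a small sketch that tabulates the expected number of attempts for small n. It compares the exact value n!/!n with the (n/(n-1))^n estimate. The helper names `count_derangements` and `expected_attempts` are made up for this illustration.

```python
import math

def count_derangements(n):
    """Return !n via the recurrence !n = (n - 1) * (!(n - 1) + !(n - 2))."""
    if n == 0:
        return 1
    if n == 1:
        return 0
    prev2, prev1 = 1, 0  # !0 and !1
    for k in range(2, n + 1):
        prev2, prev1 = prev1, (k - 1) * (prev1 + prev2)
    return prev1

def expected_attempts(n):
    """Expected number of random permutations drawn before one is a derangement."""
    d = count_derangements(n)
    if d == 0:
        return math.inf  # n == 1: no derangement exists
    return math.factorial(n) / d

for n in range(1, 9):
    # Independence-based estimate; undefined (division by zero) for n == 1.
    estimate = math.inf if n == 1 else (n / (n - 1)) ** n
    print(n, expected_attempts(n), estimate)
```

For the 7-row dataframe in the question, this works out to about 2.72 attempts on average.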

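A derangement can also be produced in a single pass with no retries, using Sattolo's algorithm. It only generates permutations that form a single cycle, so it does not sample uniformly from all derangements, but every result is a valid derangement. Here it is, for completeness: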

import random

def derange2(x):
  n = len(x)
  for i in range(n - 1):
    # Swap position i with a strictly later position.
    j = random.randrange(i + 1, n)
    x[i], x[j] = x[j], x[i]

This function actually transforms the list in-place.
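If mutating the input is not wanted, the same swap loop can be run on a copy. The following is only a sketch: the name `sattolo_copy`, the optional `rng` parameter, and the single-element guard are assumptions added here, not part of the answer above.

```python
import random

def sattolo_copy(values, rng=random):
    """Return a shuffled copy of `values` in which no item keeps its position.

    Uses the same swap loop as the in-place version, applied to a copy.
    Assumes the values are distinct, as in the question's column b.
    """
    out = list(values)
    n = len(out)
    if n == 1:
        raise ValueError("a single element cannot be deranged")
    for i in range(n - 1):
        j = rng.randrange(i + 1, n)  # always a later position, never i itself
        out[i], out[j] = out[j], out[i]
    return out

original = [11, 22, 33, 44, 55, 66, 77]
shuffled = sattolo_copy(original)

# The input list is untouched, and every position holds a different value.
assert original == [11, 22, 33, 44, 55, 66, 77]
assert all(new != old for new, old in zip(shuffled, original))
print(shuffled)
```

The returned list can be assigned straight back to a column, for example `df['b'] = sattolo_copy(df['b'])`.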

You can also have a version that modifies pandas columns in-place:

def derange3(df, col):
  # col is the integer position of the column (df.iat is positional).
  # Requires `import random`.
  n = df.shape[0]
  for i in range(n - 1):
    j = random.randrange(i + 1, n)
    df.iat[i, col], df.iat[j, col] = df.iat[j, col], df.iat[i, col]

CodePudding user response:

Let's do it with NumPy:

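import numpy as np

def rnd(l):
    l1 = l.copy()
    while True:
        np.random.shuffle(l1)
        # Return only when no element is still in its original position;
        # otherwise shuffle again.
        if not (l == l1).any():
            return l1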
        
df.b = rnd(df.b.values)

CodePudding user response:

You can shuffle the index until no entry is in its original position, then reorder df['b'] using the shuffled index and assign the resulting array back to df['b']:

idx = df.index.tolist()
while (idx == df.index).any():
    np.random.shuffle(idx)
        
df['b'] = df['b'][idx].values
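The same idea can be written in terms of integer positions rather than index labels, with a guard for inputs that cannot be deranged. This is a sketch under assumptions: `deranged_positions` is a hypothetical helper name, and using `np.random.default_rng` is a choice made here rather than something taken from the answer.

```python
import numpy as np

def deranged_positions(n, rng=None):
    """Return a random permutation of range(n) with no fixed point.

    Draws permutations until no position maps to itself (rejection sampling).
    """
    if n < 2:
        raise ValueError("need at least two rows to derange")
    rng = np.random.default_rng() if rng is None else rng
    identity = np.arange(n)
    perm = rng.permutation(n)
    while (perm == identity).any():
        perm = rng.permutation(n)
    return perm

values = np.array([11, 22, 33, 44, 55, 66, 77])
perm = deranged_positions(len(values), np.random.default_rng(0))
shuffled = values[perm]

# With distinct values, no entry stays in its original row.
assert not (shuffled == values).any()
print(shuffled)
```

On a dataframe, `df['b'] = df['b'].to_numpy()[deranged_positions(len(df))]` applies the permutation positionally, so it does not depend on what the index labels are.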