Python 3.10/Pandas 1.1.3
Given this code:
import pandas as pd

data = {'a': ['AA', 'BB', 'CC', 'DD', 'EE', 'FF', 'GG'],
        'b': [11, 22, 33, 44, 55, 66, 77]}
df = pd.DataFrame(data, columns=['a', 'b'])
print(df)
which produces:
    a   b
0  AA  11
1  BB  22
2  CC  33
3  DD  44
4  EE  55
5  FF  66
6  GG  77
I need to understand how I can shuffle the values of column b, with the condition that no b value in the resulting dataframe stays associated with its original a value.
CodePudding user response:
Use the following function to find a way to remap your column:
import numpy as np

def derange(x):
    # Keep drawing random permutations until no element stays in place.
    res = x
    while np.any(res == x):
        res = np.random.permutation(x)
    return res
Then just apply it to any column:
df['b'] = derange(df['b'])
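The method is to generate random permutations until one has no element left in its original position. The expected number of attempts is n!/!n, where !n is the number of derangements of n items; this is roughly (n/(n-1))^n and converges to e very quickly.
Note that for n=1 the expectation is infinite (the loop never terminates), which makes sense as you cannot derange a single-element list.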
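To make the expected retry count concrete, here is a small sketch that computes it exactly from the standard derangement recurrence. The helper names count_derangements and expected_attempts are made up for this illustration and are not part of the answer above.

```python
import math

def count_derangements(n):
    """Number of permutations of n items with no fixed point (written !n)."""
    # Recurrence: !0 = 1, !1 = 0, !k = (k - 1) * (!(k - 1) + !(k - 2))
    if n == 0:
        return 1
    prev2, prev1 = 1, 0  # !0 and !1
    for k in range(2, n + 1):
        prev2, prev1 = prev1, (k - 1) * (prev1 + prev2)
    return prev1

def expected_attempts(n):
    """Expected number of uniform random permutations drawn before one is a derangement."""
    d = count_derangements(n)
    # With no derangement available (n = 1) the retry loop would never end.
    return math.inf if d == 0 else math.factorial(n) / d

for n in (1, 2, 3, 5, 7, 10):
    print(n, expected_attempts(n))
print("e =", math.e)
```

For the 7-row frame in the question this works out to 5040/1854, so on average fewer than three permutations are drawn.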
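A derangement can also be produced without any retries, in exactly n - 1 swaps (a variant of Sattolo's algorithm, which only ever produces cyclic permutations), so here it is, for completeness: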
import random

def derange2(x):
    n = len(x)
    for i in range(n - 1):
        # Swap with a strictly later position so x[i] never keeps its element.
        j = random.randrange(i + 1, n)
        x[i], x[j] = x[j], x[i]
This function actually transforms the list in-place.
You can also have a version that modifies pandas columns in-place:
def derange3(df, col):
    # col is the integer position of the column, as required by df.iat
    n = df.shape[0]
    for i in range(n - 1):
        j = random.randrange(i + 1, n)
        df.iat[i, col], df.iat[j, col] = df.iat[j, col], df.iat[i, col]
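As a quick sanity check of the swap-based approach, here is a sketch that works on a copy of a plain list instead of mutating its input, using the b values from the question. The name derange_copy and the optional rng parameter are assumptions added for the example; the resulting list could then be assigned back with df['b'] = ....

```python
import random

def derange_copy(values, rng=random):
    """Return a new list in which every element has moved to another position."""
    out = list(values)
    n = len(out)
    for i in range(n - 1):
        # Pick a partner strictly after i, so position i never keeps its element.
        j = rng.randrange(i + 1, n)
        out[i], out[j] = out[j], out[i]
    return out

a = ['AA', 'BB', 'CC', 'DD', 'EE', 'FF', 'GG']
b = [11, 22, 33, 44, 55, 66, 77]

shuffled = derange_copy(b)
# Same values, but none in its original position.
assert sorted(shuffled) == b
assert all(old != new for old, new in zip(b, shuffled))
print(list(zip(a, shuffled)))
```

Because the last swap always involves the final position, a list of at least two elements is fully deranged; a single-element list is returned unchanged.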
CodePudding user response:
Let's do it with numpy:
import numpy as np

def rnd(l):
    l1 = l.copy()
    while True:
        np.random.shuffle(l1)
        # Reshuffle if any value is still in its original position.
        if any(l == l1):
            continue
        return l1
df.b = rnd(df.b.values)
CodePudding user response:
You can shuffle the index until no position matches the original index, then reorder df['b'] using the shuffled index and assign the resulting array back to df['b']:
import numpy as np

idx = df.index.tolist()
while (idx == df.index).any():
    np.random.shuffle(idx)
df['b'] = df['b'][idx].values
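Whichever approach you pick, it may be worth checking the result against the condition from the question. Below is a minimal sketch of such a check on plain lists; the helper name fixed_labels is hypothetical, and the rotated list is only a hand-made example of a valid result, not the output of any function above.

```python
def fixed_labels(a, b_before, b_after):
    """Return the 'a' labels whose 'b' value did not change."""
    return [label for label, old, new in zip(a, b_before, b_after) if old == new]

a = ['AA', 'BB', 'CC', 'DD', 'EE', 'FF', 'GG']
b = [11, 22, 33, 44, 55, 66, 77]

# Rotating by one position is a simple, non-random derangement.
rotated = b[1:] + b[:1]
# Swapping only the first two values leaves the other five in place.
partial = [22, 11] + b[2:]

print(fixed_labels(a, b, rotated))
print(fixed_labels(a, b, partial))
```

An empty list means the shuffle satisfies the requirement; with a dataframe you could pass df['a'], the original b values, and the new df['b'].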