How can I swap half of two columns in a pandas dataframe in Python?

I am trying to create a machine learning model and teaching myself as I go. I will be working with a large dataset, but before I get to that, I am practicing with a smaller dataset to make sure everything is working as expected. I will need to swap half of the rows of two columns in my dataset, and I am not sure how to accomplish this.

Say I have a dataframe like the one below:

index	number	letter
0	1	A
1	2	B
2	3	C
3	4	D
4	5	E
5	6	F

I want to randomly swap half of the rows of the number and letter columns, so one output could look like this:

index	number	letter
0	1	A
1	B	2
2	3	C
3	D	4
4	5	E
5	F	6

Is there a way to do this in Python?

edit: thank you for all of your answers, I greatly appreciate it! :)

Answer:

Here's one way to implement this.

import pandas as pd
from random import sample

# Object dtype lets a column hold both numbers and letters after the swap
df = pd.DataFrame({'index': range(6), 'number': range(1, 7),
                   'letter': [*'ABCDEF']}).set_index('index').astype(object)

n = len(df)
idx = sample(range(n), k=n // 2)                # randomly select which rows to switch
df.iloc[idx, :] = df.iloc[idx, ::-1].values     # switch those rows (no chained "df = ...")

An example result is

      number letter
index              
0          1      A
1          2      B
2          C      3
3          4      D
4          E      5
5          F      6
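Since the question mentions practicing on a small dataset to make sure everything works as expected, a quick sanity check can confirm that exactly half of the rows changed. This is a sketch that repeats the swap above and compares the result against an untouched copy; `original` and `changed` are illustrative names, not part of the answer.

```python
import pandas as pd
from random import sample

df = pd.DataFrame({'index': range(6), 'number': range(1, 7),
                   'letter': [*'ABCDEF']}).set_index('index').astype(object)
original = df.copy()  # untouched copy for comparison

n = len(df)
idx = sample(range(n), k=n // 2)                # positions of the rows to swap
df.iloc[idx, :] = df.iloc[idx, ::-1].values     # swap number and letter in those rows

# A row counts as changed if its 'number' cell no longer holds the original value
changed = df['number'] != original['number']
print(changed.sum())  # 3 for this 6-row frame

# Every changed row should hold the original pair in reverse order
assert (df.loc[changed, 'number'] == original.loc[changed, 'letter']).all()
assert (df.loc[changed, 'letter'] == original.loc[changed, 'number']).all()
```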

Answer:

Update

To select rows at random, use np.random.choice:

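import numpy as np
import pandas as pd

# Sample data from the question; object dtype lets numbers and letters share a column
df = pd.DataFrame({'number': range(1, 7), 'letter': [*'ABCDEF']}).astype(object)

idx = np.random.choice(df.index, len(df) // 2, replace=False)  # half the row labels, no repeats
# .to_numpy() drops the column labels so pandas does not re-align the values
df.loc[idx, ['letter', 'number']] = df.loc[idx, ['number', 'letter']].to_numpy()
print(df)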

# Output
  number letter
0      1      A
1      2      B
2      3      C
3      D      4
4      E      5
5      F      6
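np.random.choice belongs to NumPy's legacy random API. Newer code usually creates a Generator with np.random.default_rng, which also makes it easy to pass a seed so the same rows are swapped on every run while you test. A minimal sketch, assuming the question's sample data and an arbitrary seed of 42:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'number': range(1, 7), 'letter': [*'ABCDEF']}).astype(object)

rng = np.random.default_rng(42)  # seeded Generator: same rows every run
idx = rng.choice(df.index, size=len(df) // 2, replace=False)
df.loc[idx, ['letter', 'number']] = df.loc[idx, ['number', 'letter']].to_numpy()
print(df)
```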

Old answer

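To swap every odd-numbered row instead of a random half (matching the example output in the question), you can try: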

df = df.astype(object)  # allow numbers and letters in the same column
df.loc[df.index % 2 == 1, ['letter', 'number']] = \
    df.loc[df.index % 2 == 1, ['number', 'letter']].to_numpy()
print(df)

# Output
  number letter
0      1      A
1      B      2
2      3      C
3      D      4
4      5      E
5      F      6

For readability, store the boolean mask in an intermediate variable:

mask = df.index % 2 == 1
df.loc[mask, ['letter', 'number']] = df.loc[mask, ['number', 'letter']].to_numpy()
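The mask version can also be made random. One way (my own assumption, not part of the answer above) is to shuffle a boolean array whose first half is True, so exactly len(df) // 2 rows are selected:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'number': range(1, 7), 'letter': [*'ABCDEF']}).astype(object)

n = len(df)
rng = np.random.default_rng()
# True for the first n // 2 positions, then shuffled into random order
mask = rng.permutation(np.arange(n) < n // 2)
df.loc[mask, ['letter', 'number']] = df.loc[mask, ['number', 'letter']].to_numpy()
print(df)
```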

Answer:

You can create a copy of your original data, sample half of its rows, and then update the copy in place, converting the sampled values to a NumPy ndarray so that pandas does not re-align them by column label.

swapped_df = df.astype(object)  # astype returns a copy whose columns can hold ints and strings
sample = swapped_df.sample(frac=0.5, random_state=0)
swapped_df.loc[sample.index, ['number', 'letter']] = sample[['letter', 'number']].to_numpy()

print(swapped_df)
      number letter
index              
0          1      A
1          B      2
2          C      3
3          4      D
4          E      5
5          6      F
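To see why the .to_numpy() call matters, compare it with assigning the swapped DataFrame directly: pandas aligns the right-hand side on its column labels, so each value is written back into the column it came from and nothing changes. A small demonstration on the question's data (the fixed row labels [1, 3] are chosen only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'number': range(1, 7), 'letter': [*'ABCDEF']}).astype(object)
rows = [1, 3]

# Without .to_numpy(): the right-hand side is re-aligned by column label,
# so 'number' is written back into 'number' and the frame is unchanged
aligned = df.copy()
aligned.loc[rows, ['number', 'letter']] = aligned.loc[rows, ['letter', 'number']]
print(aligned.equals(df))  # True

# With .to_numpy(): labels are dropped and values are written by position
swapped = df.copy()
swapped.loc[rows, ['number', 'letter']] = swapped.loc[rows, ['letter', 'number']].to_numpy()
print(swapped.loc[rows])  # rows 1 and 3 now read B/2 and D/4
```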

Answer:

This is similar to the previous answers but, in my opinion, slightly more readable if you are still building your sense for basic pandas operations:

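df = df.astype(object)  # let both columns hold numbers and letters
rows_to_change = df.sample(frac=0.5)
# Swap the column labels so alignment writes each value into the other column
rows_to_change = rows_to_change.rename(columns={'number': 'letter', 'letter': 'number'})
df.loc[rows_to_change.index] = rows_to_change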
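Since the question mentions eventually working with a larger dataset, the rename approach can be wrapped in a small reusable function that works for any pair of columns and leaves the input untouched. This is a sketch; swap_half and its parameters are hypothetical names, and the seed is passed through to DataFrame.sample's random_state for repeatable results.

```python
import pandas as pd

def swap_half(df: pd.DataFrame, col_a: str, col_b: str, seed=None) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with col_a and col_b swapped in a random half of the rows."""
    out = df.astype({col_a: object, col_b: object})  # copy; lets mixed values share a column
    rows = out.sample(frac=0.5, random_state=seed)[[col_a, col_b]]
    # After relabelling, alignment writes col_a's values into col_b and vice versa
    out.loc[rows.index, [col_a, col_b]] = rows.rename(columns={col_a: col_b, col_b: col_a})
    return out

df = pd.DataFrame({'number': range(1, 7), 'letter': [*'ABCDEF']})
print(swap_half(df, 'number', 'letter', seed=0))
```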