Duplicate Pandas dataframe rows by swapping two columns

Time:09-02

Given a pandas dataframe, I want to modify it so that each row is duplicated and the duplicate differs from the original only in having two of its columns swapped. Furthermore, the incremental nature of the unique ID (the index) must be kept.

Original dataframe:

import pandas as pd

nme = ["Tom", "John", "Philip", "Ross"]
deg = ["MBA", "BCA", "M.Tech", "MBA"]
scr1 = [90, 40, 80, 98]
scr2 = [10, 20, 30, 50]

# dictionary of lists (named `data` to avoid shadowing the built-in `dict`)
data = {'name': nme, 'degree': deg, 'score1': scr1, 'score2': scr2}

my_dataframe = pd.DataFrame(data)

print(my_dataframe)

Expected dataframe:

nme = ["Tom","Tom", "John", "John", "Philip","Philip", "Ross", "Ross"]
deg = ["MBA", "MBA", "BCA", "BCA", "M.Tech", "M.Tech", "MBA", "MBA"]
scr1 = [90, 10, 40, 20, 80, 30, 98, 50]
scr2 = [10, 90, 20, 40, 30, 80, 50, 98]

# dictionary of lists (named `data` to avoid shadowing the built-in `dict`)
data = {'name': nme, 'degree': deg, 'score1': scr1, 'score2': scr2}

my_new_dataframe = pd.DataFrame(data)

print(my_new_dataframe)

Thank you very much in advance.

CodePudding user response:

You can use itertools.permutations and df.explode to get your desired output.

from itertools import permutations

my_dataframe['combine'] = [list(permutations((i,j),2)) for i,j in zip(my_dataframe.score1,my_dataframe.score2)]

     name  degree  score1  score2               combine
0     Tom     MBA      90      10  [(90, 10), (10, 90)]
1    John     BCA      40      20  [(40, 20), (20, 40)]
2  Philip  M.Tech      80      30  [(80, 30), (30, 80)]
3    Ross     MBA      98      50  [(98, 50), (50, 98)]

my_new_dataframe = my_dataframe.explode('combine') 

     name  degree  score1  score2   combine
0     Tom     MBA      90      10  (90, 10)
0     Tom     MBA      90      10  (10, 90)
1    John     BCA      40      20  (40, 20)
1    John     BCA      40      20  (20, 40)
2  Philip  M.Tech      80      30  (80, 30)
2  Philip  M.Tech      80      30  (30, 80)
3    Ross     MBA      98      50  (98, 50)
3    Ross     MBA      98      50  (50, 98)

my_new_dataframe[['score1','score2']] = pd.DataFrame(my_new_dataframe['combine'].tolist(), index= my_new_dataframe.index)

     name  degree  score1  score2   combine
0     Tom     MBA      90      10  (90, 10)
0     Tom     MBA      10      90  (10, 90)
1    John     BCA      40      20  (40, 20)
1    John     BCA      20      40  (20, 40)
2  Philip  M.Tech      80      30  (80, 30)
2  Philip  M.Tech      30      80  (30, 80)
3    Ross     MBA      98      50  (98, 50)
3    Ross     MBA      50      98  (50, 98)

Update

To obtain a unique index for every row, you can use df.reset_index:

my_new_dataframe.reset_index(drop=True, inplace=True)

     name  degree  score1  score2   combine
0     Tom     MBA      90      10  (90, 10)
1     Tom     MBA      10      90  (10, 90)
2    John     BCA      40      20  (40, 20)
3    John     BCA      20      40  (20, 40)
4  Philip  M.Tech      80      30  (80, 30)
5  Philip  M.Tech      30      80  (30, 80)
6    Ross     MBA      98      50  (98, 50)
7    Ross     MBA      50      98  (50, 98)
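The helper column combine is not part of the expected dataframe, so it still has to be dropped at the end. As a minimal sketch, the steps above can be run end to end as follows; the name result is only illustrative, and the index is reset before the scores are unpacked so that every row already has a unique label.

```python
from itertools import permutations

import pandas as pd

my_dataframe = pd.DataFrame({
    "name": ["Tom", "John", "Philip", "Ross"],
    "degree": ["MBA", "BCA", "M.Tech", "MBA"],
    "score1": [90, 40, 80, 98],
    "score2": [10, 20, 30, 50],
})

# Both orderings of the (score1, score2) pair for every row.
my_dataframe["combine"] = [
    list(permutations(pair, 2))
    for pair in zip(my_dataframe.score1, my_dataframe.score2)
]

# One row per ordering, then a fresh 0..n-1 index.
result = my_dataframe.explode("combine").reset_index(drop=True)

# Unpack the tuples back into the score columns and drop the helper column.
result[["score1", "score2"]] = pd.DataFrame(
    result["combine"].tolist(), index=result.index
)
result = result.drop(columns="combine")

print(result)
```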

CodePudding user response:

This one works for me:

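df = my_dataframe  # the original dataframe from the question

new_df = pd.DataFrame(columns=['name', 'degree', 'score1', 'score2'])
j = 0
for index, row in df.iterrows():
    # original row
    new_df.loc[j] = [row['name'], row['degree'], row['score1'], row['score2']]
    # duplicate row with score1 and score2 swapped
    new_df.loc[j + 1] = [row['name'], row['degree'], row['score2'], row['score1']]
    j += 2
new_df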

The Output:

     name  degree score1 score2
0     Tom     MBA     90     10
1     Tom     MBA     10     90
2    John     BCA     40     20
3    John     BCA     20     40
4  Philip  M.Tech     80     30
5  Philip  M.Tech     30     80
6    Ross     MBA     98     50
7    Ross     MBA     50     98
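Assigning rows one at a time with .loc can become slow on larger dataframes. A possible vectorized variant, which is not taken from either answer above, is sketched below: it builds a copy with the two column names exchanged, stacks it under the original with pd.concat (which aligns columns by name), and uses a stable sort on the index to interleave each original row with its swapped duplicate. The names swapped and new_df are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Tom", "John", "Philip", "Ross"],
    "degree": ["MBA", "BCA", "M.Tech", "MBA"],
    "score1": [90, 40, 80, 98],
    "score2": [10, 20, 30, 50],
})

# Copy of the data in which the two score columns trade names.
swapped = df.rename(columns={"score1": "score2", "score2": "score1"})

new_df = (
    pd.concat([df, swapped])    # columns are matched by name, so the values end up swapped
    .sort_index(kind="stable")  # stable sort keeps each original row ahead of its duplicate
    .reset_index(drop=True)     # restore an incremental 0..n-1 index
)

print(new_df)
```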