Given a pandas DataFrame, I want to modify it so that each row is duplicated, and the duplicate differs from the original only in having two of its columns (score1 and score2) swapped. The row index must also stay unique and incrementing (0, 1, 2, ...).
Original dataframe:
import pandas as pd
nme = ["Tom", "John", "Philip", "Ross"]
deg = ["MBA", "BCA", "M.Tech", "MBA"]
scr1 = [90, 40, 80, 98]
scr2 = [10, 20, 30, 50]
# dictionary of lists (named data to avoid shadowing the built-in dict)
data = {'name': nme, 'degree': deg, 'score1': scr1, 'score2': scr2}
my_dataframe = pd.DataFrame(data)
print(my_dataframe)
Expected dataframe:
nme = ["Tom","Tom", "John", "John", "Philip","Philip", "Ross", "Ross"]
deg = ["MBA", "MBA", "BCA", "BCA", "M.Tech", "M.Tech", "MBA", "MBA"]
scr1 = [90, 10, 40, 20, 80, 30, 98, 50]
scr2 = [10, 90, 20, 40, 30, 80, 50, 98]
# dictionary of lists (named data to avoid shadowing the built-in dict)
data = {'name': nme, 'degree': deg, 'score1': scr1, 'score2': scr2}
my_new_dataframe = pd.DataFrame(data)
print(my_new_dataframe)
Thank you very much in advance.
Answer 1:
You can use itertools.permutations and df.explode to get your desired output.
from itertools import permutations
my_dataframe['combine'] = [list(permutations((i,j),2)) for i,j in zip(my_dataframe.score1,my_dataframe.score2)]
name degree score1 score2 combine
0 Tom MBA 90 10 [(90, 10), (10, 90)]
1 John BCA 40 20 [(40, 20), (20, 40)]
2 Philip M.Tech 80 30 [(80, 30), (30, 80)]
3 Ross MBA 98 50 [(98, 50), (50, 98)]
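For a two-element tuple, permutations(..., 2) yields exactly two orderings: the original one and the swapped one. A quick check of that building block in isolation (the sample values are taken from the question):

```python
from itertools import permutations

# permutations of a 2-tuple taken 2 at a time: original order, then swapped order
pairs = list(permutations((90, 10), 2))
print(pairs)  # [(90, 10), (10, 90)]

# permutations works on positions, not values, so equal scores still give two tuples
equal_pairs = list(permutations((50, 50), 2))
print(equal_pairs)  # [(50, 50), (50, 50)]
```

Because it works on positions, a row whose two scores are equal is still duplicated, which matches the "every row is duplicated" requirement.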
my_new_dataframe = my_dataframe.explode('combine')
name degree score1 score2 combine
0 Tom MBA 90 10 (90, 10)
0 Tom MBA 90 10 (10, 90)
1 John BCA 40 20 (40, 20)
1 John BCA 40 20 (20, 40)
2 Philip M.Tech 80 30 (80, 30)
2 Philip M.Tech 80 30 (30, 80)
3 Ross MBA 98 50 (98, 50)
3 Ross MBA 98 50 (50, 98)
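explode gives every element of the list its own row and repeats the original index label for each of them, which is why the index now reads 0, 0, 1, 1, ... A minimal sketch on a hypothetical two-row frame (demo is an illustrative name, not from the answer):

```python
import pandas as pd

# hypothetical two-row frame with a list-valued column, shaped like 'combine' above
demo = pd.DataFrame({'name': ['Tom', 'John'],
                     'combine': [[(90, 10), (10, 90)], [(40, 20), (20, 40)]]})
exploded = demo.explode('combine')
print(exploded.index.tolist())  # [0, 0, 1, 1]
print(exploded)
```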
my_new_dataframe[['score1','score2']] = pd.DataFrame(my_new_dataframe['combine'].tolist(), index= my_new_dataframe.index)
name degree score1 score2 combine
0 Tom MBA 90 10 (90, 10)
0 Tom MBA 10 90 (10, 90)
1 John BCA 40 20 (40, 20)
1 John BCA 20 40 (20, 40)
2 Philip M.Tech 80 30 (80, 30)
2 Philip M.Tech 30 80 (30, 80)
3 Ross MBA 98 50 (98, 50)
3 Ross MBA 50 98 (50, 98)
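The assignment above works because pd.DataFrame(list_of_tuples) turns each tuple into one row with columns 0 and 1, and index= keeps the repeated labels of the exploded frame. Passing index= matters: pandas aligns on index labels when assigning columns, and without it the new frame would carry a fresh 0..n-1 index that does not match the repeated labels. A small sketch of just that split step (the two-tuple Series is a hypothetical stand-in for the 'combine' column):

```python
import pandas as pd

# tuples as they look in the exploded 'combine' column: two rows sharing label 0
combine = pd.Series([(90, 10), (10, 90)], index=[0, 0])
# each tuple becomes one row with columns 0 and 1; index= keeps the repeated label
split = pd.DataFrame(combine.tolist(), index=combine.index)
print(split)
```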
Update
To get a unique index for every row, use df.reset_index with drop=True, which discards the repeated labels and renumbers the rows from 0:
my_new_dataframe.reset_index(drop=True, inplace=True)
name degree score1 score2 combine
0 Tom MBA 90 10 (90, 10)
1 Tom MBA 10 90 (10, 90)
2 John BCA 40 20 (40, 20)
3 John BCA 20 40 (20, 40)
4 Philip M.Tech 80 30 (80, 30)
5 Philip M.Tech 30 80 (30, 80)
6 Ross MBA 98 50 (98, 50)
7 Ross MBA 50 98 (50, 98)
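The steps of this answer can be chained into one function. This sketch also drops the helper 'combine' column, which the expected output in the question does not have, and works on a copy so the input frame is left untouched. The function name duplicate_with_swap and its parameters a and b are hypothetical:

```python
from itertools import permutations

import pandas as pd


def duplicate_with_swap(df, a='score1', b='score2'):
    """Hypothetical helper: duplicate each row, swapping columns a and b in the copy."""
    out = df.copy()
    # each row gets [(a, b), (b, a)]
    out['combine'] = [list(permutations((x, y), 2)) for x, y in zip(out[a], out[b])]
    # one row per tuple; the original index label is repeated
    out = out.explode('combine')
    # split the tuples back into the two columns, reusing the repeated index for alignment
    out[[a, b]] = pd.DataFrame(out['combine'].tolist(), index=out.index)
    # drop the helper column and renumber rows 0..2n-1
    return out.drop(columns='combine').reset_index(drop=True)


my_dataframe = pd.DataFrame({'name': ["Tom", "John", "Philip", "Ross"],
                             'degree': ["MBA", "BCA", "M.Tech", "MBA"],
                             'score1': [90, 40, 80, 98],
                             'score2': [10, 20, 30, 50]})
result = duplicate_with_swap(my_dataframe)
print(result)
```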
Answer 2:
This one works for me:
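new_df = pd.DataFrame(columns=['name', 'degree', 'score1', 'score2'])
j = 0
for index, row in my_dataframe.iterrows():
    # original row
    new_df.loc[j] = [row['name'], row['degree'], row['score1'], row['score2']]
    # duplicate with score1 and score2 swapped
    new_df.loc[j + 1] = [row['name'], row['degree'], row['score2'], row['score1']]
    j += 2
new_df
Output: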
name degree score1 score2
0 Tom MBA 90 10
1 Tom MBA 10 90
2 John BCA 40 20
3 John BCA 20 40
4 Philip M.Tech 80 30
5 Philip M.Tech 30 80
6 Ross MBA 98 50
7 Ross MBA 50 98
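Looping with iterrows and growing new_df one .loc row at a time is known to scale poorly on large frames. A vectorized alternative (a swapped-in technique, not taken from either answer) is to build the swapped copy with assign, concatenate it with the original, and interleave the two with a stable sort on the index. This sketch assumes the original frame has a unique, sorted index such as the default 0..n-1:

```python
import pandas as pd

my_dataframe = pd.DataFrame({'name': ["Tom", "John", "Philip", "Ross"],
                             'degree': ["MBA", "BCA", "M.Tech", "MBA"],
                             'score1': [90, 40, 80, 98],
                             'score2': [10, 20, 30, 50]})

# copy with the two score columns exchanged; column order stays the same
swapped = my_dataframe.assign(score1=my_dataframe['score2'],
                              score2=my_dataframe['score1'])
# index after concat is 0,1,2,3,0,1,2,3; a stable sort keeps each original
# row directly before its swapped copy, then reset_index renumbers 0..7
new_df = (pd.concat([my_dataframe, swapped])
          .sort_index(kind='mergesort')
          .reset_index(drop=True))
print(new_df)
```

mergesort is requested explicitly because it is stable; the default quicksort does not guarantee that the original row stays ahead of its swapped copy.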