I have two DataFrames of 20 rows and 4 columns. The names and value types of the columns are the same.
One of the columns is the title
, the other 3 are values.
df1
title col1 col2 col3
apple a d g
pear b e h
grape c f i
df2
title col1 col2 col3
carrot q t w
pumpkin r u x
sprouts s v y
Now I would like to create 3 separate tables/lists subtracting each value of df1.col1 - df2.col1
| df1.col2 - df2.col2
| df1.col3 - df2.col3
. For df1.col1 - df2.col1
I expect an output that looks something among the lines of:
df1.title df2.title score
apple carrot (a - q)
apple pumpkin (a - r)
apple sprouts (a - s)
pear carrot (b - t)
pear pumpkin (b - u)
pear sprouts (b - v)
grape carrot (c - w)
grape pumpkin (c - x)
grape sprouts (c - y)
I tried to create a for loop using the following code:
for i in df1.iterrows():
score_col1 = df1.col1[[i]] - df2.col2[[j]]
score_col2 = df1.col2[[i]] - df2.col2[[j]]
score_col3 = df1.col3[[i]] - df2.col3[[j]]
score_total = score_col1 score_col2 score_col3
i = i 1
In return, I received an output for score_col1
looking like this:
df1.title df2.title score
apple carrot (a - q)
pear carrot (b - t)
grape carrot (c - w)
Can someone help me to obtain the expected output?
CodePudding user response:
Since you want 3 separate DataFrames, we could use a loop (if you want a single DataFrame, we could do a similar job a bit differently).
We could unstack
df2
and iteratively subtract it from the repeat
ed columns of df1
:
out = []
df2_stacked = df2.set_index('title').unstack().droplevel(0).reset_index(name='score')
for col in df1.filter(like='col'):
tmp = (df1[['title', col]]
.loc[df1.index.repeat(len(df2))]
.reset_index(drop=True)
.join(df2_stacked, lsuffix='_df1', rsuffix='_df2'))
tmp['score'] = tmp[col] - tmp['score']
out.append(tmp.drop(columns=col))
Let's test it on a numerical example:
df1
:
title col1 col2 col3
0 apple 1000 100 10
1 pear 2000 200 20
2 grape 3000 300 30
df2
:
title col1 col2 col3
0 carrot 1 4 7
1 pumpkin 2 5 8
2 sprouts 3 6 9
Then if run the code above and print out
, it contains the following three DataFrames:
title_df1 title_df2 score
0 apple carrot 999
1 apple pumpkin 998
2 apple sprouts 997
3 pear carrot 1996
4 pear pumpkin 1995
5 pear sprouts 1994
6 grape carrot 2993
7 grape pumpkin 2992
8 grape sprouts 2991
title_df1 title_df2 score
0 apple carrot 99
1 apple pumpkin 98
2 apple sprouts 97
3 pear carrot 196
4 pear pumpkin 195
5 pear sprouts 194
6 grape carrot 293
7 grape pumpkin 292
8 grape sprouts 291
title_df1 title_df2 score
0 apple carrot 9
1 apple pumpkin 8
2 apple sprouts 7
3 pear carrot 16
4 pear pumpkin 15
5 pear sprouts 14
6 grape carrot 23
7 grape pumpkin 22
8 grape sprouts 21
CodePudding user response:
a1 = ['apple','pear', 'banana']
b1 = [56,32,23]
c1 = [12,34,90]
d1 = [87,65,23]
a2 = ['carrot','pumpkin','sprouts']
b2 = [16,12,93]
c2 = [12,32,70]
d2 = [81,55,21]
df1 = pd.DataFrame({'title':a1, 'col1':b1, 'col2':c1, 'col3':d1})
df2 = pd.DataFrame({'title':a2, 'col1':b2, 'col2':c2, 'col3':d2})
res_df = pd.DataFrame([])
cols = ['col1','col2','col3']
for c in cols:
res_df = pd.DataFrame([])
for i,j in df1.iterrows():
for k,l in df2.iterrows():
res_df = res_df.append(pd.DataFrame({'title_df1':j.title, 'title_df2':l.title, 'score':j[str(c)] - l[str(c)]},index=[0]), ignore_index=True)
print(res_df)