I have 2 dataframes looking like this :
In each dataframe there is pattern of 1-2 in the values column. (the values are not significant to my problem, just to demonstrate the pattern)
df1 = {'idx': [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'values': [20, 1000, 10001, 21, 1000, 1002, 22, 1003, 1007,23]}
df2 = {'idx': [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'values': [1000, 21, 22, 1000, 22, 23, 1000, 20, 21, 1000]}
I need to swap rows between the two dataframes so that the outcome would be :
df_expected1 = {'idx': [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'values': [20, 21, 22, 21, 22, 23, 22, 20, 21,23]}
df_expected2 = {'idx': [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'values': [1000, 1000, 10001, 1000, 1000, 1002, 1000, 1003, 1007, 1000]}
CodePudding user response:
Knowing that swap-needed indices are from 3n-2
, you can make mask then use numpy.where
:
m = df1["idx"].add(2).mod(3).eq(0)
s1 = np.where(m, df1["values"], df2["values"])
s2 = np.where(~m, df1["values"], df2["values"])
df1["values"] = s1
df2["values"] = s2
Output:
idx values
0 1 20
1 2 21
2 3 22
3 4 21
4 5 22
5 6 23
6 7 22
7 8 20
8 9 21
9 10 23
idx values
0 1 1000
1 2 1000
2 3 10001
3 4 1000
4 5 1000
5 6 1002
6 7 1000
7 8 1003
8 9 1007
9 10 1000
CodePudding user response:
This should do it, knowing that the indices are the same between two dataframes :
df1[df1['idx']%3 == 1], df2[df1['idx']%3 == 1] = df2[df1['idx']%3 == 1], df1[df1['idx']%3 == 1]
Output :
idx values
0 1 1000
1 2 1000
2 3 10001
3 4 1000
4 5 1000
5 6 1002
6 7 1000
7 8 1003
8 9 1007
9 10 1000
idx values
0 1 20
1 2 21
2 3 22
3 4 21
4 5 22
5 6 23
6 7 22
7 8 20
8 9 21
9 10 23
CodePudding user response:
In the code snippet below, I assumed that indexes in both df1
and df2
were equal and that values from df1
are always supposed to be greater than df2
.
import pandas as pd
import numpy as np
from pprint import pprint
df1 = {'idx': [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'values': [20, 1000, 10001, 21, 1000, 1002, 22, 1003, 1007,23]}
df2 = {'idx': [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'values': [1000, 21, 22, 1000, 22, 23, 1000, 20, 21, 1000]}
a = pd.DataFrame(df1).set_index('idx')
b = pd.DataFrame(df2).set_index('idx')
col_name = 'values'
a_series = a[col_name]
b_series = b[col_name]
for i in a_series.index:
if a_series.loc[i] > b_series.loc[i]:
a_series.loc[i], b_series.loc[i] = b_series.loc[i], a_series.loc[i]
df_expected1 = {'idx': a_series.index.tolist(), 'values': a_series.values.tolist()}
df_expected2 = {'idx': b_series.index.tolist(), 'values': b_series.values.tolist()}
pprint(df_expected1)
pprint(df_expected2)
Output:
{'idx': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'values': [1000, 1000, 10001, 1000, 1000, 1002, 1000, 1003, 1007, 1000]}
{'idx': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'values': [20, 21, 22, 21, 22, 23, 22, 20, 21, 23]}