my starting point looks like this
import pandas as pd
dx = {
'bezeichnung': ['Erlöse' , 'Aufwand' , 'Material_19' , 'Gewinn'] ,
'2020': ['1001' , '900' , '90' , '11']
}
dx = {
'bezeichnung': ['Aufwand' , 'Erlöse' , 'Material_16' , 'Gewinn'] ,
'2019': ['1900' , '2001' , '80' , '21']
}
df1 = pd.DataFrame(dx)
df2 = pd.DataFrame(dy)
I want basically the following:
- Compare the columns named 'bezeichnung'. If the elements in both columns are equal to another, add the respective value of '2019' in a new column '2019' which should be either added to df1 or a new df3.
- If an element of 'bezeichnung' in df2 is not found in df1 add the element at the end of column 'bezeichnung' in df1 and put the corresponding value in df2 '2019' to the added column '2019' (see above).
- It is important that the order within column 'bezeichnung' in df1 is maintained.
The result should look like this:
df1 = pd.DataFrame('bezeichnung': ['Erlöse' , 'Aufwand' , 'Material_19' , 'Gewinn', 'Material_16'] ,
'2020': ['1001' , '900' , '90' , '11', '0'] ,
'2019': ['2001' , '1900' , '0' , '21', '80'])
Thank you very much!
CodePudding user response:
You can use outer merge
which combines all the rows for left and right dataframes with NaN
when there are no matched values in the rows. In your case, I change that to string '0'
. Change indicator=True
to see the difference between the two dataframes in an additional column names _merge
import pandas as pd
import numpy as np
dx = {
'bezeichnung': ['Erlöse' , 'Aufwand' , 'Material_19' , 'Gewinn'] ,
'2020': ['1001' , '900' , '90' , '11']
}
dy = {
'bezeichnung': ['Aufwand' , 'Erlöse' , 'Material_16' , 'Gewinn'] ,
'2019': ['1900' , '2001' , '80' , '21']
}
# your expected results
df = pd.DataFrame({'bezeichnung': ['Erlöse' , 'Aufwand' , 'Material_19' , 'Gewinn', 'Material_16'] ,
'2020': ['1001' , '900' , '90' , '11', '0'] ,
'2019': ['2001' , '1900' , '0' , '21', '80']})
df1 = pd.DataFrame(dx)
df2 = pd.DataFrame(dy)
col = 'bezeichnung'
temp = pd.merge(df1, df2, on=col, how='outer', indicator=False)
temp = temp.replace(np.nan, '0')
# you can check if two dfs are equal
if temp.equals(df):
print('Two dfs are equal.\n')
print(temp)
output:
Two dfs are equal.
bezeichnung 2020 2019
0 Erlöse 1001 2001
1 Aufwand 900 1900
2 Material_19 90 0
3 Gewinn 11 21
4 Material_16 0 80