Say I have two DataFrames.
The first contains just an index and a year, such as:
(index) | Year |
---|---|
1 | 2000 |
2 | 2001 |
3 | 2002 |
4 | 2003 |
Then I have a second DataFrame that consists of an index, a year, and some other data point, such as:
(index) | Year | data |
---|---|---|
1 | 2001 | 1.515 |
2 | 2003 | 2.631 |
How do I join them so that I only transfer over the relevant 'data' column, properly aligned with the years 2001 and 2003 in the first DataFrame? Of course, I will be using this method to import many more columns, e.g.:
(index) | Year | data | different data |
---|---|---|---|
1 | 2000 | potato | |
2 | 2001 | 1.515 | |
3 | 2002 | pickle | |
4 | 2003 | 2.631 | |
CodePudding user response:
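A possible solution is the following:
import pandas as pd
# Build the two example frames from the question
data1 = {"Year": [2000, 2001, 2002, 2003]}
data2 = {"Year": [2001, 2003], "data": [1.515, 2.631]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
# Align df2's values to df1's years; how="outer" keeps every year found in either frame
df = pd.merge(df1, df2, how="outer", on="Year")
# Optional: show blanks instead of NaN (this turns 'data' into an object column)
df = df.fillna("")
df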
Returns a DataFrame with all four years, in which 'data' is blank for 2000 and 2002.
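One thing worth checking with how="outer": it only matches the desired result here because every year in the second frame also exists in the first. A minimal sketch of the difference, assuming a hypothetical second frame (df2_extra) that carries an extra year, 2005, which is not in the question's data:

```python
import pandas as pd

df1 = pd.DataFrame({"Year": [2000, 2001, 2002, 2003]})
# Hypothetical frame for illustration: 2005 does not appear in df1
df2_extra = pd.DataFrame({"Year": [2001, 2003, 2005], "data": [1.515, 2.631, 9.9]})

# Outer merge: union of the years from both frames, so 2005 is added as a new row
outer = pd.merge(df1, df2_extra, how="outer", on="Year")
# Left merge: only the years already present in df1 are kept
left = pd.merge(df1, df2_extra, how="left", on="Year")

print(len(outer), len(left))  # 5 4
```

If the first frame is meant to define the full list of years, how="left" is probably the safer choice.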
CodePudding user response:
Do a left merge:
>>> df = df1.merge(df2, on='Year', how='left')
>>> df
   Year   data
0  2000    NaN
1  2001  1.515
2  2002    NaN
3  2003  2.631
# Optional: replace NaN with empty strings for display
>>> df = df.fillna('')
>>> df
   Year   data
0  2000
1  2001  1.515
2  2002
3  2003  2.631
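To bring over only the relevant column, and to repeat this for more columns later, one option is to select the key plus the wanted columns before each left merge. This is only a sketch: the "unused" column, the df3 frame, and its values are made up for illustration and are not from the question.

```python
import pandas as pd

df1 = pd.DataFrame({"Year": [2000, 2001, 2002, 2003]})
# Hypothetical sources: "unused" stands in for a column that should not be transferred
df2 = pd.DataFrame({"Year": [2001, 2003], "data": [1.515, 2.631], "unused": ["a", "b"]})
df3 = pd.DataFrame({"Year": [2000, 2002], "different data": [10.0, 20.0]})

result = df1
for source, wanted in [(df2, ["data"]), (df3, ["different data"])]:
    # Keep the key plus only the wanted columns, then align them on Year
    result = result.merge(source[["Year"] + wanted], on="Year", how="left")

print(result)
```

Leaving the gaps as NaN rather than calling fillna('') keeps the columns numeric, which matters if you plan to do arithmetic on them afterwards.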