Home > Mobile >  How to merge two columns in a pandas dataframe, both containing years but different years?
How to merge two columns in a pandas dataframe, both containing years but different years?

Time:01-02

I have a pandas dataframe similar to this:

Fruits  Year    Farm A  Year    Farm B
Apples  2021    2000    2022    2200
Apples  2020    1500    2021    2100
Apples  2019    1200    2019    1900

I would like to transform this dataframe into something looking like this:

Fruits  Year    Farm A  Farm B
Apples  2022      -     2200
Apples  2021    2000    2100
Apples  2020    1500    1900
Apples  2019    1200     -

Any help please?

CodePudding user response:

If you had two dataframes:

Fruits  Year   FarmA
Apples  2021    2000
Apples  2020    1500
Apples  2019    1200
Fruits  Year   FarmB
Apples  2022    2200
Apples  2021    2100
Apples  2019    1900

say, left and right, then you could merge those as follows:

out = pd.merge(left, right, on=["Fruits", "Year"], how="outer")

Which looks like:

   Fruits  Year   FarmA   FarmB
0  Apples  2021  2000.0  2100.0
1  Apples  2020  1500.0     NaN
2  Apples  2019  1200.0  1900.0
3  Apples  2022     NaN  2200.0

You sort of have two dataframes already joined on Fruits, which is awkward to work with. You could get the two separate dataframes like:

left  = df[["Fruits", "Year1", "FarmA"]].rename(columns={"Year1": "Year"})
right = df[["Fruits", "Year1", "FarmB"]].rename(columns={"Year2": "Year"})

Note that I changed the names of the columns of your original df a bit. (Duplicate "Year", spaces in "Farm X")

CodePudding user response:

If the data matches the original structure, we could utilize DataFrame.melt() to unpivot our data. While this does not match the OP final output, this structure can be useful to do other operations and it can be nice note to have NaN values.

import pandas as pd

df = pd.DataFrame({"Fruits": ["Apple", "Apple", "Apple"],
 "Year": [2021, 2020, 2019],
 "Farm A": [2000, 1500, 1200],
 "Year": [2022, 2021, 2019],
 "Farm B": [2200, 2100, 1900]})

df.melt(id_vars = ["Fruits", "Year"], var_name = "Farm", value_name = "Farm Output")

##    Fruits  Year    Farm  Farm Output
##  0  Apple  2022  Farm A         2000
##  1  Apple  2021  Farm A         1500
##  2  Apple  2019  Farm A         1200
##  3  Apple  2022  Farm B         2200
##  4  Apple  2021  Farm B         2100
##  5  Apple  2019  Farm B         1900
  • Related