I have a pandas dataframe similar to this:
Fruits Year Farm A Year Farm B
Apples 2021 2000 2022 2200
Apples 2020 1500 2021 2100
Apples 2019 1200 2019 1900
I would like to transform this dataframe into something looking like this:
Fruits Year Farm A Farm B
Apples 2022 - 2200
Apples 2021 2000 2100
Apples 2020 1500 1900
Apples 2019 1200 -
Any help please?
CodePudding user response:
If you had two dataframes:
Fruits Year FarmA
Apples 2021 2000
Apples 2020 1500
Apples 2019 1200
Fruits Year FarmB
Apples 2022 2200
Apples 2021 2100
Apples 2019 1900
say, left
and right
, then you could merge those as follows:
out = pd.merge(left, right, on=["Fruits", "Year"], how="outer")
Which looks like:
Fruits Year FarmA FarmB
0 Apples 2021 2000.0 2100.0
1 Apples 2020 1500.0 NaN
2 Apples 2019 1200.0 1900.0
3 Apples 2022 NaN 2200.0
You sort of have two dataframes already joined on Fruits
, which is awkward to work with. You could get the two separate dataframes like:
left = df[["Fruits", "Year1", "FarmA"]].rename(columns={"Year1": "Year"})
right = df[["Fruits", "Year1", "FarmB"]].rename(columns={"Year2": "Year"})
Note that I changed the names of the columns of your original df a bit. (Duplicate "Year", spaces in "Farm X")
CodePudding user response:
If the data matches the original structure, we could utilize DataFrame.melt()
to unpivot our data. While this does not match the OP final output, this structure can be useful to do other operations and it can be nice note to have NaN
values.
import pandas as pd
df = pd.DataFrame({"Fruits": ["Apple", "Apple", "Apple"],
"Year": [2021, 2020, 2019],
"Farm A": [2000, 1500, 1200],
"Year": [2022, 2021, 2019],
"Farm B": [2200, 2100, 1900]})
df.melt(id_vars = ["Fruits", "Year"], var_name = "Farm", value_name = "Farm Output")
## Fruits Year Farm Farm Output
## 0 Apple 2022 Farm A 2000
## 1 Apple 2021 Farm A 1500
## 2 Apple 2019 Farm A 1200
## 3 Apple 2022 Farm B 2200
## 4 Apple 2021 Farm B 2100
## 5 Apple 2019 Farm B 1900