I have a dataframe of the following structure:
df = pd.DataFrame(
{
"Farm ID": ["1", "2", "2", "3", "3"],
"Crop": ["Type A", "Type A", "Type B", "Type B", "Type B"],
"Plot ID": ["A", "A", "B", "A", "B"],
"Plot area": ["8", "4", "2", "3"],
"More values": ["101", "215", "13", "0.6"],
"More text": ["foo", "bar", "foobar", "barfoo"],
}
)
For each plot I now want to know the ratio, the plot area has on the total farm area in case there are different crops on the plots. I then also want to combine plots with the same crop. So the output should look like this:
Plot ID Plot area area share More values More text
Farm ID Crop
1 Type A A 8 1 101 foo
2 Type A A 4 0.66 215 bar
Type B B 2 0.33 13 foobar
3 Type B AB 35 1 0.642 barfoonothing
I know, that I can use dfNew = df.groupby(["Farm ID", "Crop"]).sum()
to get the correct structure, and FarmArea = df.groupby(level=0).sum()
to get the farm area, but I cannot figure out how to get area share
. Any suggestions?
CodePudding user response:
You can use transform
:
df = pd.DataFrame(
{
"Farm ID": ["1", "2", "2", "3"],
"Crop": ["Type A", "Type A", "Type B", "Type B"],
"Plot ID": ["A", "A", "B", "A"],
"Plot area": [8, 4, 2, 3],
"More values": ["101", "215", "13", "0.6"],
"More text": ["foo", "bar", "foobar", "barfoo"],
}
)
dfNew = df.groupby(["Farm ID", "Crop"]).sum()
dfNew["Area share"]=dfNew["Plot area"] / dfNew.groupby(["Farm ID"])["Plot area"].transform("sum")
dfNew
output:
Plot area Area share
Farm ID Crop
1 Type A 8 1.000000
2 Type A 4 0.666667
Type B 2 0.333333
3 Type B 3 1.000000
Please note that I loaded Plot area as numeric, if that is not the case you may need to convert it before using transform.