I have a dataset of 50000 farmers who are growing crop in some villages. I Have to figure out how many farmers in same survey number land and how much his crop area [output image attached]
Here is my dummy data set
df
Out[5]:
Name Village Survey_no Land_Area
0 Farmer_1 Village_1 26 0.33
1 Farmer_1 Village_1 26 0.40
2 Farmer_2 Village_1 26 0.30
3 Farmer_2 Village_1 26 0.40
4 Farmer_2 Village_1 26 0.50
5 Farmer_3 Village_1 26 0.52
6 Farmer_3 Village_1 26 0.40
7 Farmer_4 Village_1 151 0.23
8 Farmer_5 Village_1 151 0.25
9 Farmer_5 Village_1 151 0.10
Here is actual output required
Here is what I have so far:
df = (df.set_index(['Village','Survey_no', df.groupby(['Village','Survey_no']).cumcount().add(1)]).unstack().sort_index(axis=1, level=1))
df.columns = ['{}-{}'.format(x, y) for x, y in df.columns]
df = df.reset_index()
df
Village Survey_no Land_Area-1 ... Name-6 Land_Area-7 Name-7
0 Village_1 26 0.33 ... Farmer_3 0.4 Farmer_3
1 Village_1 151 0.23 ... NaN NaN NaN
The output is not correct, because I don't get actual farmers wise total area of the same land and number of farmers in the same land.
CodePudding user response:
Update:
dfs= df.groupby(['Name', 'Village', 'Survey_no']).agg('sum')
dfs = dfs.reset_index(level=0).set_index([dfs.groupby(['Village', 'Survey_no']).cumcount() 1], append=True)\
.unstack().sort_index(level=1, axis=1)
dfs.columns = [f'{i}_{j}' for i, j in dfs.columns]
dfs = dfs.assign(Total_Land_Area=dfs.filter(like='Land_Area').sum(axis=1))
dfs
Output:
Land_Area_1 Name_1 Land_Area_2 Name_2 Land_Area_3 Name_3 Total_Land_Area
Village Survey_no
Village_1 26 0.73 Farmer_1 1.20 Farmer_2 0.92 Farmer_3 2.85
151 0.23 Farmer_4 0.35 Farmer_5 NaN NaN 0.58
Try this:
cnt = df.groupby(['Village', 'Survey_no']).cumcount() 1
dfs= df.groupby(['Village', 'Survey_no', cnt]).agg({'Name':'first',
'Land_Area':'sum'})\
.unstack()\
.sort_index(level=1, axis=1)
dfs = dfs.assign(Total_Land_Area=dfs.filter(like='Land_Area').sum(axis=1))
dfs.columns = [f'{i}_{j}' if j else f'{i}' for i, j in dfs.columns]
dfs
Output:
Land_Area_1 Name_1 ... Name_7 Total_Land_Area
Village Survey_no ...
Village_1 26 0.33 Farmer_1 ... Farmer_3 2.85
151 0.23 Farmer_4 ... NaN 0.58
[2 rows x 15 columns]
CodePudding user response:
Experience and strength was enough only for this. How to join bbb to aaa, overcomplicated solutions come to mind. Which I don't like.
bbb = df.groupby(['Name'])['Land_Area'].aggregate(['sum'])
aaa = df.groupby(['Village', 'Survey_no']).aggregate({'Land_Area': 'sum', 'Name': 'nunique'}).reset_index()
aaa = aaa.rename(columns={"Name": "No.of Farmers"})
Output bbb
sum
Name
Farmer_1 0.73
Farmer_2 1.20
Farmer_3 0.92
Farmer_4 0.23
Farmer_5 0.35
Output aaa
Village Survey_no Land_Area No.of Farmers
0 Village_1 26 2.85 3
1 Village_1 151 0.58 2