Home > Mobile >  Using python pandas how to do some following analysis for calculation
Using python pandas how to do some following analysis for calculation

Time:06-01

I have a dataset of 50000 farmers who are growing crop in some villages. I Have to figure out how many farmers in same survey number land and how much his crop area [output image attached]

Here is my dummy data set

df
Out[5]: 
       Name    Village  Survey_no  Land_Area
0  Farmer_1  Village_1         26       0.33
1  Farmer_1  Village_1         26       0.40
2  Farmer_2  Village_1         26       0.30
3  Farmer_2  Village_1         26       0.40
4  Farmer_2  Village_1         26       0.50
5  Farmer_3  Village_1         26       0.52
6  Farmer_3  Village_1         26       0.40
7  Farmer_4  Village_1        151       0.23
8  Farmer_5  Village_1        151       0.25
9  Farmer_5  Village_1        151       0.10

Here is actual output required

enter image description here

Here is what I have so far:

df = (df.set_index(['Village','Survey_no', df.groupby(['Village','Survey_no']).cumcount().add(1)]).unstack().sort_index(axis=1, level=1))
df.columns = ['{}-{}'.format(x, y) for x, y in df.columns]

df = df.reset_index()


df

Village  Survey_no  Land_Area-1    ...       Name-6  Land_Area-7    Name-7
0  Village_1         26         0.33    ...     Farmer_3          0.4  Farmer_3
1  Village_1        151         0.23    ...          NaN          NaN       NaN

The output is not correct, because I don't get actual farmers wise total area of the same land and number of farmers in the same land.

CodePudding user response:

Update:

dfs= df.groupby(['Name', 'Village', 'Survey_no']).agg('sum')
dfs = dfs.reset_index(level=0).set_index([dfs.groupby(['Village', 'Survey_no']).cumcount()   1], append=True)\
         .unstack().sort_index(level=1, axis=1)
dfs.columns = [f'{i}_{j}' for i, j in dfs.columns]
dfs = dfs.assign(Total_Land_Area=dfs.filter(like='Land_Area').sum(axis=1))
dfs

Output:

                     Land_Area_1    Name_1  Land_Area_2    Name_2  Land_Area_3    Name_3  Total_Land_Area
Village   Survey_no                                                                                      
Village_1 26                0.73  Farmer_1         1.20  Farmer_2         0.92  Farmer_3             2.85
          151               0.23  Farmer_4         0.35  Farmer_5          NaN       NaN             0.58

Try this:

cnt = df.groupby(['Village', 'Survey_no']).cumcount() 1
dfs= df.groupby(['Village', 'Survey_no', cnt]).agg({'Name':'first',
                                              'Land_Area':'sum'})\
  .unstack()\
  .sort_index(level=1, axis=1)

dfs = dfs.assign(Total_Land_Area=dfs.filter(like='Land_Area').sum(axis=1))
dfs.columns = [f'{i}_{j}' if j else f'{i}' for i, j in dfs.columns]
dfs

Output:

                     Land_Area_1    Name_1  ...    Name_7 Total_Land_Area
Village   Survey_no                         ...                          
Village_1 26                0.33  Farmer_1  ...  Farmer_3            2.85
          151               0.23  Farmer_4  ...       NaN            0.58

[2 rows x 15 columns]

CodePudding user response:

Experience and strength was enough only for this. How to join bbb to aaa, overcomplicated solutions come to mind. Which I don't like.

bbb = df.groupby(['Name'])['Land_Area'].aggregate(['sum'])
aaa = df.groupby(['Village', 'Survey_no']).aggregate({'Land_Area': 'sum', 'Name': 'nunique'}).reset_index()
aaa = aaa.rename(columns={"Name": "No.of Farmers"})

Output bbb

           sum
Name          
Farmer_1  0.73
Farmer_2  1.20
Farmer_3  0.92
Farmer_4  0.23
Farmer_5  0.35

Output aaa

     Village  Survey_no  Land_Area  No.of Farmers
0  Village_1         26       2.85              3
1  Village_1        151       0.58              2
  • Related