Assign brackets to values in pandas dataframe-CodePudding

I have a dataframe that looks like this:

dict = {'industryId': {0: '1730B' , 1: '1730B', 2: '1730B', 3: '1730B', 4: '3524A', 5: '3524A', 6: '3524A', 7: '3524A'},
 'year': {0: 2017, 1: 2018, 2: 2019, 3: 2020, 4: 2017, 5: 2018, 6: 2019, 7: 2020},
 'value': {0: 500, 1: 512, 2: 370, 3: 490, 4: 600, 5: 610, 6: 630, 7: 290}}

df = pd.DataFrame(dict)

    industryId  year    value
0   1730B       2017    500
1   1730B       2018    512
2   1730B       2019    370
3   1730B       2020    490
4   3524A       2017    600
5   3524A       2018    610
6   3524A       2019    630
7   3524A       2020    290

I want to use the quantile function to assign a bracket to each value across each year, by comparing the industryId values against each other.

The output df should look like this:

    industryId  year    value  bracket
0   1730B       2017    500      0
1   3524A       2017    600      1
2   1730B       2018    512      0
3   3524A       2018    610      1
4   1730B       2019    370      0
5   3524A       2019    630      1
6   3524A       2020    290      0
7   1730B       2020    490      1

My code looks like this:

df_cutoffvals = df.groupby(['year','industryId'])['value'] \
                         .quantile(q=[i for i in range(0,2)]) \
                         .reset_index()

However, it duplicates all the values and I don't know how to correct this. Here is the output of my code:

    year    industryId  level_2 value
0   2017    1730B          0    500.00
1   2017    1730B          1    500.00
2   2017    3524A          0    600.00
3   2017    3524A          1    600.00
4   2018    1730B          0    512.00
5   2018    1730B          1    512.00
6   2018    3524A          0    610.00
7   2018    3524A          1    610.00
8   2019    1730B          0    370.00
9   2019    1730B          1    370.00
10  2019    3524A          0    630.00
11  2019    3524A          1    630.00
12  2020    1730B          0    490.00
13  2020    1730B          1    490.00
14  2020    3524A          0    290.00
15  2020    3524A          1    290.00

Does anyone have any suggestion how to get from this to my desired output?

CodePudding user response：

In your code you groupby using the index as both 'year' and 'industryId', so there is a duplicate instance of every pair. You might only need to groupby using 'year' only.

import pandas as pd 

dict = {'industryId': {0: '1730B' , 1: '1730B', 2: '1730B', 3: '1730B', 4: '3524A', 5: '3524A', 6: '3524A', 7: '3524A'},
 'year': {0: 2017, 1: 2018, 2: 2019, 3: 2020, 4: 2017, 5: 2018, 6: 2019, 7: 2020},
 'value': {0: 500, 1: 512, 2: 370, 3: 490, 4: 600, 5: 610, 6: 630, 7: 290}}
df = pd.DataFrame(dict)

df_cutoffvals = df.groupby(['year'])[['value']].quantile(q=[i for i in range(0,2)]).reset_index()
df_final = pd.merge(df_cutoffvals, df, how='inner', on=['year', 'value'])
print(df_final)

Output

   year  level_1  value industryId
0  2017      0.0  500.0      1730B
1  2017      1.0  600.0      3524A
2  2018      0.0  512.0      1730B
3  2018      1.0  610.0      3524A
4  2019      0.0  370.0      1730B
5  2019      1.0  630.0      3524A
6  2020      0.0  290.0      3524A
7  2020      1.0  490.0      1730B