Home > Software design >  How to split the value of a field in a Python dataframe, add it to a new row, and use the existing v
How to split the value of a field in a Python dataframe, add it to a new row, and use the existing v

Time:05-02

    STUDY              Teacher       UPDATE_DATE  COMPARE_DATE  INTERVAL_DAYS
0   entertainment           C        2022-03-02    2022-01-01      0.61
1   entertainment        B, C        2022-03-02    2022-04-10      0.39
2   math                    A        2022-02-25    2022-01-01      0.56
3   math                 A, C        2022-02-25    2022-04-10      0.44
4   science                 D        2022-01-03    2022-01-01      0.02
5   science                 A        2022-01-03    2022-02-20      0.48
6   science           A, B, E        2022-02-20    2022-04-10      0.49
7   technology              E        2021-09-01    2022-09-01      0.00

Q. when there are more than 2 teacheres, I want to create a row by splitting based on "," and divide the value of the INTERVAL_DAYS field by the number of teachers to add. I thought it had a similar structure to the compare_date function you helped and tried to apply it, but the desired output did not come out, so I had to raise an additional question.

   STUDY              Teacher       UPDATE_DATE  COMPARE_DATE  INTERVAL_DAYS
0   entertainment           C        2022-03-02    2022-01-01      0.61
1   entertainment           B        2022-03-02    2022-04-10      0.2
2   entertainment           C        2022-03-02    2022-04-10      0.2
3   math                    A        2022-02-25    2022-01-01      0.56
4   math                    A        2022-02-25    2022-04-10      0.22
5   math                    C        2022-02-25    2022-04-10      0.22
6   science                 D        2022-01-03    2022-01-01      0.02
7   science                 A        2022-01-03    2022-02-20      0.48
8   science                 A        2022-02-20    2022-04-10      0.16
9   science                 B        2022-02-20    2022-04-10      0.16

10 science E 2022-02-20 2022-04-10 0.16 11 technology E 2021-09-01 2022-09-01 1.00

The final thing you want to get is the sum of interval days for each Teacher.

   Teacher     WORKING_COUNTS
0        A             2
1        B          0.36
2        C          1.03
3        D          0.02
4        E          1.16

CodePudding user response:

First split value by , and use DataFrame.explode, divide rows by counts by GroupBy.transform and last aggrgate sum:

df = df.assign(Teacher = df['Teacher'].str.split(', ')).explode('Teacher')
df['INTERVAL_DAYS'] /= df.groupby(level=0)['INTERVAL_DAYS'].transform('size')
print (df)
           STUDY Teacher UPDATE_DATE COMPARE_DATE  INTERVAL_DAYS
0  entertainment       C  2022-03-02   2022-01-01       0.610000
1  entertainment       B  2022-03-02   2022-04-10       0.195000
1  entertainment       C  2022-03-02   2022-04-10       0.195000
2           math       A  2022-02-25   2022-01-01       0.560000
3           math       A  2022-02-25   2022-04-10       0.220000
3           math       C  2022-02-25   2022-04-10       0.220000
4        science       D  2022-01-03   2022-01-01       0.020000
5        science       A  2022-01-03   2022-02-20       0.480000
6        science       A  2022-02-20   2022-04-10       0.163333
6        science       B  2022-02-20   2022-04-10       0.163333
6        science       E  2022-02-20   2022-04-10       0.163333
7     technology       E  2021-09-01   2022-09-01       0.000000

out = df.groupby('Teacher')['INTERVAL_DAYS'].sum().reset_index(name='WORKING_COUNTS')
print (out)
  Teacher  WORKING_COUNTS
0       A        1.423333
1       B        0.358333
2       C        1.025000
3       D        0.020000
4       E        0.163333
  • Related