Replace values within a group in a pandas DataFrame column with the previous group's value

Time:09-10

I have a dataframe:

   country group   A   B   C   D
0        1    a1  10  20  30  40
1        1    a1  11  21  31  41
2        1    a1  12  22  32  42
3        2    a2  50  60  70  80
4        2    a2  51  61  71  81
5        2    a2  52  62  72  82
6        2    a2  53  63  73  83
7        2    a2  50  60  70  80
8        3    a3  51  61  71  81
9        3    a3  52  62  72  82
10       3    a3  53  63  73  83
11       3    a3  53  63  73  83

My goal is to get a dataframe like this:

   country group   A   B   C   D
0        1   NaN  10  20  30  40
1        1   NaN  11  21  31  41
2        1   NaN  12  22  32  42
3        2    a1  50  60  70  80
4        2    a1  51  61  71  81
5        2    a1  52  62  72  82
6        2    a1  53  63  73  83
7        2    a1  50  60  70  80
8        3    a2  51  61  71  81
9        3    a2  52  62  72  82
10       3    a2  53  63  73  83
11       3    a2  53  63  73  83

That is, every row in the group column should take the value of the previous group, so each group's label is shifted to the next group and the first group becomes NaN.

CodePudding user response:

You can use a mapping Series:

s = df.set_index('country')['group'].drop_duplicates()

df['group'] = df['country'].map(s.shift())

output:

    country group   A   B   C   D
0         1   NaN  10  20  30  40
1         1   NaN  11  21  31  41
2         1   NaN  12  22  32  42
3         2    a1  50  60  70  80
4         2    a1  51  61  71  81
5         2    a1  52  62  72  82
6         2    a1  53  63  73  83
7         2    a1  50  60  70  80
8         3    a2  51  61  71  81
9         3    a2  52  62  72  82
10        3    a2  53  63  73  83
11        3    a2  53  63  73  83

mapping Series s:

country
1    a1
2    a2
3    a3
Name: group, dtype: object
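For anyone who wants to run this end to end, here is a minimal self-contained sketch of the mapping approach. The frame is rebuilt from the question with only column A kept for brevity. It assumes each country has exactly one group label and that no two countries share a label, since drop_duplicates would otherwise drop a country from the mapping.

```python
import pandas as pd

# Sample data rebuilt from the question (only column A kept for brevity).
df = pd.DataFrame({
    "country": [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3],
    "group": ["a1"] * 3 + ["a2"] * 5 + ["a3"] * 4,
    "A": [10, 11, 12, 50, 51, 52, 53, 50, 51, 52, 53, 53],
})

# One row per distinct group label, indexed by country.
# Assumption: one label per country, labels not shared between countries.
s = df.set_index("country")["group"].drop_duplicates()

# Shift the labels down by one country, then look up each row's country.
df["group"] = df["country"].map(s.shift())

print(df)
```

Because the lookup goes through country, this version does not need the rows of a group to be adjacent in the frame.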

CodePudding user response:

Use Series.shift, compare the shifted values with the original column so that only the values at group boundaries are kept, then forward-fill the missing values:

s = df['group'].shift()
df['group'] = s.where(s.ne(df['group'])).ffill()
print(df)
    country group   A   B   C   D
0         1   NaN  10  20  30  40
1         1   NaN  11  21  31  41
2         1   NaN  12  22  32  42
3         2    a1  50  60  70  80
4         2    a1  51  61  71  81
5         2    a1  52  62  72  82
6         2    a1  53  63  73  83
7         2    a1  50  60  70  80
8         3    a2  51  61  71  81
9         3    a2  52  62  72  82
10        3    a2  53  63  73  83
11        3    a2  53  63  73  83
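To see why the shift/where/ffill chain works, here is a small sketch that keeps each intermediate step in its own variable. The names shifted and at_boundary are only illustrative, and the data is rebuilt from the question with column A alone. It assumes the rows of each group are contiguous, because the comparison only looks at the row directly above.

```python
import pandas as pd

# Sample data rebuilt from the question (only column A kept for brevity).
df = pd.DataFrame({
    "country": [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3],
    "group": ["a1"] * 3 + ["a2"] * 5 + ["a3"] * 4,
    "A": [10, 11, 12, 50, 51, 52, 53, 50, 51, 52, 53, 53],
})

# Value of the row above; the first row has no predecessor and becomes NaN.
shifted = df["group"].shift()

# True only on the first row of each group, where the label changes.
at_boundary = shifted.ne(df["group"])

# Keep the previous label at boundaries, blank out the rest, then
# propagate each kept label down through the rest of its group.
df["group"] = shifted.where(at_boundary).ffill()

print(df)
```

The first group stays NaN because there is nothing above its first row to forward-fill from.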