Pandas group by one column and fill up another column

Time:07-15

I have the following dataframe with two columns:

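import pandas as pd

# None marks an empty cell; row order matches the table below
data = [['A', '3ykf'], ['A', None], ['A', '3ykf'], ['B', None], ['B', None], ['B', '6jbk'], ['B', None], ['C', None], ['C', None]]
df = pd.DataFrame(data, columns=['column1', 'column2'])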

column1 | column2
   A       "3ykf"
   A          
   A       "3ykf"
   B
   B
   B       "6jbk"
   B
   C
   C

I want to "fill up" the second column like this:

column1 | column2
   A       "3ykf"
   A       "3ykf"
   A       "3ykf"
   B       "6jbk"
   B       "6jbk"
   B       "6jbk"
   B       "6jbk"
   C
   C

Column1 is the column I want to group by. Within each group, column2 either holds a string that is the same throughout the group or is empty.

I want to fill up the empty cells in column2 by adding the same string to each cell within one group.

The issue is that the string is not necessarily the most common value within one group, as the most common value might be an empty cell (like for group B).

Obviously, for group C, column2 can't be filled up with anything, so it should be kept empty in the resulting dataframe.

CodePudding user response:

You can fill each group with its first available value (groupby's 'first' skips missing values):

df.column2 = df.groupby('column1').column2.transform('first')

Result:

  column1 column2
0       A    3ykf
1       A    3ykf
2       A    3ykf
3       B    6jbk
4       B    6jbk
5       B    6jbk
6       B    6jbk
7       C    None
8       C    None
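
In case the "empty" cells are empty strings rather than None/NaN, 'first' would treat them as ordinary values and could pick one. Here is a minimal sketch of one way to handle that, assuming empty strings are the only placeholder; the frame `raw` and the name `cleaned` are made up for illustration:

```python
import pandas as pd

# Hypothetical frame where empty cells are '' instead of None
raw = pd.DataFrame({
    'column1': ['B', 'B', 'C'],
    'column2': ['', '6jbk', ''],
})

# Turn empty strings into missing values so that 'first' skips them
cleaned = raw['column2'].mask(raw['column2'] == '')

# Broadcast each group's first non-missing value to every row of the group
raw['column2'] = cleaned.groupby(raw['column1']).transform('first')
print(raw)
```

Group C has no value at all, so its rows stay missing, which is the behaviour the question asks for.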

CodePudding user response:

Although @Stef's solution is probably the best, here is an alternative that uses pd.merge instead of groupby transform; it is always good to consider several approaches. It sorts df by column2 (missing values sort last), keeps the first row of each column1 group, and left-joins that result back onto the original data frame:

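df.assign(
    column2=pd.merge(
        df,
        # sort_values puts missing values last, so the first row kept
        # per group is a non-null one whenever the group has any
        df.sort_values(["column2"]).drop_duplicates(subset=["column1"]),
        on=["column1"],
        how="left",
    )[["column2_x", "column2_y"]]
    .bfill(axis=1)  # take column2_x, falling back to column2_y
    .iloc[:, 0]
)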

Output:

  column1 column2
0       A    3ykf
1       A    3ykf
2       A    3ykf
3       B    6jbk
4       B    6jbk
5       B    6jbk
6       B    6jbk
7       C    None
8       C    None
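
The same join idea can also be written in separate steps, which may be easier to follow. This is a simplified variant, not the exact code above: it drops the missing rows instead of sorting them last, and the names `lookup` and `merged` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'column1': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'],
    'column2': ['3ykf', None, '3ykf', None, None, '6jbk', None, None, None],
})

# Step 1: one row per group holding its non-null value
# (groups without any value, like C, do not appear here)
lookup = df.dropna(subset=['column2']).drop_duplicates(subset=['column1'])

# Step 2: left-join the lookup back on the group key;
# rows of unmatched groups keep a missing value in column2
merged = df[['column1']].merge(lookup, on='column1', how='left')
print(merged)
```

A left merge keeps the row order of the left frame, so `merged` lines up row by row with the original df.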