Python pandas: how to organize similar group data

Time:05-04

I want to group similar rows of my data together. Here is my DataFrame:

  SKU
FATUT
GUYGE
FATUT-01
SUPAU
GUYPE
SUPAU-01 
FATUT-02
GUYGE-01 

My expected DataFrame looks like this:

     SKU
   FATUT
FATUT-01
FATUT-02
   GUYGE
GUYGE-01
   SUPAU
SUPAU-01
   GUYPE

I want rows that belong to the same group to appear one after another.

CodePudding user response:

One option is to use groupby with the parameter sort=False; then concatenate the split DataFrames.

How it works:

  • Group df by the part of each SKU string that comes before the dash.
  • groupby sorts by the group keys by default; with sort=False, the groups keep the order in which their keys first appear in df, i.e. "GUYPE" stays after "SUPAU".
  • Iterating over the groupby object yields (key, sub-DataFrame) pairs, much like iterating over the items of a dictionary. A generator expression discards the keys and keeps only the sub-DataFrames.
  • concat joins the sub-DataFrames back into one; ignore_index=True discards the index values carried over from the sub-DataFrames and builds a fresh 0..n-1 index.

out = pd.concat((d for _, d in df.groupby(df['SKU'].str.split('-').str[0], sort=False)), ignore_index=True)

Output:

        SKU
0     FATUT
1  FATUT-01
2  FATUT-02
3     GUYGE
4  GUYGE-01
5     SUPAU
6  SUPAU-01
7     GUYPE
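
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same groupby approach. The DataFrame is rebuilt by hand from the question (with stray whitespace left out), and the variable names prefix and group are only illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question, in its original order.
df = pd.DataFrame({
    "SKU": ["FATUT", "GUYGE", "FATUT-01", "SUPAU",
            "GUYPE", "SUPAU-01", "FATUT-02", "GUYGE-01"]
})

# Grouping key: everything before the first dash ("FATUT-01" -> "FATUT").
prefix = df["SKU"].str.split("-").str[0]

# sort=False keeps the groups in order of first appearance;
# iterating yields (key, sub-DataFrame) pairs, and only the frames are kept.
out = pd.concat(
    (group for _, group in df.groupby(prefix, sort=False)),
    ignore_index=True,
)

print(out)
```

Within each group the rows keep their original relative order, so this matches the desired output here only because "FATUT-01" already appears before "FATUT-02" in the input.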

If the exact order of the groups does not matter, sort_values may be enough. Note that it orders the groups alphabetically, so "GUYPE" ends up before "SUPAU", unlike in the desired output:

df = df.sort_values(by='SKU', ignore_index=True)

Output:

        SKU
0     FATUT
1  FATUT-01
2  FATUT-02
3     GUYGE
4  GUYGE-01
5     GUYPE
6     SUPAU
7  SUPAU-01
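
If both properties are wanted (groups in order of first appearance, and rows sorted inside each group), one possible variation is to sort on a helper column. This is a sketch rather than part of the original answer: it assumes pd.factorize is acceptable for numbering the prefixes in order of first appearance, and the column name _grp is made up for illustration.

```python
import pandas as pd

# Sample data copied from the question, in its original order.
df = pd.DataFrame({
    "SKU": ["FATUT", "GUYGE", "FATUT-01", "SUPAU",
            "GUYPE", "SUPAU-01", "FATUT-02", "GUYGE-01"]
})

# Plain alphabetical sort: groups come out in alphabetical order.
alphabetical = df.sort_values(by="SKU", ignore_index=True)

# factorize numbers each distinct prefix by first appearance (0, 1, 2, ...).
prefix = df["SKU"].str.split("-").str[0]
group_rank = pd.factorize(prefix)[0]

# Sort by group rank first, then by SKU inside each group;
# "_grp" is a hypothetical helper column that is dropped afterwards.
ordered = (
    df.assign(_grp=group_rank)
      .sort_values(["_grp", "SKU"], ignore_index=True)
      .drop(columns="_grp")
)

print(ordered)
```

Unlike the concat version, this also sorts the rows inside each group, so it would still put "FATUT-01" before "FATUT-02" if the input listed them the other way round.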