Python pandas: how to organize similar group data

I want to group similar rows of my data together. Here is my DataFrame:

  SKU
FATUT
GUYGE
FATUT-01
SUPAU
GUYPE
SUPAU-01
FATUT-02
GUYGE-01

My expected DataFrame looks like this:

     SKU
   FATUT
FATUT-01
FATUT-02
   GUYGE
GUYGE-01
   SUPAU
SUPAU-01
   GUYPE

I want the rows that belong to the same group to appear one after another.

CodePudding user response:

One option is to use groupby with the parameter sort=False; then concatenate the split DataFrames.

How it works:

Group df by the part of each SKU that comes before the dash.
By default, groupby sorts the group keys; with sort=False, the groups keep the order in which their keys first appear in df, i.e. "GUYPE" stays after "SUPAU".
Iterating over a groupby object yields (key, sub-DataFrame) pairs, so a generator expression can discard the keys and return only the grouped DataFrames.
concat then joins the sub-DataFrames into one; with ignore_index=True, the index labels of the pieces are discarded and the result gets a fresh 0, 1, 2, ... index.

out = pd.concat((d for _, d in df.groupby(df['SKU'].str.split('-').str[0], sort=False)), ignore_index=True)

Output:

        SKU
0     FATUT
1  FATUT-01
2  FATUT-02
3     GUYGE
4  GUYGE-01
5     SUPAU
6  SUPAU-01
7     GUYPE
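
For readers who prefer to see each step on its own line, here is a minimal sketch of the same approach. The DataFrame is rebuilt from the question's data, and the intermediate names (key, groups, pieces) are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"SKU": ["FATUT", "GUYGE", "FATUT-01", "SUPAU",
                           "GUYPE", "SUPAU-01", "FATUT-02", "GUYGE-01"]})

# Step 1: the group key is the part of each SKU before the dash
key = df["SKU"].str.split("-").str[0]

# Step 2: sort=False keeps the groups in order of first appearance
groups = df.groupby(key, sort=False)

# Step 3: iterating yields (key, sub-DataFrame) pairs; keep only the frames
pieces = [sub for _, sub in groups]

# Step 4: stitch the pieces together and renumber the index
out = pd.concat(pieces, ignore_index=True)
print(out)
```

Rows inside each group keep their original relative order, so this gives the desired layout only because each base SKU already appears before its "-01", "-02" variants in the input.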

For your task, sort_values may be enough as well, although the groups then come out in alphabetical order rather than in the order shown in the desired output:

df = df.sort_values(by='SKU', ignore_index=True)

Output:

        SKU
0     FATUT
1  FATUT-01
2  FATUT-02
3     GUYGE
4  GUYGE-01
5     GUYPE
6     SUPAU
7  SUPAU-01
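
To make the difference between the two approaches concrete, the following sketch compares the order of the group prefixes in each result. It assumes the same sample data as the question; the names grouped_order and sorted_order are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"SKU": ["FATUT", "GUYGE", "FATUT-01", "SUPAU",
                           "GUYPE", "SUPAU-01", "FATUT-02", "GUYGE-01"]})

prefix = df["SKU"].str.split("-").str[0]

# Approach 1: groupby with sort=False, then concat
grouped = pd.concat((d for _, d in df.groupby(prefix, sort=False)),
                    ignore_index=True)

# Approach 2: plain lexicographic sort on the SKU column
sorted_df = df.sort_values(by="SKU", ignore_index=True)

# Order in which the group prefixes appear in each result
grouped_order = grouped["SKU"].str.split("-").str[0].unique().tolist()
sorted_order = sorted_df["SKU"].str.split("-").str[0].unique().tolist()

print(grouped_order)  # ['FATUT', 'GUYGE', 'SUPAU', 'GUYPE']
print(sorted_order)   # ['FATUT', 'GUYGE', 'GUYPE', 'SUPAU']
```

If the order of first appearance matters, the groupby version is the one to use; if any consistent grouping will do, sort_values is shorter.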