I want to group similar rows of my data together. Here is my DataFrame:
SKU
FATUT
GUYGE
FATUT-01
SUPAU
GUYPE
SUPAU-01
FATUT-02
GUYGE-01
My expected DataFrame looks like this:
SKU
FATUT
FATUT-01
FATUT-02
GUYGE
GUYGE-01
SUPAU
SUPAU-01
GUYPE
I want the rows that belong to the same group to appear consecutively.
CodePudding user response:
One option is to use groupby with the parameter sort=False, then concatenate the split DataFrames.
How it works:
- Group df by the part of each SKU before the dash.
- groupby sorts by the group keys by default; when we specify sort=False, the keys are kept in the order in which they first appear in df, i.e. "GUYPE" stays behind "SUPAU".
- Iterating over a groupby object yields (key, sub-DataFrame) pairs, much like the items of a dictionary. A generator expression unpacks each pair and keeps only the grouped DataFrames.
- Using concat, concatenate the split DataFrames into one; with ignore_index=True, the index coming from the split DataFrames is discarded and a fresh 0..n-1 index is assigned.
out = pd.concat((d for _, d in df.groupby(df['SKU'].str.split('-').str[0], sort=False)), ignore_index=True)
Output:
SKU
0 FATUT
1 FATUT-01
2 FATUT-02
3 GUYGE
4 GUYGE-01
5 SUPAU
6 SUPAU-01
7 GUYPE
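For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The sample data is copied from the question; the variable names prefix, grouped and keys are only illustrative and not part of the original one-liner.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"SKU": ["FATUT", "GUYGE", "FATUT-01", "SUPAU",
                           "GUYPE", "SUPAU-01", "FATUT-02", "GUYGE-01"]})

# Grouping key: the part of each SKU before the first dash
prefix = df["SKU"].str.split("-").str[0]

# sort=False keeps the groups in order of first appearance
grouped = df.groupby(prefix, sort=False)

# Iterating yields (key, sub-DataFrame) pairs; collect the keys to inspect the order
keys = [key for key, _ in grouped]
print(keys)

# Stack the per-group DataFrames back together with a fresh index
out = pd.concat((d for _, d in grouped), ignore_index=True)
print(out)
```

Note that the rows inside each group keep their original relative order; they are not sorted. In the sample data that already happens to be the desired order.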
For your task, sort_values might work as well, although the order is not exactly the same as in the desired output ("GUYPE" ends up before "SUPAU"):
df = df.sort_values(by='SKU', ignore_index=True)
Output:
SKU
0 FATUT
1 FATUT-01
2 FATUT-02
3 GUYGE
4 GUYGE-01
5 GUYPE
6 SUPAU
7 SUPAU-01
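If you want both properties at once (groups in order of first appearance, and rows sorted within each group), one possible variation is to rank each prefix with pd.factorize and sort on that rank first. This is only a sketch, not part of the answer above; the helper column name _rank is made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"SKU": ["FATUT", "GUYGE", "FATUT-01", "SUPAU",
                           "GUYPE", "SUPAU-01", "FATUT-02", "GUYGE-01"]})

# Part of each SKU before the first dash
prefix = df["SKU"].str.split("-").str[0]

# factorize numbers each distinct prefix in order of first appearance
group_rank, _ = pd.factorize(prefix)

out = (
    df.assign(_rank=group_rank)                        # temporary helper column (hypothetical name)
      .sort_values(["_rank", "SKU"], ignore_index=True)  # group order first, then SKU within the group
      .drop(columns="_rank")
)
print(out)
```

Unlike the groupby and concat version, this also sorts the SKUs inside each group, which matters if, say, FATUT-02 appears before FATUT-01 in the input.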