Home > Back-end >  Python Panda problems with group by and regularexpression
Python Panda problems with group by and regularexpression

Time:12-05

A Table Sample like bellows-

Product                       Price

P1,Luxary product              2000  

P2: Cosmetics product          1700

P1::Plastic product            600

P3/P1,Mobile phone             3300

P2:headphones                  200

P3,Trimmer                     150

P2,Camera                      2200

P2/Airpods                     250

P3;;phone case                 100

P2/P1:Mirrors                  800

Water Bottel P2 2011           60

From Product column, how can i extract hidden sings (- P1, P2 and P3) sometimes there is more than one signs it is okay if just extract the first sign. Then Group by them(Signs) with Price colums and Print from high price to low price ?

Output:

P2  - 5210
P3  - 3550
P1  - 2600

CodePudding user response:

Assuming that your "hidden signs" always contain two characters, simply create new columns containing these prefixes:

df['Prefix'] = df['Product'].str[:2]

You may then group by prefix:

df.groupby('Prefix').sum()

CodePudding user response:

Here is a proposition using pandas.Series.str.extract :

out = (
        df
          .assign(Signs= df["Product"].str.extract("(P\d)", expand=False))
          .groupby("Signs", as_index=False)["Price"].sum()
          .sort_values(by="Price", ascending=False, ignore_index=True)
       )

# Output :

print(out)

  Signs  Price
0    P2   5210
1    P3   3550
2    P1   2600
  • Related