How to only display the "subcontrol"

Time:08-31

I have a dataframe containing information like the following:

import pandas as pd

data = {
    "items": ["4.2 Paint", "4.2.1 Paint job", "4.2.1.10 Paint red", "3.2 Seats", "3.2.3.8 Seat belt"]
}

df = pd.DataFrame(data)

print(df) 
                items
0           4.2 Paint
1     4.2.1 Paint job
2  4.2.1.10 Paint red
3           3.2 Seats
4   3.2.3.8 Seat belt

How can I display just the following?

                items
0  4.2.1.10 Paint red
1   3.2.3.8 Seat belt

CodePudding user response:

It's hard to work out what the criteria are here, but if you are looking for entries at the fourth level of numbering, filter for rows that contain exactly 3 dots.

df[df['items'].apply(lambda x: x.count(".")==3)]
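For reference, the same three-dot filter can be sketched as a self-contained script using the vectorized `Series.str.count` instead of `apply`. It assumes the sample data from the question; note that `str.count` takes a regular expression, so the dot is escaped, and that dots inside the label text would also be counted.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "items": ["4.2 Paint", "4.2.1 Paint job", "4.2.1.10 Paint red",
              "3.2 Seats", "3.2.3.8 Seat belt"]
})

# str.count interprets its argument as a regex, so "." must be escaped.
dots = df["items"].str.count(r"\.")

# Keep only rows whose numbering contains exactly three dots (four levels).
result = df[dots == 3]
print(result)
```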

-=-=-EDIT-=-==-

If you want the deepest entry per group, then something like this would work:

  • get the group number (the number before the first dot)
  • count the dots per line
  • within each group select the rows with the max count
df['group'] = df['items'].apply(lambda x: pd.to_numeric(x.split('.', 1)[0]))
df['level'] = df['items'].apply(lambda x: x.count("."))

df.groupby('group').apply(lambda x: x.loc[x['level'] == x['level'].max()])
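The three steps above can also be sketched without `groupby().apply`, using `transform("max")` to broadcast each group's deepest level back onto its rows. This assumes, as the answer does, that the group is the leading number of each item; the sample data is from the question, and `deepest` is just an illustrative variable name.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "items": ["4.2 Paint", "4.2.1 Paint job", "4.2.1.10 Paint red",
              "3.2 Seats", "3.2.3.8 Seat belt"]
})

# Step 1: the group is the leading number of each item.
df["group"] = df["items"].str.extract(r"^(\d+)", expand=False).astype(int)

# Step 2: the level is the number of dots in the item.
df["level"] = df["items"].str.count(r"\.")

# Step 3: keep rows whose level equals the maximum level of their group.
max_level = df.groupby("group")["level"].transform("max")
deepest = df[df["level"] == max_level]
print(deepest[["items"]])
```

Unlike the `apply` version, this keeps the original row index rather than adding the group as an extra index level.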

CodePudding user response:

Here is another way to do it using regex

df[df['items'].str.match(r'^((\d+[.\s]){4})\D')]

^ : from the beginning of the string
(\d+[.\s]) : capture group of one or more digits followed by a period or whitespace
{4} : exactly four occurrences of the preceding group
\D : followed by a non-digit

                items
2  4.2.1.10 Paint red
4   3.2.3.8 Seat belt
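
To check the pattern outside pandas, here is a quick sketch with the standard `re` module. It assumes a `+` quantifier after `\d` so that multi-digit numbers such as `10` match; `PATTERN` and `has_four_levels` are illustrative names, not part of the answer.

```python
import re

# Four "digits + separator" groups, then a non-digit (the start of the label).
PATTERN = re.compile(r"(\d+[.\s]){4}\D")


def has_four_levels(item: str) -> bool:
    """Return True if the item starts with exactly four numbering levels."""
    # re.match anchors at the start of the string, like Series.str.match.
    return PATTERN.match(item) is not None


# Sample data copied from the question.
items = ["4.2 Paint", "4.2.1 Paint job", "4.2.1.10 Paint red",
         "3.2 Seats", "3.2.3.8 Seat belt"]

matches = [item for item in items if has_four_levels(item)]
print(matches)
```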