How can I get only certain rows of data based on certain criteria-CodePudding

I have thousands of rows of data in a dataframe as below: input table

I want to be able to extract only all the rows of data that exist from each start ("Type" column) of platform ("Frame" column) to each end ("Type" column) of platform ("Frame" column) as output below and name the data in class column as P1 (All the rows within the first platform start and platform end), P2 (second platform start and end), P3, P4, etc:

Output table

CodePudding user response：

Welcome to Stack Overflow!

Here's how I would do it. It probably isn't the cleanest solution, but I think it does what you want to do.

# Set up reproductible example (reprex)
import pandas as pd

df = pd.DataFrame({
    "Frame": ["liner", "platform", "liner", "liner", "platform",
              "liner", "platform", "liner", "platform", "liner"],
    "Type": ["group", "start", "single", "single", "end",
             "single", "start", "group", "end", "single"]
})

#    Frame       Type
# 0  liner       group
# 1  plateform   start
# 2  liner       single
# 3  liner       single
# 4  plateform   end
# 5  liner       single
# 6  plateform   start
# 7  liner       group
# 8  plateform   end
# 9  liner       single

Step 1: select rows from start to end of platform

start_indices = df.index[(df.Frame == "platform") & (df.Type == "start")]
end_indices = df.index[(df.Frame == "platform") & (df.Type == "end")]

df = pd.concat([
    df[start:end 1] for start, end in zip(start_indices, end_indices)
])

Step 2: add column with platform number

df["Class"] = (
    pd.Series(
        ["P"   str(n) for n in  range(1, len(start_indices)   1)],
        index=start_indices
    )
    .reindex(df.index)
    .fillna(method="ffill")
)

And here's what you get:

df

#    Frame      Type    Class
# 1  platform   start   P1
# 2  liner      single  P1
# 3  liner      single  P1
# 4  platform   end     P1
# 6  platform   start   P2
# 7  liner      group   P2
# 8  platform   end     P2