I have thousands of rows of data in a dataframe as below: input table
I want to be able to extract only all the rows of data that exist from each start ("Type" column) of platform ("Frame" column) to each end ("Type" column) of platform ("Frame" column) as output below and name the data in class column as P1 (All the rows within the first platform start and platform end), P2 (second platform start and end), P3, P4, etc:
CodePudding user response:
Welcome to Stack Overflow!
Here's how I would do it. It probably isn't the cleanest solution, but I think it does what you want to do.
# Set up reproductible example (reprex)
import pandas as pd
df = pd.DataFrame({
"Frame": ["liner", "platform", "liner", "liner", "platform",
"liner", "platform", "liner", "platform", "liner"],
"Type": ["group", "start", "single", "single", "end",
"single", "start", "group", "end", "single"]
})
# Frame Type
# 0 liner group
# 1 plateform start
# 2 liner single
# 3 liner single
# 4 plateform end
# 5 liner single
# 6 plateform start
# 7 liner group
# 8 plateform end
# 9 liner single
Step 1: select rows from start to end of platform
start_indices = df.index[(df.Frame == "platform") & (df.Type == "start")]
end_indices = df.index[(df.Frame == "platform") & (df.Type == "end")]
df = pd.concat([
df[start:end 1] for start, end in zip(start_indices, end_indices)
])
Step 2: add column with platform number
df["Class"] = (
pd.Series(
["P" str(n) for n in range(1, len(start_indices) 1)],
index=start_indices
)
.reindex(df.index)
.fillna(method="ffill")
)
And here's what you get:
df
# Frame Type Class
# 1 platform start P1
# 2 liner single P1
# 3 liner single P1
# 4 platform end P1
# 6 platform start P2
# 7 liner group P2
# 8 platform end P2