Please find below my input/output :
Input :
Col1
Green
Purple
Start delimiter
abchd
oeitms
End delimiter
Yellow
Start delimiter
kdkfkd
dldldldl
mdmdmdm
End delimiter
Red
Brown
Rose
White
Output (desired) :
Col1
Green
Purple
Yellow
Red
Brown
Rose
White
Basically, I'm trying to drop all the rows between Start delimiter
and End delimiter
(including their rows too).
The problem is the distance between these two is not fix !
I tried the following code but without any success :
m1 = df['Col1'] == 'Start delimiter'
m2 = df['Col1'] == 'End delimiter'
df.loc[~m1&~m2]
Do you have any suggestions, please ?
I really appreciate your help.
CodePudding user response:
For a vectorial approach you can use:
d = {'Start delimiter': False,
'End delimiter': True}
# assign False after Start
# True after End
# True otherwise
m1 = df['Col1'].map(d).ffill().fillna(True)
# ensure not an End
m2 = df['Col1'].ne('End delimiter')
# if both conditions are met, keep the rows
out = df.loc[m1&m2]
Output:
Col1
0 Green
1 Purple
6 Yellow
12 Red
13 Brown
14 Rose
15 White
CodePudding user response:
You could try:
import pandas as pd
import numpy as np
df = pd.DataFrame(
{"col1":["red","blue","Start delimiter","black", "white", "End delimiter", "green", "blue",
"Start delimiter", "red", "green", "anothercolor", "End delimiter", "again_anothercolor"]}
)
idx_start = df.index[df.col1 == "Start delimiter"]
idx_end = df.index[df.col1 == "End delimiter"]
rows2drop = []
for start, end in zip(idx_start, idx_end):
rows2drop.extend(np.arange(start, end 1))
df.drop(rows2drop, axis=0)
Initial dataframe
---- --------------------
| | col1 |
|---- --------------------|
| 0 | red |
| 1 | blue |
| 2 | Start delimiter |
| 3 | black |
| 4 | white |
| 5 | End delimiter |
| 6 | green |
| 7 | blue |
| 8 | Start delimiter |
| 9 | red |
| 10 | green |
| 11 | anothercolor |
| 12 | End delimiter |
| 13 | again_anothercolor |
---- --------------------
Dataframe after drop()
---- --------------------
| | col1 |
|---- --------------------|
| 0 | red |
| 1 | blue |
| 6 | green |
| 7 | blue |
| 13 | again_anothercolor |
---- --------------------
CodePudding user response:
try:
df
Col1
0 Green
1 Purple
2 Start delimiter
3 abchd
4 oeitms
5 End delimiter
6 Yellow
7 Start delimiter
8 kdkfkd
9 dldldldl
10 mdmdmdm
11 End delimiter
12 Red
13 Brown
14 Rose
15 White
a = False
b = []
for i in df['Col1'].tolist():
print(i)
if i == 'Start delimiter':
a = True
b.append(i)
elif i == 'End delimiter':
a = False
b.append(i)
elif a:
b.append(i)
#b --> ['Start delimiter', 'abchd', 'oeitms', 'End delimiter', 'Start delimiter', 'kdkfkd', 'dldldldl', 'mdmdmdm', 'End delimiter']
df = df[~df['Col1'].isin(b)]
df
Col1
0 Green
1 Purple
6 Yellow
12 Red
13 Brown
14 Rose
15 White
CodePudding user response:
Here's a function to do just that.
def cleanlist(list_to_clean,start_clean,end_clean) :
clean_list =[]
clean_status = False
for x in list_to_clean:
if x == start_clean :
clean_status=True
elif clean_status == False:
clean_list.append(x)
elif x == end_clean :
clean_status = False
return clean_list
print(cleanlist(list,'Start delimiter','End delimiter'))