I have a text file and I want to remove specific lines. Example:
first line ---> timestamp1
second line ---> Name of the person1
third line
first line ---> timestamp2
second line ---> Name of the person2
third line
first line ---> timestamp3
second line ---> Name of the person3
third line
I want to remove the first and second line of every record throughout the whole file, since those two lines always follow the same pattern.
I can already remove the first line (the timestamp) in one go with the code below, but I want to remove the second line in the same step. Any help is appreciated.
Code that removes the first line:
# The text is loaded into a DataFrame (df) with a 'Text' column; this removes the timestamp lines:
df = df[~df['Text'].astype(str).str.startswith('0')]
Answer:
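To also remove the row right after each match, build a boolean mask of the matching rows, shift it down one row with Series.shift so that it flags the following row, and combine the negated masks with & (bitwise AND):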
# with the sample data
m = df['Text'].astype(str).str.startswith('timestamp')
# keep rows that are neither a match nor the row immediately after a match
df = df[~m & ~m.shift(fill_value=False)]
print(df)
Text
2 third line
5 third line
8 third line
Applied to your data, where the timestamp lines start with '0':
m = df['Text'].astype(str).str.startswith('0')
df = df[~m & ~m.shift(fill_value=False)]
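A self-contained sketch of the same shift-mask idea, assuming the records look like the sample in the question (a timestamp line, then a name line, then the text line). The sample DataFrame is built inline for illustration, and `drop_match_and_next` with its `n_after` parameter is a hypothetical helper, not part of pandas:

```python
import pandas as pd

# Illustrative sample data mirroring the question: timestamp, name, text
df = pd.DataFrame({'Text': [
    'timestamp1', 'Name of the person1', 'third line',
    'timestamp2', 'Name of the person2', 'third line',
    'timestamp3', 'Name of the person3', 'third line',
]})

def drop_match_and_next(frame, prefix, n_after=1):
    """Drop rows whose 'Text' starts with prefix, plus the n_after rows after each match."""
    m = frame['Text'].astype(str).str.startswith(prefix)
    drop = m.copy()
    # OR in the match mask shifted down by 1..n_after rows;
    # fill_value=False keeps the mask boolean at the top edge
    for k in range(1, n_after + 1):
        drop |= m.shift(k, fill_value=False)
    return frame[~drop]

out = drop_match_and_next(df, 'timestamp')
print(out)
```

With `n_after=1` this is the same filter as `~m & ~m.shift(fill_value=False)` (by De Morgan's law, `~a & ~b` equals `~(a | b)`); a larger `n_after` would also drop further rows after each match.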
Answer:
Alternatively, you can preprocess the text before creating your DataFrame:
import pandas as pd

data = []
with open('data.txt') as fp:
    for line in fp:
        line = line.strip()
        if line.startswith('0'):
            next(fp, None)  # skip the name of the person (None avoids StopIteration at EOF)
        elif line:
            data.append(line)
df = pd.DataFrame(data, columns=['Text'])
Output:
>>> df
Text
0 Text 1
1 Text 2
2 Text 3
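If the goal is only to clean the file rather than build a DataFrame, the same skip-the-next-line idea can be sketched without pandas. This assumes, as in the question, that timestamp lines start with '0' and are each followed by exactly one name line. `clean_lines` is a hypothetical helper, and `io.StringIO` with made-up contents stands in for the real `data.txt`:

```python
import io

def clean_lines(fp, prefix='0'):
    """Yield non-empty lines, skipping each line that starts with prefix and the line after it."""
    it = iter(fp)
    for line in it:
        line = line.strip()
        if line.startswith(prefix):
            next(it, None)  # skip the name line; None avoids StopIteration at EOF
        elif line:
            yield line

# Stand-in for open('data.txt'); the contents are illustrative only
sample = io.StringIO(
    "01:00 timestamp\nName of the person1\nText 1\n"
    "02:00 timestamp\nName of the person2\nText 2\n"
)
result = list(clean_lines(sample))
print(result)
```

To write the cleaned text back out, the generator's output could be joined with `'\n'.join(...)` and written to a new file.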