Python | Pandas DataFrame | How to remove lines that follow a pattern from a text file

Time:06-22

I have a text file and I want to remove specific lines from it. Example:

first line ---> timestamp1
second line ---> Name of the person1
third line

first line ---> timestamp2
second line ---> Name of the person2
third line

first line ---> timestamp3
second line ---> Name of the person3
third line

I want to remove the first and second lines of every block in the text file, since they always follow the same pattern (a timestamp, then a person's name).

I can already remove the first line, the timestamp, in one go (code below), but I want to remove the second line in the same step. Any help is appreciated.

Code that removes the first line:

# The text is loaded into a DataFrame (df), column 'Text'; this drops the first (timestamp) line:

df = df[~df['Text'].astype(str).str.startswith('0')]

CodePudding user response:

To also remove the row that follows each match, shift the boolean mask down one row with Series.shift and combine the two negated masks with & (bitwise AND):

# using the sample data from the question
m = df['Text'].astype(str).str.startswith('timestamp')  # True on timestamp rows
df = df[~m & ~m.shift(fill_value=False)]  # drop each match and the row after it

print(df)

         Text
2  third line
5  third line
8  third line
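
For reference, here is a minimal self-contained sketch of the same idea. The DataFrame is built by hand from the placeholder lines in the question, and the names is_timestamp, after_timestamp and filtered are only illustrative, not part of the original answer:

```python
import pandas as pd

# Hypothetical sample data mirroring the three-line blocks in the question.
df = pd.DataFrame({
    "Text": [
        "timestamp1", "Name of the person1", "third line",
        "timestamp2", "Name of the person2", "third line",
        "timestamp3", "Name of the person3", "third line",
    ]
})

# True on every row that holds a timestamp.
is_timestamp = df["Text"].astype(str).str.startswith("timestamp")

# Shifting the mask down one row marks the row right after each timestamp.
# fill_value=False keeps the first row from becoming NaN.
after_timestamp = is_timestamp.shift(fill_value=False)

# Keep only rows that are neither a timestamp nor the row following one.
filtered = df[~is_timestamp & ~after_timestamp]
print(filtered)
```

The original index labels (2, 5, 8) are kept; calling reset_index(drop=True) on the result renumbers the rows from 0 if that is preferred.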

Applied to your code:

m = df['Text'].astype(str).str.startswith('0')
df = df[~m & ~m.shift(fill_value=False)]

CodePudding user response:

Alternatively, you can preprocess the text before creating the DataFrame:

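import pandas as pd

data = []

with open('data.txt') as fp:
    for line in fp:
        line = line.strip()
        if line.startswith('0'):  # timestamp line
            next(fp, None)  # skip the name line; the default avoids StopIteration at end of file
        elif line:  # keep only non-empty lines
            data.append(line)

df = pd.DataFrame(data, columns=['Text'])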

Output:

>>> df
     Text
0  Text 1
1  Text 2
2  Text 3
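
The same preprocessing can be tried without a file on disk. The sketch below is only an illustration: read_without_headers is a hypothetical helper name, and the sample text (timestamps beginning with '0', made-up names) is invented here and is not the answer's actual input file:

```python
import io

import pandas as pd


def read_without_headers(lines, prefix="0"):
    """Hypothetical helper: drop each line starting with `prefix` and the line after it."""
    it = iter(lines)
    kept = []
    for line in it:
        line = line.strip()
        if line.startswith(prefix):
            # Consume the following (name) line; None avoids StopIteration at the end.
            next(it, None)
        elif line:
            # Keep non-empty lines only.
            kept.append(line)
    return pd.DataFrame(kept, columns=["Text"])


# Invented sample input: timestamp, name, text, blank line.
sample = io.StringIO(
    "06-22 10:00\nAlice\nText 1\n\n"
    "06-22 10:05\nBob\nText 2\n\n"
    "06-22 10:10\nCarol\nText 3\n"
)

df = read_without_headers(sample)
print(df)
```

Passing an open file object instead of the StringIO works the same way, since both yield one line per iteration.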

Input file:

>>>            