In Python, how can you delete lines in a tabular text format that do NOT contain a specific word?

I'm wondering what would be the best way to delete lines from a tabular text file (while keeping the header) so that only the entries that contain a specific word remain.

For example, say I have a tabular text file with animals and their names and ages. (The headers are Animals/Names/Ages.) How could I delete all lines that do not have 'Dog' under the 'Animals' heading?

Animals Names Ages

Dog Pippin 10

Dog Merry 14

Dog Frodo 12

Cat Sauron 11

Bird Gandalf 10

Bird Mordor 12

and I only want:

Animals Names Ages

Dog Pippin 10

Dog Merry 14

Dog Frodo 12

I have my example code below with notes:

import os
headers = 1
field1 = 'ANIMALS'
sep = ' '

def getIndex(delimString, delimiter, name):
    '''Get position of item in a delimited string'''
    delimString = delimString.strip()
    lineList = delimString.split(delimiter)
    index = lineList.index(name)
    return index

infile = 'C:/example'
outfile = 'C:/folder/animals'

try:
    with open(infile, 'r') as fin:
        with open(outfile, 'w') as fout:
            for i in range(headers):
                line = fin.readline()
                fout.write(line)
            line = fin.readline()
            fout.write(line)

            # This is where I get confused, I try using the method below:
            for line in fin:
                lineList = line.split(sep)
                # But the code doesn't work as it only prints the header
                # I have a feeling it's the way I'm phrasing this area
                if field1 == 'DOG':
                    fout.write(line)
            print '{0} created.'.format(outfile)

except IOError:
    print "{0} doesn't exist- send help".format(infile)

What is the best way to selectively print rows from a tabular .txt file?

CodePudding user response：

Using stdin and stdout instead of files to simplify it (you can replace that with open if you want):

import sys

headers = 1
sep = ' '
fin = sys.stdin
fout = sys.stdout
for i in range(headers):
    line = fin.readline()
    fout.write(line)
for line in fin:
    lineList = line.split(sep)
    if lineList[0] == 'Dog':
        fout.write(line)

When you run this on the example file, you get:

python filter.py < input.txt
Animals Names Ages
Dog Pippin 10
Dog Merry 14
Dog Frodo 12

In other words, just don't print the stuff you don't want.
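
As a side note on why the code in the question only writes the header: `field1 == 'DOG'` compares two fixed strings, so it is never true, and the extra `readline()`/`write()` after the header loop copies the first data row without checking it. The comparison has to use a field taken from each line. Below is a minimal sketch of the same filter that looks the column up by its header name instead of hard-coding position 0. The function name `filter_rows` and its parameters are made up for this illustration, and it assumes whitespace-separated fields and at least one header line.

```python
import io

def filter_rows(fin, fout, column, wanted, headers=1):
    """Copy the header lines, then only the rows whose `column` equals `wanted`."""
    header = ''
    for _ in range(headers):
        header = fin.readline()
        fout.write(header)
    # Find the column position by name in the last header line.
    index = header.split().index(column)
    for line in fin:
        fields = line.split()
        # Skip blank or short lines instead of raising IndexError.
        if len(fields) > index and fields[index] == wanted:
            fout.write(line)

# Demo on in-memory text; with real files, pass the objects returned by open().
sample = "Animals Names Ages\nDog Pippin 10\nCat Sauron 11\nDog Merry 14\n"
out = io.StringIO()
filter_rows(io.StringIO(sample), out, 'Animals', 'Dog')
print(out.getvalue(), end='')
```

Note that the match is case-sensitive, so 'Dog' and 'DOG' are different values.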

CodePudding user response：

Let's suppose that it's a CSV file. With this code you get only the rows that have Dog as the Animals value:

import pandas as pd

df = pd.read_csv(file_name)

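# Keep only the rows whose Animals column is 'Dog'
df = df.loc[df.Animals == 'Dog']

If you want to update the file, you can run df.to_csv(file_name, index=False). It will overwrite the CSV file with that name if it exists; otherwise it will create a new CSV file with that name. Without index=False, pandas also writes the row index as an extra first column.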
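
The file in the question is separated by spaces rather than commas, so a plain read_csv call would put each whole line into one column. Here is a small sketch of the same pandas approach for that case, assuming the fields are separated by whitespace and contain no spaces themselves. The in-memory `text` is just a stand-in for the real file.

```python
import io
import pandas as pd

# In-memory stand-in for the file; pass a path to read_csv for a real file.
text = "Animals Names Ages\nDog Pippin 10\nCat Sauron 11\nDog Merry 14\n"

# sep=r'\s+' splits on runs of whitespace instead of commas.
df = pd.read_csv(io.StringIO(text), sep=r'\s+')
dogs = df[df['Animals'] == 'Dog']

# With no path given, to_csv returns the text instead of writing a file.
# index=False leaves out the row index column pandas would otherwise add.
result = dogs.to_csv(sep=' ', index=False)
print(result, end='')
```

To write to disk instead, pass a file path as the first argument to to_csv.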

I hope that helps.