I'm wondering what the best way is to delete lines from a tabular text file (while keeping the header) so that only the rows containing a specific word remain.
Say, for example, I have a tabular text file with animals, their names and their ages (the headers are Animals/Names/Ages). How could I delete all lines that do not have 'Dog' in the Animals column?
Animals Names Ages
Dog Pippin 10
Dog Merry 14
Dog Frodo 12
Cat Sauron 11
Bird Gandalf 10
Bird Mordor 12
and I only want:
Animals Names Ages
Dog Pippin 10
Dog Merry 14
Dog Frodo 12
Here is my code so far, with notes:
import os
headers = 1
field1 = 'ANIMALS'
sep = ' '
def getIndex(delimString, delimiter, name):
    '''Get position of item in a delimited string'''
    delimString = delimString.strip()
    lineList = delimString.split(delimiter)
    index = lineList.index(name)
    return index
infile = 'C:/example'
outfile = 'C:/folder/animals'
try:
    with open(infile, 'r') as fin:
        with open(outfile, 'w') as fout:
            for i in range(headers):
                line = fin.readline()
                fout.write(line)
            line = fin.readline()
            fout.write(line)
            # This is where I get confused, I try using the method below:
            for line in fin:
                lineList = line.split(sep)
                # But the code doesn't work as it only prints the header
                # I have a feeling it's the way I'm phrasing this area
                if field1 == 'DOG':
                    fout.write(line)
    print('{0} created.'.format(outfile))
except IOError:
    print("{0} doesn't exist- send help".format(infile))
What is the best way to keep only selected rows of a tabular .txt file?
Answer:
This uses stdin and stdout instead of files to keep it simple (you can replace them with open() if you want):
import sys
headers = 1
sep = ' '
fin = sys.stdin
fout = sys.stdout
# Copy the header line(s) through unchanged
for i in range(headers):
    line = fin.readline()
    fout.write(line)
# Keep only the rows whose first column is 'Dog'
for line in fin:
    lineList = line.split(sep)
    if lineList[0] == 'Dog':
        fout.write(line)
When you run it with:
python filter.py < input.txt
you get:
Animals Names Ages
Dog Pippin 10
Dog Merry 14
Dog Frodo 12
In other words, just don't write the lines you don't want.
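Since the question reads from a file and already defines a getIndex helper, here is a minimal sketch of the same filter using open() and looking the Animals column up by its header name instead of assuming it is the first column. The name filter_rows and its parameters are hypothetical, and it assumes one header line with whitespace-separated columns, as in the example. For what it's worth, the original `if field1 == 'DOG':` compares two constants ('ANIMALS' and 'DOG'), so it is always false and nothing inside that loop is ever written.

```python
import os
import tempfile

# Sample data from the question, used only for the demo below.
SAMPLE = (
    "Animals Names Ages\n"
    "Dog Pippin 10\n"
    "Dog Merry 14\n"
    "Dog Frodo 12\n"
    "Cat Sauron 11\n"
    "Bird Gandalf 10\n"
    "Bird Mordor 12\n"
)


def filter_rows(infile, outfile, column="Animals", wanted="Dog", sep=None):
    """Copy the header line, then only the data lines whose `column` equals `wanted`.

    filter_rows is a hypothetical helper; sep=None splits on any run of
    whitespace (the str.split() default), which also tolerates tabs.
    """
    with open(infile) as fin, open(outfile, "w") as fout:
        header = fin.readline()
        fout.write(header)
        # Look up the column position by name instead of assuming column 0.
        index = header.split(sep).index(column)
        for line in fin:
            fields = line.split(sep)
            # Guard against blank or short lines before indexing.
            if len(fields) > index and fields[index] == wanted:
                fout.write(line)


if __name__ == "__main__":
    # Demo: write the sample to a temporary file, filter it, print the result.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "animals.txt")
        dst = os.path.join(tmp, "dogs.txt")
        with open(src, "w") as f:
            f.write(SAMPLE)
        filter_rows(src, dst)
        with open(dst) as f:
            print(f.read(), end="")
```

With your own files you would call something like filter_rows(infile, outfile) with the paths from the question. Because the column is found by name, this still works if Animals is not the first column; if the header has no such column, list.index raises ValueError.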
Answer:
Suppose it's a CSV file. With this code you can keep only the rows whose Animals value is Dog:
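import pandas as pd
df = pd.read_csv(file_name)  # file_name is the path to your CSV file
dogs = df.loc[df.Animals == 'Dog']  # keep only the rows where Animals is 'Dog'
If you want to update the file, run dogs.to_csv(file_name, index=False). That overwrites the CSV file if one with that name already exists; otherwise it creates a new CSV file with that name. index=False stops pandas from writing its row index as an extra first column.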
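The example file in the question is separated by spaces rather than commas, so read_csv with its default comma separator would put each whole line into a single column named "Animals Names Ages". A minimal sketch, assuming whitespace-delimited data: pass sep=r"\s+" to read_csv, filter, then write back with sep=" " and index=False. The data is read from an in-memory io.StringIO only so the snippet is self-contained; the variable names are illustrative.

```python
import io

import pandas as pd

# Sample data from the question; with a real file, pass its path instead of StringIO.
text = (
    "Animals Names Ages\n"
    "Dog Pippin 10\n"
    "Dog Merry 14\n"
    "Dog Frodo 12\n"
    "Cat Sauron 11\n"
    "Bird Gandalf 10\n"
    "Bird Mordor 12\n"
)

# sep=r"\s+" treats any run of whitespace as the delimiter instead of a comma.
df = pd.read_csv(io.StringIO(text), sep=r"\s+")

# Boolean mask: keep only the rows whose Animals value is "Dog".
dogs = df.loc[df["Animals"] == "Dog"]

# With no path argument, to_csv returns the text as a string;
# sep=" " keeps the original layout and index=False leaves out the row index.
result = dogs.to_csv(sep=" ", index=False)
print(result, end="")
```

To overwrite the original file instead, pass its path as the first argument, e.g. dogs.to_csv(file_name, sep=" ", index=False).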
I hope that helps.