How do I remove a string from a previous line in a txt file that's connected to a new line in a vector

Time:03-29

I'm trying to do an assignment where we're given a file of strings that contains the names of movies with their release dates and cast. Currently, I'm trying to separate the title of each movie from its cast, however whenever I run my code I get the title of a movie but a random cast member keeps appearing when that isn't supposed to happen. Does anyone know what the bug is?

The txt file is below:

txtfile

#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include "vector.h" //you can also use #include <vector>

using namespace std;




//----------------------------------------------------//
//Lets get the text of the file into vectors
Vector<string> movieTitle(string txtfile)
{
  Vector<string> Title; //Title of the Movie
  fstream myFile;
  string word;
  int i = 0;
  myFile.open(txtfile);
  if(!myFile.good())
  {
    cout << "ERROR: FILE NOT FOUND" << endl;
    exit(1);
  }
  while(getline(myFile, word, '\t'))
  {
    Title.push_back(word);
    continue;
  }
  myFile.close();
  return Title;
}

int main()
{
  Vector<string> test;
  test = movieTitle("movies_mpaa.txt");
  cout << test[1] << endl;
  return 0;
}

Whenever I run this my output would be

Nela Wagman

Moon Knight (2022)

I'm trying to remove the Nela Wagman.

I'm just trying to remove the string that's connected to the movie title for some reason. The movie title is separated by a tab from the cast, but for some reason the cast from the previous movie list gets connected to the upcoming movie title. I'm trying to remove this.

CodePudding user response:

(Note - I am making the huuuge assumption that Vector.h is std::vector)

Well, your output doesn't make much sense to me: your program reads the entire file into that one vector called Title.

Vector<string> Title; //Title of the Movie

Why would you have a vector to store the title?

Anyway, it's not clear what you are trying to do. What do you expect the output of this program to be? As I said, it reads the entire file into that one vector. I am guessing that the output you show is the tail end of a huge piece of output.

The first thing you need to do is understand how to get one film out of that file. The layout of the file is as follows:

 <title>\t<actor>\t<actor>\n

So I suggest that you call getline delimited by \n and then chop that line up delimited by tab (\t).

You can see here that you are reading the whole file, chopped up by tab:

  while(getline(myFile, word, '\t'))
  {
    Title.push_back(word);
    continue; // <-- not needed, by the way
  } 
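
To see why a cast member ends up attached to the next title, it can help to run the same kind of loop over a small in-memory sample. This is only an illustrative sketch: the two records and the helper names splitOnTabs and demoTokens are made up (only "Nela Wagman" and "Moon Knight (2022)" come from the question), and std::vector stands in for Vector.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split an entire stream on '\t' only,
// the same way the loop in the question does.
std::vector<std::string> splitOnTabs(std::istream& in)
{
    std::vector<std::string> tokens;
    std::string word;
    // '\n' is NOT a delimiter here, so it stays inside the token.
    while (std::getline(in, word, '\t')) {
        tokens.push_back(word);
    }
    return tokens;
}

// Made-up two-record sample in the <title>\t<actor>\t<actor>\n layout.
std::vector<std::string> demoTokens()
{
    std::istringstream sample(
        "Some Film (2001)\tFirst Actor\tNela Wagman\n"
        "Moon Knight (2022)\tAnother Actor\n");
    return splitOnTabs(sample);
}
```

In this sample the third token contains an embedded newline: the last cast member of the first record, then the title of the second record. Printing it produces two lines, which matches the shape of the output shown in the question.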

Start easy by doing this

 Vector<string> films;
 fstream myFile;
 string word;
 myFile.open(txtfile);
 if(!myFile.good())
 {
     cout << "ERROR: FILE NOT FOUND" << endl;
     exit(1);
 }
 while(getline(myFile, word, '\n'))
 {
     films.push_back(word);
 }

Now you will have the entire db in that vector, one entry per movie.
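
The next step, chopping each line up by tab, could look roughly like the sketch below. The helper names splitRecord and readTitles are hypothetical, and std::vector is used on the assumption that Vector behaves the same way.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one record (one line of the file) on '\t'.
std::vector<std::string> splitRecord(const std::string& line)
{
    std::vector<std::string> fields;
    std::istringstream ss(line);
    std::string field;
    while (std::getline(ss, field, '\t')) {
        fields.push_back(field);
    }
    return fields;
}

// Collect only the first field (the title) of every line in the stream.
std::vector<std::string> readTitles(std::istream& in)
{
    std::vector<std::string> titles;
    std::string line;
    while (std::getline(in, line)) {        // one record per '\n'
        std::vector<std::string> fields = splitRecord(line);
        if (!fields.empty()) {              // skip blank lines
            titles.push_back(fields.front());
        }
    }
    return titles;
}
```

Passing an open std::ifstream for movies_mpaa.txt to readTitles should then give a vector holding just the titles, because each line is split on its own and a cast member can no longer run into the next record.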

CodePudding user response:

You already have your answer from @pm100, but let me provide a few thoughts on why you are having difficulties in the first place as well as prevent some additional difficulties you will encounter in the future.

You have a two-part problem essentially:

  1. How do I parse (separate) the movie information from each line (record) in movies_mpaa.txt (there is information on 37215 films contained in the file); and
  2. How do I store that information so it is easily retrievable and reasonably efficiently stored.

The format for each record in the file is:

Title (year)\tCast Member1\tCast Member2\t....\n

So you have your title, then a space and your (year) in parentheses, followed by a '\t' and then the names of the cast members separated by '\t' until the end of the record.

As with any delimited record, you read the entire line into a std::string and then create a std::stringstream from that line, which you can then parse. Parsing directly from the file causes problems because reading until a '\t' delimiter does not stop at the final '\n' of a record, so the read continues into the next record. With a std::stringstream holding a single line, reaching the end of the line sets .eof(), which ensures you only operate on fields from that single record.

Splitting the first field holding the title / year can be done with the .substr() member function by reading the title from the beginning until the .find_last_of(' ') (space) character. The year can be separated by starting at .find_last_of('(') + 1 and reading the next 4 characters.
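
As a minimal sketch of that split, assuming the first field always looks like "Title (year)" (the helper names extractTitle and extractYear are hypothetical):

```cpp
#include <string>

// Hypothetical helper: everything before the last space is the title.
// If there is no space at all, the whole field is returned unchanged.
std::string extractTitle(const std::string& titleyear)
{
    return titleyear.substr(0, titleyear.find_last_of(' '));
}

// Hypothetical helper: the (up to) 4 characters after the last '('.
std::string extractYear(const std::string& titleyear)
{
    std::string::size_type open = titleyear.find_last_of('(');
    if (open == std::string::npos) {
        return "";                      // no "(year)" part present
    }
    return titleyear.substr(open + 1, 4);
}
```

For "Moon Knight (2022)" these return "Moon Knight" and "2022". Searching from the end rather than the beginning matters because a title can itself contain spaces or parentheses.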

For the cast members, you simply loop continually, isolating each cast-member name in turn and using .push_back to add that name to your vector of strings.

To keep the data manageable, using a struct for each movie holding the separated title, year and then a vector of cast members makes sense. That single object can concisely contain the information on one movie.

Then a std::vector<movie> provides a simple way to create a vector containing all of the movies you have read from the data file. You can handle that any way you like, but a struct nested in a class makes things fairly straightforward, and you can write a simple overload of >> to handle the input separation and storage, and another overload of << to output the details of each movie.

A Short Example

A short example putting those principles to work could have your movie struct as a private member of a films class, where the combined data for all movies is held in a std::vector<movie> as a private member of the surrounding class. You pass the filename to read as the first argument to the program, and if you #define PRNMOVIES then the details of each stored movie are printed to stdout (don't do this for the full file; use it as a test on a couple of records).

In either case the total number of movies stored in your films object is shown. It can be written in a number of different ways, here is but one:

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

class films {
  
  private:
    struct movie {        /* struct to hold title, year, cast */
      std::string title;
      std::string year;
      std::vector<std::string> cast;
      
      /* overload of >> to read one movie from input into struct */
      friend std::istream& operator >> ( std::istream& is, movie& m ) {
        std::string record {};      /* string to hold line */
        
        if (getline (is, record)) {       /* read line */
          std::stringstream ss (record);  /* create stringstream from line */
          std::string titleyear{};        /* string 1st field title/year */
          std::string member {};          /* string for one cast member */
          /* separate title and year */
          if (!getline (ss, titleyear, '\t')) {
            return is;
          }
          /* separate title on last space, year is 4-char after last '(' */
          m.title = titleyear.substr (0, titleyear.find_last_of(' '));
          m.year = titleyear.substr (titleyear.find_last_of('(') + 1, 4);
          
          /* loop reading cast member and add to cast vector */
          while (getline (ss, member, '\t')) {
            m.cast.push_back(member);
          }
        }
        return is;
      }
      
      /* overload of << to output movie info */
      friend std::ostream& operator << ( std::ostream& os, const movie& m ) {
        os << "title : " << m.title << 
              "\nyear  : " << m.year  <<
              "\ncast  :\n";
        for (const auto& c : m.cast) {
          os << "        " << c << '\n';
        }
        os.put ('\n');
        return os;
      }
      
      /* get count of cast-members stored */
      size_t get_cast_count (movie& m) { return m.cast.size(); }
    };
    
    std::vector<movie> movies;    /* vector of movie for all films */
  
  public:
    films () { movies.clear(); }  /* default */
    films (std::istream& is) {    /* construct passing istream */
      while (1) {                 /* loop continually */
        movie m{};                /* temporary movie */
        if (!(is >> m)) {         /* read record into m, break on fail */
          break;
        }
        movies.push_back(m);      /* add movie to movies */
      }
    }
    /* getter - number of movies stored */
    size_t get_film_count() { return movies.size(); }
    
    void prn_films() {            /* test print (DON'T run on whole file) */
      for (const auto& f : movies) {
        std::cout << f;
      }
    }
};

int main (int argc, char **argv) {
  
  if (argc < 2) { /* validate one argument given for filename */
    std::cerr << "error: insufficient arguments.\n"
                 "usage: " << argv[0] << " filename\n";
    return 1;
  }
  
  std::ifstream f (argv[1]);  /* open file */
  if (! f.good()) {           /* validate file open for reading */
    std::cerr << "error: file open failed '" << argv[1] << "'.\n";
    return 1;
  }
  
  films mpaa (f);   /* construct films reading all records from stream */
  
  /* output number of movies stored */
  std::cout << "read " << mpaa.get_film_count() << " films.\n";
#ifdef PRNMOVIES
  mpaa.prn_films(); /* conditionally output details if PRNMOVIES defined */
#endif
}

Example Use/Output

Running the timed program without PRNMOVIES defined on your entire movies_mpaa.txt file shows all 37215 films can be read and separated in a little over 0.2 seconds, e.g.

$ time ./bin/moviempaa ~/tmp/movies_mpaa.txt
read 37215 films.

real    0m0.206s
user    0m0.180s
sys     0m0.026s

Checking the detail print of the output on 2 records from the file in a short subfile created with head -n2 movies_mpaa.txt > movies_mpaa2.txt allows you to define PRNMOVIES and keep the output to a hundred lines or so, e.g.

$ ./bin/moviempaa2 dat/movies_mpaa2.txt
read 2 films.
title : C.O.G.
year  : 2013
cast  :
        Danny Belrose
        Alexander Chapin-Plata
        Sean Ghazi
        Jonathan Groff
        Tommy Hestmark
        Louis Hobson
        Kamyar Jahan
        Simos Kalivas
        Timothy Levine
        Castillo Morales
        Eloy M?ndez
        Denis O'Hare
        Bob Olin
        Tim Patteron
        Vu Pham
        Diego Sanchez
        Zach Sanchez
        Brennan Sprecher
        Dean Stockwell
        Corey Stoll
        Tyron Strickland
        Jeremy Evan Taylor
        Lance Weldon
        Gloria Alvarez
        Lara Baker
        Katy Beckemeyer
        Troian Bellisario
        Kim Bissett
        Ellen Bloodworth
        Dale Dickey
        Beth Furumasu
        Keiko Green
        Julie Groff
        Teresa Wells Jones
        Karli Klein
        Katie Klein
        Blake Lindsley
        Marvella McPartland
        Dana Millican
        Jennifer Oswald
        Tyra Richards
        Jewel Robinson
        Asha Sawyer
        Cami Sturm
        Casey Wilson

title : Three Days of Rain
year  : 2002
cast  :
        Erick Avari
        Alimi Ballard
        Joey Bilow
        Bruce Bohne
        Robert Carradine
        Robert Casserly
        Chuck Cooper
        Keir Dullea
        Peter Falk
        Mark Feuerstein
        Peter Kalos
        George Kuchar
        Lyle Lovett
        John Carroll Lynch
        Don Meredith
        Jason Patric
        Max Perlich
        Wayne Rogers
        Michael Santoro
        Peter Henry Schroeder
        Bill Stockton
        Penelope Allen
        Laurie Coleman
        Blythe Danner
        Jordan Elliott
        Heather Kafka
        Christine Karl
        Merle Kennedy
        Claire Kirk
        Maggie Walker

Look things over and understand (1) the basic approach used to parse the information from each line and (2) how the nested struct movie allows the films class to create the std::vector<movie> movies; to hold all information for all movies. How you put the pieces together is up to you; this just shows one basic approach. Let me know if you have questions.
