How to check with ruby if a word is repeated twice in a file

Time:10-17

I have a large file, and I want to be able to check if a word is present twice.

puts "Enter a word: "
$word = gets.chomp

if File.read('worldcountry.txt') # do something if the word entered is present twice...

How can I check whether the file worldcountry.txt contains the $word I entered twice?

CodePudding user response:

I found what I needed in this question: count-the-frequency-of-a-given-word-in-text-file-in-ruby

Gerry's answer there uses this code:

word_count = 0
my_word = "input"

File.open("texte.txt", "r") do |f|
  f.each_line do |line|
    line.split(' ').each do |word|
      word_count += 1 if word == my_word
    end
  end
end

puts "\n" + word_count.to_s
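
To connect the counting loop to the original question, here is a minimal sketch that reads the file line by line and then tests whether the count is exactly two. The method name count_word and the sample data are illustrative assumptions, not part of the linked post; a temporary file stands in for worldcountry.txt.

```ruby
require 'tempfile'

# Count whitespace-separated tokens equal to word, reading line by line
# so the whole file never has to be held in memory.
# (count_word is a hypothetical helper name.)
def count_word(path, word)
  File.foreach(path).sum { |line| line.split.count(word) }
end

# Hypothetical sample data standing in for worldcountry.txt.
Tempfile.create('worldcountry') do |f|
  f.write("France Spain\nItaly France\nPeru\n")
  f.flush

  word = 'France' # in the question this would come from gets.chomp
  puts "#{word} is present twice" if count_word(f.path, word) == 2
end
```

Note that this compares whole tokens, so "France," with a trailing comma would not be counted as "France".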

Thanks, I will pay more attention next time.

CodePudding user response:

If the file is not overly large, it can be gulped into a string. Suppose:

str = File.read('cat')
  #=> "There was a dog 'Henry' who\nwas pals with a dog 'Buck' and\na dog 'Sal'."
puts str
There was a dog 'Henry' who
was pals with a dog 'Buck' and
a dog 'Sal'.

Suppose the given word is 'dog'.

Confirm the file contains at least two instances of the given word

One can attempt to match the regular expression

r1 = /\bdog\b.*\bdog\b/m
str.match?(r1)
  #=> true
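
The pattern above hard-codes 'dog'. As a sketch of how it might be built from a word entered by the user, the hypothetical helper at_least_twice? below interpolates the word after passing it through Regexp.escape, so that characters such as '.' or '+' are matched literally. It assumes the word begins and ends with word characters, since \b depends on that.

```ruby
# Return true if word occurs at least twice in str as a whole word.
# (at_least_twice? is a hypothetical helper name.)
def at_least_twice?(str, word)
  w = Regexp.escape(word)
  str.match?(/\b#{w}\b.*\b#{w}\b/m)
end

str = "There was a dog 'Henry' who\nwas pals with a dog 'Buck' and\na dog 'Sal'."

at_least_twice?(str, 'dog')   #=> true
at_least_twice?(str, 'Henry') #=> false
at_least_twice?(str, 'do')    #=> false
```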


Confirm the file contains exactly two instances of the given word

Using a regular expression to determine if the file contains exactly two instances of the given word is somewhat more complex. Let

r2 = /\A(?:(?:(?!\bdog\b).)*\bdog\b){2}(?!.*\bdog\b)/m
str.match?(r2)
  #=> false (str contains three instances of 'dog')
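
If a single regular expression is not a requirement, a simpler alternative is to count the matches with String#scan and compare the count. This is a sketch of a different technique from the one above; count_occurrences is a hypothetical helper name.

```ruby
# Count whole-word occurrences of word in str.
# (count_occurrences is a hypothetical helper name.)
def count_occurrences(str, word)
  str.scan(/\b#{Regexp.escape(word)}\b/).size
end

str = "There was a dog 'Henry' who\nwas pals with a dog 'Buck' and\na dog 'Sal'."

count_occurrences(str, 'dog')      #=> 3
count_occurrences(str, 'dog') == 2 #=> false
count_occurrences(str, 'was') == 2 #=> true
```

The count also answers the "at least two" case with >= 2, at the cost of scanning the whole string rather than stopping at the second match.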



The two regular expressions can be written in free-spacing mode to make them self-documenting.

r1 = /
     \bdog\b       # match 'dog' surrounded by word breaks  
     .*            # match zero or more characters
     \bdog\b       # match 'dog' surrounded by word breaks
     /xm           # cause . to match newlines and invoke free-spacing mode
r2 = /
     \A            # match beginning of string
     (?:           # begin non-capture group
       (?:         # begin non-capture group
         (?!       # begin negative lookahead
           \bdog\b # match 'dog' surrounded by word breaks
         )         # end negative lookahead
         .         # match one character
       )           # end non-capture group
       *           # execute preceding non-capture group zero or more times
       \bdog\b     # match 'dog' surrounded by word breaks
     )             # end non-capture group
     {2}           # execute preceding non-capture group twice
     (?!           # begin negative lookahead
       .*          # match zero or more characters
       \bdog\b     # match 'dog' surrounded by word breaks
     )             # end negative lookahead
     /xm           # cause . to match newlines and invoke free-spacing mode