I am writing a program that reads a text file word by word and counts the occurrences of each word concurrently, using channels and the worker pool pattern.
The program works as follows:
- The readText function reads words from a text file
- readText sends each word to the words channel
- Each goroutine executes the countWord function, which counts the word in a map
- countWord returns a Result struct value, and the worker function passes it to the resultC channel
- The computeTotal function builds a map from the Result values coming from the resultC channel
- main prints the map created in step 5
The program works, but when I add a fmt.Println(i) call to trace progress, as shown below
func computeTotal() {
i := 0
for e := range resultC {
total[e.word] = e.count
i += 1
fmt.Println(i)
}
}
the program terminates without showing or counting all the words:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 all goroutines finished 16 17 18 map[but:1 cat's:1 crouched:1 fur:1 he:2 imperturbable:1 it:1 pointed:1 sat:1 snow:1 stiffly:1 the:1 was:2 with:1] total words: 27 38 ... 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 Time taken for reading the book 5.8145ms
The program shows the results correctly if I comment out the fmt.Println() statement in the computeTotal function. The output is then as shown below:
all goroutines finished map[a:83 about:4 above:2 absolute:1 accepted:1 across:1 affection:1 after:1 again:5 wonder:2 wood:5 wooded:1 woody:1 work:1 worked:2 world:4 would:11 wrapped:1 wrong:1 yellow:2 yielded:1 yielding:1 counts continues ......] total words: 856 Time taken for reading the book 5.9924ms
Here is my implementation of readText:
//ensure close words at the right timing
func readText() {
file, err := os.Open(FILENAME)
if err != nil {
log.Fatal(err)
}
defer file.Close()
scanner := bufio.NewScanner(file)
scanner.Split(bufio.ScanWords)
for scanner.Scan() {
word := strings.ToLower(scanner.Text())
words <- strings.Trim(word, ".,:;")
}
//time.Sleep(1 * time.Second)
close(words)
}
Here is my word-counting implementation using a worker pool:
//call countWord func,
func workerPool() {
var wg sync.WaitGroup
for i := 1; i <= NUMOFWORKER; i++ {
wg.Add(1)
go worker(&wg)
}
wg.Wait()
fmt.Println("all goroutines finished")
close(resultC)
}
func worker(wg *sync.WaitGroup) {
var tempMap = make(map[string]int)
for w := range words {
resultC <- countWord(w, tempMap) // returns a Result value
}
wg.Done()
}
//creates a map each word
func countWord(word string, tempMap map[string]int) Result {
_, ok := tempMap[word]
if ok {
tempMap[word]++
return Result{word, tempMap[word] + 1}
}
return Result{word, 1}
}
Finally, here are the declarations and the main function:
const FILENAME = "cat.txt"
const BUFFERSIZE = 3000
const NUMOFWORKER = 5
var words = make(chan string, BUFFERSIZE) //job
var resultC = make(chan Result, BUFFERSIZE)
var total = map[string]int{}
type Result struct {
word string
count int
}
func main() {
startTime := time.Now()
go readText()
go computeTotal()
workerPool() //blocking
fmt.Println(total)
endTime := time.Now()
timeTaken := endTime.Sub(startTime)
fmt.Println("total words: ", len(total))
fmt.Println("Time taken for reading the book", timeTaken)
}
I have been trying to find out why the program does not produce a consistent result, but I have not figured it out yet. How can I change the program so that it produces the same outcome every time?
CodePudding user response:
The countWord function always returns a result with count == 1.
Here's a version of the function that increments the count:
func countWord(word string, tempMap map[string]int) Result {
count := tempMap[word] + 1
tempMap[word] = count
return Result{word, count}
}
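As a quick check, here is a minimal, self-contained sketch of that function called on a short word list. The `seen` map and the sample words are assumptions made for illustration; they are not taken from the question's input file.

```go
package main

import "fmt"

// Result mirrors the struct declared in the question.
type Result struct {
	word  string
	count int
}

// countWord stores the incremented count back into the map, so a word
// that is seen for the first time is recorded as well.
func countWord(word string, tempMap map[string]int) Result {
	count := tempMap[word] + 1 // a missing key reads as 0
	tempMap[word] = count
	return Result{word, count}
}

func main() {
	seen := map[string]int{} // hypothetical per-worker map
	for _, w := range []string{"he", "was", "he"} {
		fmt.Println(countWord(w, seen))
	}
	// Output:
	// {he 1}
	// {was 1}
	// {he 2}
}
```

Note that each worker in the question keeps its own map, so these counts are per worker. The consumer would still need to combine them rather than overwrite them.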
CodePudding user response:
You are closing your words channel as soon as the read operation is complete. That does not guarantee that the consumer has already consumed all the data on the channel. Can you check whether you are closing a channel a bit too early?
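In Go, values already queued in a buffered channel can still be received after close, so it may also be worth checking whether main waits for the consumer to finish draining resultC before it prints the map. The following is only a sketch of that idea, not a drop-in replacement: `countWords`, the `done` channel, and the sample text are hypothetical names introduced here, and the workers simply forward words so that a single goroutine owns the map.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// countWords is a hypothetical helper that runs the whole pipeline and
// returns only after the consumer has drained the results channel.
func countWords(text string, numWorkers int) map[string]int {
	words := make(chan string, 16)
	results := make(chan string, 16)
	total := map[string]int{}
	done := make(chan struct{})

	// Producer: closing words here is safe, because receivers still get
	// every value that was queued before the close.
	go func() {
		for _, w := range strings.Fields(text) {
			words <- strings.Trim(strings.ToLower(w), ".,:;")
		}
		close(words)
	}()

	// Consumer: the only goroutine that writes to total.
	go func() {
		for w := range results {
			total[w]++
		}
		close(done) // signals that total is complete
	}()

	// Workers: forward each word to the results channel.
	var wg sync.WaitGroup
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for w := range words {
				results <- w
			}
		}()
	}

	wg.Wait()      // all workers have sent their results
	close(results) // lets the consumer's range loop end
	<-done         // wait for the consumer before reading total
	return total
}

func main() {
	total := countWords("The cat sat. The cat was the cat.", 3)
	fmt.Println(total["the"], total["cat"], len(total)) // prints "3 3 4"
}
```

Without the final wait on `done`, the caller could read the map while the consumer is still receiving buffered results, which would match the truncated output seen when the extra Println slows the consumer down.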