Running CRF++ 0.58 on Linux: the training text is too big and no model file is generated

Time:10-04

As the title says: the text is 24 MB, nearly 1.8 million lines of data. I run the command: crf_learn -f 8 -c 1.5 template msr_training.01.After the CRF.TXT model_file
The following happens and no model is generated. A file of about 1 MB basically works without problems. Could an expert explain why? I am using Ubuntu 14.04 with 4 GB of memory.

CodePudding user response:

Same problem here on Windows: no model file is generated either.

CodePudding user response:

. 14700.. 14800.. 14900.. 15000.. 15100.. 15200.. 15300.. 15400.. 15500.. 15600.. 15700.. 15800.. 15900.. 16000.. 16100.. 16200.. 16300.. 16400.. 16500.. 16600.. 16700.. 16800.. 16900.. 17000.. 17100.. 17200.. 17300.. 17400.. 17500.. 17600.. 17700.. 17800.. 17900.. 18000.. 18100.. 18200.. 18300.. 18400.. 18500.. 18600.. 18700.. 18800.. 18900.. 19000..
Done! 41.93 s

Number of sentences: 19054
Number of features:  2159868
Number of thread(s): 1
Freq:                3
eta:                 0.00010
C:                   4.00000
shrinking size:      20
iter=0 terr=0.67958 serr=1.00000 act=2159868 obj=2531994.56328 diff=1.00000

CodePudding user response:

I ran into the same mistake as the original poster; my solution may be a useful reference.
Number of sentences: 1 means the corpus has not been preprocessed.
The number should not be 1: sentences must be separated from each other by a blank line.
My error is not the same as the one in the second-floor post, where training exits after one round of iteration. There are posts online saying the solution for that case is to add a parameter.
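The blank-line point above can be checked before training by counting the sentence blocks in the corpus. This is only a minimal sketch: the function name `count_sentences` and the sample rows are made up for illustration, and it assumes the usual CRF++ layout of one token per line with an empty line between sentences.

```python
def count_sentences(lines):
    """Count blank-line-separated sentences in CRF++-style training lines."""
    count = 0
    in_sentence = False
    for line in lines:
        if line.strip():
            # Non-empty line: a token row of the current sentence.
            if not in_sentence:
                count += 1
                in_sentence = True
        else:
            # Blank line: sentence boundary.
            in_sentence = False
    return count


# Hypothetical sample: two sentences, columns are token and tag.
sample = ["a\tB", "b\tE", "", "c\tS", ""]
print(count_sentences(sample))
```

For a real file, pass the open file object instead of the list, for example `count_sentences(open(path, encoding="utf-8"))`. If the result is 1 for a large corpus, the separating blank lines are most likely missing.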

CodePudding user response:

The CPU is not powerful enough; move the job to a server.

CodePudding user response:

The second-floor post probably has too many features. Raise the -f parameter to reduce the number of features and it should be OK.
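To see why a higher -f helps, here is a rough sketch of a frequency cutoff: only features seen at least a given number of times are kept. It illustrates the idea only and is not CRF++'s internal code; `surviving_features` and the feature strings are hypothetical.

```python
from collections import Counter


def surviving_features(feature_occurrences, min_freq):
    """Return the set of features that occur at least min_freq times."""
    counts = Counter(feature_occurrences)
    return {feat for feat, n in counts.items() if n >= min_freq}


# Hypothetical feature occurrences extracted from a tiny corpus.
occurrences = ["U00:a", "U00:a", "U00:a", "U00:b", "U00:c", "U00:c"]
for min_freq in (1, 2, 3):
    print(min_freq, len(surviving_features(occurrences, min_freq)))
```

In a real corpus most features are rare, so even a small cutoff can remove a large share of them and lower memory use accordingly.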

CodePudding user response:

Same here. A file of a dozen or so MB cannot get through a single iteration, so I picked out a few hundred lines instead.
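When cutting the corpus down like this, it is safer to cut at sentence boundaries than at an arbitrary line, so the last sentence is not truncated. A minimal sketch, assuming one token per line and a blank line between sentences; `first_sentences` is a made-up helper name.

```python
def first_sentences(lines, limit):
    """Return the lines of the first `limit` blank-line-separated sentences."""
    kept = []
    seen = 0
    in_sentence = False
    for line in lines:
        if line.strip():
            if not in_sentence:
                if seen == limit:
                    break  # enough sentences collected
                seen += 1
                in_sentence = True
            kept.append(line)
        else:
            if in_sentence:
                kept.append(line)  # keep the boundary after a sentence
            in_sentence = False
    return kept


# Hypothetical sample with three sentences; keep the first two.
sample = ["a B", "b E", "", "c S", "", "d S"]
print(first_sentences(sample, 2))
```

The returned lines can be written to a smaller training file and passed to crf_learn in place of the full corpus.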

CodePudding user response:

On a personal computer I have to limit the corpus to only 100,000 lines with two features. The key problem is that it does not print any error message.