I have a list of text like this- and I want to count the number of lines in this. I thought of splitting it by ". "
, but it will create extra lines for words like "a.m."
. Please help!
[
'when people hear ai they often think about sentient robots and magic boxes. ai today is much more mundane and simple—but that doesn’t mean it’s not powerful. another misconception is that high-profile research projects can be applied directly to any business situation. ai done right can create an extreme return on investments (rois)—for instance through automation or precise prediction. but it does take thought, time, and proper implementation. we have seen that success and value generated by ai projects are increased when there is a grounded understanding and expectation of what the technology can deliver from the c-suite down.',
'“artificial intelligence (ai) is a science and a set of computational technologies that are inspired by—but typically operate quite differently from—the ways people use their nervous systems and bodies to sense, learn, reason and take action.”3 lately there has been a big rise in the day-to-day use of machines powered by ai. these machines are wired using cross-disciplinary approaches based on mathematics, computer science, statistics, psychology, and more.4 virtual assistants are becoming more common, most of the web shops predict your purchases, many companies make use of chatbots in their customer service and many companies use algorithms to detect fraud.',
'ai and deep learning technology employed in office entry systems will bring proper time tracking of each employee. as this system tries to learn each person with an image processing technology whose data is feed forwarded to a deep learning model where deep learning isn’t an algorithm per se, but rather a family of algorithms that implements deep networks (many layers). these networks are so deep that new methods of computation, such as graphics processing units (gpus), are required to train them, in addition to clusters of compute nodes. so using deep learning we can take detect the employee using face and person recognition scan and through which login, logout timing is recorded. using an ai system we can even identify each employee’s entry time, their working hours, non-working hours by tracking the movement of an employee in the office so that system can predict and report hr for the salary for each employee based on their working hours. our system can take feed from cctv to track movements of employees and this system is capable of recognizing a person even he/she is being masked as in this pandemic situation by taking their iris scan. with this system installed inside the office, the following are some of the benefits:',
'for several countries, regulations insist that the employer must keep documents available that can demonstrate the working hours performed by each employee. in the event of control from the labor inspectorate or a dispute with an employee, the employer must be able to explain and justify the working hours for the company. this can be made easy as our system is tracking employee movements',
'this is about monitoring user connection times to detect suspicious access times. in the event where compromised credentials are used to log on at 3 a.m. on a saturday, a notification on this access could alert the it team that an attack is possibly underway.',
'to manage and react to employees’ attendance, overtime thresholds, productivity, and suspicious access times, our system records and stores detailed and interactive reporting on users’ connection times. these records allow you to better manage users’ connection times and provide accurate, detailed data required by management.',
'4)if you want to avoid paying overtime, make sure that your employees respect certain working time quotas or even avoid suspicious access. our system will alert the hr officer about each employee’s office in and out time so that they can accordingly take action.',
'5)last but not least it reduces human resource needs to keep track of the records and sending the report to hr and hr officials has to check through the report so this system will reduce times and human resource needs',
'with the use of ai and deep learning technologies, we can automate some routines stuff with more functionality which humans need more resources to keep track thereby reducing time spent on manual data entry works rather companies can think of making their position high in the competitive world.'
]
CodePudding user response:
I have a list of text like this- and I want to count no. of lines in this. I thought of splitting it by". "
Nice idea, but if you want to count the number of lines, why don't you simply take the lenght
of that list
?
lenght = len(that_long_text_list)
And then you can sum this lenght
to the .
, removing by regex
expressions like a.m.
. To do this check this
question about regex pattern matching for acronyms.
CodePudding user response:
Sentence splitting is not a trivial task. I'd suggest you use a ready-made library like the NLTK.
import nltk
text = "..." # your raw text
sentences = nltk.sent_tokenize(text)
This works fairly well, but don't expect perfect results.
CodePudding user response:
Since we don't know how many characters fit into a line as you define it, we can't know for sure. My suggestion would be as follows:
- Split the text into paragraphs, since the start of a new paragraph will, in any case, introduce a line break. This is easy - each item in your list will be a paragraph.
- For each of these paragraphs, divide it by the number of characters in a line and round up to the nearest full integer.
So my code would be as follows:
raw_text = # Your raw text in the form of a list
line_count = 0
chars_per_line = # Whatever the number of characters in a line of yours is
for paragraph in raw_text:
line_count = (paragraph // chars_per_line) 1
Not perfect because it doesn't take the rules of syllabification into account, but close enough, I guess.