How to count the number of non-empty lines within a cell of a pandas DataFrame

Time:05-20

My data looks like the sample below:

import pandas as pd

MyDict = {'text': ['\nbla bla text \n\n bla bla another text \n # bla text \n\n\n bla bla another text',
                   '\nbla bla bla text2 \n\n\\ bla bla bla another text it is \n\n  # bla bla bla text \n bla bla it is another text']}

df = pd.DataFrame(MyDict)

For each cell of the text column, I want to count the lines that are not empty (that is, not just '\n'), so that the resulting DataFrame looks like this:

text                                                                  total_lines
'bla bla text \n\n bla bla another text \n # bla text \n\n\n...'      4
'bla bla bla text2 \n\n\ bla bla bla another text it is \n\n...'      4

There are 4 non-empty lines in cell 1 of the text column, 4 in cell 2, and so forth.
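As a quick check of the expected count, the sketch below splits the first sample string on '\n' in plain Python (no pandas) and keeps only the pieces that are not empty strings. The names cell_1, parts and non_empty are illustrative only.

```python
# First sample cell from MyDict above
cell_1 = '\nbla bla text \n\n bla bla another text \n # bla text \n\n\n bla bla another text'

# Splitting on '\n' yields an empty string wherever a line has no characters
parts = cell_1.split('\n')
print(parts)
# ['', 'bla bla text ', '', ' bla bla another text ', ' # bla text ', '', '', ' bla bla another text']

# Keep only the pieces that contain at least one character
non_empty = [p for p in parts if p != '']
print(len(non_empty))  # 4
```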

I searched Stack Overflow but could not find any relevant suggestions. Could someone help me with this?

Answer:

You could try something like this:

df['total_lines'] = df['text'].str.split('\n').apply(lambda x: len(x) - x.count(''))

Output:

                                                 text   total_lines
0   \nbla bla text \n\n bla bla another text \n # ...             4
1   \nbla bla bla text2 \n\n\ bla bla bla another ...             4

Each string is split on '\n' into a list of lines, and the number of empty strings in that list is subtracted from the list's total length. What remains is the number of non-empty lines.
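The same count can also be computed without building intermediate lists. The sketch below is a swapped-in alternative, not the answerer's method: Series.str.count with the regex [^\n]+ counts each maximal run of non-newline characters, which gives one match per non-empty line. Both this and the split-based version treat a line made only of spaces as non-empty; the last column uses str.strip() to skip such lines, in case that stricter definition is what you want. The column names total_lines_regex and total_lines_no_blank are hypothetical.

```python
import pandas as pd

MyDict = {'text': ['\nbla bla text \n\n bla bla another text \n # bla text \n\n\n bla bla another text',
                   '\nbla bla bla text2 \n\n\\ bla bla bla another text it is \n\n  # bla bla bla text \n bla bla it is another text']}
df = pd.DataFrame(MyDict)

# Split-based count from the answer: total pieces minus empty pieces
df['total_lines'] = df['text'].str.split('\n').apply(lambda x: len(x) - x.count(''))

# Regex alternative: each match of [^\n]+ is one non-empty line
df['total_lines_regex'] = df['text'].str.count(r'[^\n]+')

# Stricter variant (assumption): also skip lines that contain only whitespace
df['total_lines_no_blank'] = df['text'].apply(
    lambda s: sum(1 for line in s.split('\n') if line.strip())
)

print(df[['total_lines', 'total_lines_regex', 'total_lines_no_blank']])
```

For the sample data all three columns give 4 for both rows; they only differ when a cell contains whitespace-only lines such as '   '.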
