My data looks like this (sample):
import pandas as pd
# '\\ ' in the second string is a literal backslash followed by a space
MyDict = {'text': ['\nbla bla text \n\n bla bla another text \n # bla text \n\n\n bla bla another text',
                   '\nbla bla bla text2 \n\n\\ bla bla bla another text it is \n\n # bla bla bla text \n bla bla it is another text']}
df = pd.DataFrame(MyDict)
I want to count the number of non-empty lines in each cell of the text column, i.e. ignoring the empty lines produced by consecutive '\n' characters, so that I get a DataFrame like this:
text total_lines
'bla bla text \n\n bla bla another text \n # bla text \n\n\n...' 4
'bla bla bla text2 \n\n\ bla bla bla another text it is \n\n...' 4
There are 4 non-empty lines in cell 1 of the text column, 4 in cell 2, and so on.
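To make the expected count concrete, here is a minimal plain-Python check on the first sample string (copied from MyDict). The names `s`, `parts` and `non_empty` are illustrative only.

```python
# First sample string from MyDict, copied verbatim
s = '\nbla bla text \n\n bla bla another text \n # bla text \n\n\n bla bla another text'

# Splitting on '\n' yields an empty string at the start (the string begins
# with '\n') and wherever two newlines are adjacent
parts = s.split('\n')
non_empty = [p for p in parts if p != '']

print(len(parts))      # 8 pieces in total
print(len(non_empty))  # 4 of them are non-empty lines
```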
I searched Stack Overflow but could not find anything relevant. Could someone help me with this?
Answer:
You could try something like this:
# Split each cell on '\n', then subtract the number of empty strings from the list length
df['total_lines'] = df['text'].str.split('\n').apply(lambda x: len(x) - x.count(''))
Output:
text total_lines
0 \nbla bla text \n\n bla bla another text \n # ... 4
1 \nbla bla bla text2 \n\n\ bla bla bla another ... 4
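Each string is split on '\n' into a list of lines; subtracting the number of empty strings in that list from its length gives the number of non-empty lines. Note that a line containing only spaces is not an empty string, so this approach counts it as non-empty.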
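If you would rather avoid the Python-level lambda, a possible vectorized alternative is pandas' `Series.str.count` with a regular expression. This is a sketch, not part of the original answer: the third row and the `visible_lines` column are hypothetical additions that show the difference between the two rules, and treating whitespace-only lines as empty is an assumption about what your data needs.

```python
import re
import pandas as pd

# Sample data from the question ('\\ ' is a literal backslash and a space)
df = pd.DataFrame({'text': [
    '\nbla bla text \n\n bla bla another text \n # bla text \n\n\n bla bla another text',
    '\nbla bla bla text2 \n\n\\ bla bla bla another text it is \n\n # bla bla bla text \n bla bla it is another text',
    # hypothetical third row: the middle line holds only spaces
    'first line\n   \nlast line',
]})

# Same rule as the split-based answer: each maximal run of characters other
# than '\n' is one non-empty line, so counting the runs counts the lines
df['total_lines'] = df['text'].str.count(r'[^\n]+')

# Stricter variant: with MULTILINE, ^ and $ anchor at line boundaries, so this
# matches only lines that contain at least one non-whitespace character
df['visible_lines'] = df['text'].str.count(r'^.*\S.*$', flags=re.MULTILINE)

print(df[['total_lines', 'visible_lines']])
```

Counting matches of `[^\n]+` gives the same numbers as the split-based answer, while the stricter pattern skips lines made up only of spaces or tabs.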