I have a pandas DataFrame with a text column, and I was wondering how I can count the number of line breaks in each cell. This is how it's done in Excel, and I would like to know how I can achieve this in Python:
How To Count Number Of Lines (Line Breaks) In A Cell In Excel?
Answer:
You can try Series.str.count:
import pandas as pd
df = pd.DataFrame({'A': ['a\nb', 'c\nd\nf']})
df['count'] = df['A'].str.count('\n')
print(df)
         A  count
0     a\nb      1
1  c\nd\nf      2
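For comparison with the Excel approach in the question: a common Excel formula for this is =LEN(A1)-LEN(SUBSTITUTE(A1,CHAR(10),"")), plus 1 if you want lines rather than breaks (assuming that is the method the linked article uses). A minimal sketch of the same length-difference idea in pandas, on a hypothetical column 'A':

```python
import pandas as pd

# Hypothetical sample data; the column name 'A' matches the answer above.
df = pd.DataFrame({'A': ['a\nb', 'c\nd\nf', 'single line']})

# Excel-style count: total length minus the length with every '\n' removed
# (mirrors LEN(cell) - LEN(SUBSTITUTE(cell, CHAR(10), ""))).
without_breaks = df['A'].str.replace('\n', '', regex=False)
df['breaks_excel_style'] = df['A'].str.len() - without_breaks.str.len()

# Direct count of '\n' characters, as in the answer above.
df['breaks'] = df['A'].str.count('\n')

print(df)
```

Both columns should agree; str.count is simply the more direct of the two.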
Answer:
You can use str.count:
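import pandas as pd
df = pd.DataFrame({'Data': ['line1', 'new\nnew line1\nnew line2\nnew line3', 'AA\nBB line\nCC line']})
df['Count lines'] = df['Data'].str.count('\n').add(1)
print(df)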
# Output
                                   Data  Count lines
0                                 line1            1
1  new\nnew line1\nnew line2\nnew line3            4
2                  AA\nBB line\nCC line            3
Alternative with str.split:
df['Count lines'] = df['Data'].str.split('\n').str.len()
print(df)
# Output
                                   Data  Count lines
0                                 line1            1
1  new\nnew line1\nnew line2\nnew line3            4
2                  AA\nBB line\nCC line            3
You can remove .add(1) if you want to count the number of line breaks rather than the number of lines. For the str.split version, append .sub(1) instead to get the number of line breaks.
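A short sketch checking that the two line-break variants agree, reusing the sample Data column from this answer:

```python
import pandas as pd

df = pd.DataFrame({'Data': ['line1',
                            'new\nnew line1\nnew line2\nnew line3',
                            'AA\nBB line\nCC line']})

# Line breaks via str.count: no .add(1) this time.
breaks_count = df['Data'].str.count('\n')

# Line breaks via str.split: number of pieces minus 1.
breaks_split = df['Data'].str.split('\n').str.len().sub(1)

print(breaks_count.tolist())  # [0, 3, 2]
print(breaks_split.tolist())  # [0, 3, 2]
```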
Answer:
You can apply a lambda function. Please see the following example.
import pandas as pd
data = {
"name": ["Uditha ", "this is \n Fun", "what\n on \nearth\n", "life", "ane palayan ban"],
"age": [10, 20, 30, 40, 50],
}
df = pd.DataFrame(data)
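# One entry per line: split on '\n' and count the pieces (a trailing '\n' adds an empty final piece)
df['new'] = df['name'].apply(lambda x: len(x.split("\n")))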
print(df)
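One caveat, assuming your column may contain missing values (None/NaN): the lambda above raises AttributeError on them, because only strings have .split. A minimal sketch of a guarded lambda next to a vectorized equivalent, on hypothetical data; treating missing cells as 0 lines is a choice made here for illustration, not a requirement:

```python
import pandas as pd

# Hypothetical data with a missing value in the middle.
df = pd.DataFrame({'name': ['this is \n Fun', None, 'life']})

# Guarded lambda: non-strings (None/NaN) are counted as 0 lines.
df['lines_apply'] = df['name'].apply(
    lambda x: len(x.split('\n')) if isinstance(x, str) else 0
)

# Vectorized version: .str methods return NaN for missing values,
# so fill them explicitly before converting back to int.
df['lines_vec'] = df['name'].str.count('\n').add(1).fillna(0).astype(int)

print(df)
```

The vectorized form avoids a Python-level call per row and handles missing values through fillna rather than inside the lambda.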
Answer:
IIUC, simply use str.count:
Number of line breaks:
df['breaks'] = df['your_col'].str.count('\n')
Number of lines (empty or not):
df['lines'] = df['your_col'].str.count('\n').add(1)
# or
df['lines'] = df['your_col'].str.count(r'(\n|$)')
To count only non-empty lines:
df['non-empty lines'] = df['your_col'].str.count(r'[^\n](\n|$)')
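These patterns assume '\n' line endings. If the text comes from Windows sources it may use '\r\n' instead: str.count('\n') still counts each of those once, but the non-empty pattern treats the '\r' left on an empty line as content. A sketch that normalizes line endings first (the sample string is hypothetical):

```python
import pandas as pd

# Hypothetical Windows-style text: three lines separated by '\r\n', the middle one empty.
s = pd.Series(['abc\r\n\r\ndef'])

# Without normalizing, the '\r' of the empty line matches [^\n],
# so the empty line is counted as non-empty.
naive = s.str.count(r'[^\n](\n|$)')

# Convert '\r\n' (and any lone '\r') to '\n', then apply the same patterns.
normalized = s.str.replace(r'\r\n?', '\n', regex=True)
non_empty = normalized.str.count(r'[^\n](\n|$)')
breaks = normalized.str.count('\n')

print(naive.tolist(), non_empty.tolist(), breaks.tolist())  # [3] [2] [2]
```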
Example:
   your_col  breaks  lines  non-empty lines
0  abc\ndef       1      2                2
1       abc       0      1                1
2                 0      1                0
3        \n       1      2                0
Input used:
import pandas as pd
df = pd.DataFrame({'your_col': ['abc\ndef', 'abc', '', '\n']})
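A small sketch that rebuilds the example table from that input and cross-checks the regex line count against breaks + 1. The dtype=object argument is an added precaution so the .str methods go through Python's re module (on pandas 2.x this is already the default for string data):

```python
import pandas as pd

df = pd.DataFrame({'your_col': ['abc\ndef', 'abc', '', '\n']}, dtype=object)

df['breaks'] = df['your_col'].str.count('\n')
df['lines'] = df['your_col'].str.count(r'(\n|$)')
df['non-empty lines'] = df['your_col'].str.count(r'[^\n](\n|$)')

# Cross-check: every line except the last ends with '\n', so lines == breaks + 1.
assert (df['lines'] == df['breaks'] + 1).all()

print(df)
```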