I am trying to count the strings in a column that are 5 or more characters long. Each cell in the column holds one or more strings separated by commas.
import pandas as pd
df = pd.DataFrame(columns=['first'])
df['first'] = ['jack,utah,TOMHAWK Somer,SORITNO','jill','bob,texas','matt,AR','john']
This is the code I have tried so far, but it does not create a new column with the count of strings of 5 or more characters.
df['countStrings'] = df['first'].str.split(',').count(r'[a-zA-Z0-9]{5,}')
CodePudding user response:
The pandas str.len() method returns the length of each element in a Series. Because it is a string method, it has to be called through the .str accessor every time.
You can try this:
import pandas as pd
df = pd.DataFrame(columns=['first'])
df['first'] = ['jack,utah,TOMHAWK Somer,SORITNO','jill','bob,texas','matt,AR','john']
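# Split each cell on commas, one item per row (the original index is kept)
items = df['first'].str.split(',').explode()
# Flag items of length >= 5, then count the flags per original row
df['countStrings'] = items.str.len().ge(5).groupby(level=0).sum()
df['countStrings'].sum()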
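A regex-based alternative, closer to the attempt in the question, could look like the sketch below. It assumes an item is any run of non-comma characters, so a space inside an item such as 'TOMHAWK Somer' counts toward its length; the `total` variable is only there for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['jack,utah,TOMHAWK Somer,SORITNO', 'jill', 'bob,texas', 'matt,AR', 'john']
})

# [^,]{5,} matches one run of five or more non-comma characters,
# i.e. one comma-separated item of length >= 5 (spaces included).
df['countStrings'] = df['first'].str.count(r'[^,]{5,}')

# Sum the per-row counts to get the count for the whole column.
total = int(df['countStrings'].sum())
print(df)
print(total)
```

If items should be limited to letters and digits, as in the question's `[a-zA-Z0-9]{5,}`, the pattern would need adjusting, and an item like 'TOMHAWK Somer' would then be treated as two words.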
CodePudding user response:
This is how I would get the number of strings with len>=5 in the whole column:
# Collect every comma-separated item that has 5 or more characters
data = [i for k in df['first']
        for i in k.split(',')
        if len(i) >= 5]
result = len(data)
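The same idea can be applied per row if a new column is wanted, as the question asks. This is a minimal sketch; the helper name `count_long_items` and its `min_len` parameter are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['jack,utah,TOMHAWK Somer,SORITNO', 'jill', 'bob,texas', 'matt,AR', 'john']
})

def count_long_items(cell, min_len=5):
    # Hypothetical helper: count the comma-separated items in one cell
    # that have at least min_len characters.
    return sum(len(item) >= min_len for item in cell.split(','))

# One count per row, stored in a new column.
df['countStrings'] = [count_long_items(cell) for cell in df['first']]

# Total for the whole column, matching the single number computed above.
result = int(df['countStrings'].sum())
print(df)
print(result)
```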