I have a list of words and I want to find the frequency of each word, then print every word (repeats included) together with the total frequency of the group it belongs to.
PLEASE NOTE: I don't want the solution to use an explicit loop such as a 'for' loop, but it should arrive at the same results.
For example, I have the following words:
'abc'
'abc'
'abc'
'abc'
'xyz'
'xyz'
'tuf'
'pol'
'pol'
'pol'
'pol'
'pol'
'pol'
and I need the output to be:
'abc', 4
'abc', 4
'abc', 4
'abc', 4
'xyz', 2
'xyz', 2
'tuf', 1
'pol', 6
'pol', 6
'pol', 6
'pol', 6
'pol', 6
'pol', 6
I am using Python 3 and have tried the following code, but it doesn't work as expected:
curr_tk = None
tk = None
count = 0
for items in data:
    line = items.strip()
    file = line.split(",")
    tk = file[0]
    if curr_tk == tk:
        # same word as the previous line: extend the current run
        count += 1
    else:
        # a new word starts: report the run that just ended
        if curr_tk:
            print('%s , %s' % (curr_tk, count))
        count = 1
        curr_tk = tk
# print last word
if curr_tk == tk:
    print('%s , %s' % (curr_tk, count))
The code above prints each word group only once instead of once per occurrence:
'abc', 4
'xyz', 2
'tuf', 1
'pol', 6
CodePudding user response:
A loop is unavoidable somewhere. But if you prefer not to write it yourself, you can use pandas and let the package do the iteration internally:
import pandas as pd

words = ['abc', 'abc', 'abc', 'abc', 'xyz', 'xyz', 'tuf', 'pol', 'pol', 'pol', 'pol', 'pol', 'pol']
df = pd.DataFrame(words, columns=['words'])
# count each distinct word and name the resulting Series 'counts'
counts = df['words'].value_counts().rename('counts')
# attach each word's count to every row that contains that word
print(df.join(counts, on='words'))
output:
    words  counts
0     abc       4
1     abc       4
2     abc       4
3     abc       4
4     xyz       2
5     xyz       2
6     tuf       1
7     pol       6
8     pol       6
9     pol       6
10    pol       6
11    pol       6
12    pol       6
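If pulling in pandas feels heavy, the same idea can be sketched with the standard library alone. This is only a rough illustration: with_group_counts is a made-up helper name, and it counts every occurrence in the whole list (like value_counts above), not just consecutive runs. The iteration still happens, but inside Counter, map and zip instead of a 'for' statement.

```python
from collections import Counter

def with_group_counts(words):
    """Pair every word with the total number of times it occurs in `words`.

    Hypothetical helper for illustration; not part of any library.
    """
    counts = Counter(words)  # tally each distinct word once
    # map() looks up the tally for each word, so no `for` statement is written
    return list(zip(words, map(counts.__getitem__, words)))

words = ['abc'] * 4 + ['xyz'] * 2 + ['tuf'] + ['pol'] * 6
# format each (word, count) pair as 'word', count and print one pair per line
print(*map("'%s', %d".__mod__, with_group_counts(words)), sep='\n')
```

The list keeps the original order of the words, so each input line gets its own output line.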
CodePudding user response:
I think I understand what you want to do. You need to print each repeated string once per occurrence, e.g. 'abc', 4 printed 4 times, but you don't want to do this with a for loop. I don't understand why you restrict yourself this way.
One method is to collect the output in a buffer. I provide two ways to do this, selected by the boolean first_way.
curr_tk = None
tk = None
count = 0
first_way = True
base_buffer = '{tk} , {count}\n'
output_buffer = ''
for items in data:
    line = items.strip()
    file = line.split(',')
    tk = file[0]
    if curr_tk == tk:
        count += 1
        if first_way:
            # first way: append one more template line per repeated word
            output_buffer += base_buffer
    else:
        if curr_tk:
            if not first_way:  # use operator '*' to copy str
                # I guess the underlying implementation is also a loop
                # not sure whether this violates the requirement
                output_buffer = base_buffer * count
            print(output_buffer.format(tk=curr_tk, count=count), end='')
        count = 1
        curr_tk = tk
        if first_way:
            output_buffer = base_buffer
# print the last group, once per occurrence like the others
if curr_tk:
    if not first_way:
        output_buffer = base_buffer * count
    print(output_buffer.format(tk=curr_tk, count=count), end='')
Given data = ['abc', 'abc', 'abc', 'abc', 'xyz', 'xyz', 'tuf'], you will get the output:
abc , 4
abc , 4
abc , 4
abc , 4
xyz , 2
xyz , 2
tuf , 1
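For comparison, the run-based logic above (counting consecutive equal items) can also be written without a 'for' statement by leaning on itertools. Treat this as a sketch under assumptions: expand_run and run_counts are names invented here, and data is assumed to already hold the bare tokens, so the strip/split step is left out. The looping still happens inside groupby, map and chain.

```python
from itertools import chain, groupby, repeat

def expand_run(run):
    """Turn one (key, group-iterator) pair from groupby into n copies of (key, n)."""
    key, group = run
    n = len(list(group))  # consume the group right away to measure the run
    return repeat((key, n), n)

def run_counts(data):
    """Pair every item with the length of the consecutive run it belongs to."""
    return list(chain.from_iterable(map(expand_run, groupby(data))))

data = ['abc', 'abc', 'abc', 'abc', 'xyz', 'xyz', 'tuf']
# format each (token, count) pair in the same 'tk , count' style as above
print(*map('{0[0]} , {0[1]}'.format, run_counts(data)), sep='\n')
```

Because groupby only merges adjacent equal items, a word that reappears later in the input starts a new run with its own count, which matches the behaviour of the loop version.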
CodePudding user response:
I don't know if this will help, but if you really don't want to use a loop, you could skip Python altogether and use SQL. Here's the code (T-SQL):
DECLARE @phrases TABLE (id int, phrase varchar(max));
INSERT @phrases VALUES
    (1, 'Red and White'),
    (2, 'green'),
    (3, 'White and blue'),
    (4, 'Blue'),
    (5, 'Dark blue');
SELECT word, COUNT(*) AS c
FROM @phrases
CROSS APPLY (
    SELECT CAST('<a>' + REPLACE(phrase, ' ', '</a><a>') + '</a>' AS xml) AS xml1
) t1
CROSS APPLY (
    SELECT n.value('.', 'varchar(max)') AS word
    FROM xml1.nodes('a') x(n)
) t2
GROUP BY word;
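If you want to try the SQL idea on the word list from the question without a database server, Python's built-in sqlite3 module is enough for a rough sketch. The table name words, its columns, and the helper counts_via_sql are assumptions made for this illustration. A GROUP BY alone returns one row per word, so the sketch joins the grouped counts back onto the original rows to repeat each word once per occurrence.

```python
import sqlite3

def counts_via_sql(words):
    """Return (word, total_count) for every input row, in input order.

    Hypothetical helper: loads `words` into an in-memory SQLite table.
    """
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT)')
    # zip(words) wraps each word in a 1-tuple, the shape executemany expects
    conn.executemany('INSERT INTO words (word) VALUES (?)', zip(words))
    rows = conn.execute(
        'SELECT w.word, c.n '
        'FROM words AS w '
        'JOIN (SELECT word, COUNT(*) AS n FROM words GROUP BY word) AS c '
        'ON c.word = w.word '
        'ORDER BY w.id'
    ).fetchall()
    conn.close()
    return rows

words = ['abc'] * 4 + ['xyz'] * 2 + ['tuf'] + ['pol'] * 6
print(*map("'%s', %d".__mod__, counts_via_sql(words)), sep='\n')
```

The join against a grouped subquery is used instead of a window function so that the sketch also runs on older SQLite builds.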