How to iterate through rows that contain text and create bigrams using Python

Time:02-03

In an Excel file I have 5 columns and 20 rows. One column, df['Content'], contains text data, as shown below:

0 this is the final call
1 hello how are you doing 
2 this is me please say hi
..
.. and so on

I want to create bigrams for each row while keeping them attached to the original table.

I tried the function below to process each row:

def find_bigrams(input_list):
    bigram_list = []
    for i in range(len(input_list)-1):
        bigram_list.append(input_list[1:])
        return bigram_list

Then I tried to write the result back into the table using:

df['Content'] = df['Content'].apply(find_bigrams)

But instead of bigrams I get the following output:

0     None
1     None
2     None

I expect output like this:

   Company  Code      Content
0  xyz      uh-11     (this,is),(is,the),(the,final),(final,call)
1  abc      yh-21     (hello,how),(how,are),(are,you),(you,doing)

CodePudding user response:

Your input_list is not actually a list, it's a string.
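
To see why the type matters, here is a minimal sketch using the first sample row; the variable names are illustrative and not from the original post. Slicing a string works on characters, so the words have to be split out before they can be paired. Note also that in the question's function the return sits inside the for loop, so it exits on the first iteration, and it appends input_list[1:] rather than a pair of neighbours.

```python
# One cell of df['Content'] is a plain string, not a list of words.
row = "this is the final call"

# Slicing a string yields characters, which is why word-level logic fails on it.
first_two_chars = row[:2]

# Splitting on whitespace gives the word list the bigram logic needs.
words = row.split()

# Pair each word with the word that follows it.
bigrams = list(zip(words, words[1:]))

print(first_two_chars)  # th
print(words)            # ['this', 'is', 'the', 'final', 'call']
print(bigrams)
```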

Try the function below:

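def find_bigrams(input_text):
    # split() with no argument splits on any whitespace and ignores a trailing space
    input_list = input_text.split()
    # Pair each word with its successor and join each pair with a comma
    bigram_list = list(map(','.join, zip(input_list[:-1], input_list[1:])))
    return bigram_list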
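
A minimal sketch of applying such a function while keeping the result attached to the table, assuming pandas is installed. The Company and Code values are taken from the expected output in the question; the Bigrams column name is an assumption, chosen so that the original Content column is not overwritten. This variant returns tuples rather than comma-joined strings, which is closer to the expected output shown in the question.

```python
import pandas as pd

def find_bigrams(input_text):
    # Split on whitespace, then pair each word with its successor.
    words = input_text.split()
    return list(zip(words, words[1:]))

df = pd.DataFrame({
    "Company": ["xyz", "abc"],
    "Code": ["uh-11", "yh-21"],
    "Content": ["this is the final call", "hello how are you doing "],
})

# Store the result in a new column so every bigram list stays on its own row.
df["Bigrams"] = df["Content"].apply(find_bigrams)

print(df[["Company", "Code", "Bigrams"]])
```

Assigning to df['Content'] instead would replace the original text, as in the question; a separate column keeps both.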

CodePudding user response:

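You can use itertools.permutations(). Here s is the text column, i.e. s = df['Content']. In the ordered list of 2-permutations of the words in a row, every len(x)-th pair is a pair of adjacent words, so slicing with that step keeps exactly the bigrams:

import itertools
s.str.split().map(lambda x: list(itertools.permutations(x,2))[::len(x)])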
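
The slicing step can be checked on a single row with a small sketch (the variable names are illustrative). itertools.permutations(x, 2) emits pairs grouped by the first word, so the pair made of words i and i + 1 sits at index i * len(x); taking every len(x)-th element therefore matches zipping the list with itself shifted by one. All n * (n - 1) pairs are built first, so zip is cheaper for long rows, and an empty word list would raise ValueError because the slice step would be zero.

```python
import itertools

words = "this is the final call".split()

# All ordered pairs of distinct positions: n * (n - 1) of them.
all_pairs = list(itertools.permutations(words, 2))

# Every len(words)-th pair is a pair of adjacent words.
by_slicing = all_pairs[::len(words)]

# The direct approach: pair each word with its successor.
by_zip = list(zip(words, words[1:]))

print(len(all_pairs))        # 20
print(by_slicing == by_zip)  # True
```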