How to separate words into single letters from a text file in Python

Time:01-09

How do I separate words from a text file into single letters?

I'm given a text and have to calculate the frequency of each letter in it. However, I can't figure out how to separate the words into single letters so that I can count the unique elements and, from there, determine their frequency.

Sorry that I can't provide the text as a file, but this is the text I'm given:

alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, and what is the use of a book,' thought alice without pictures or conversation?'

so she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy- chain would be worth the trouble of getting up and picking the daisies, when suddenly a white rabbit with pink eyes ran close by her.

there was nothing so very remarkable in that; nor did alice think it so very much out of the way to hear the rabbit say to itself, `oh dear! oh dear! i shall be late!' (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the rabbit actually took a watch out of its waistcoat- pocket, and looked at it, and then hurried on, alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge.

in another moment down went alice after it, never once considering how in the world she was to get out again.

the rabbit-hole went straight on like a tunnel for some way, and then dipped suddenly down, so suddenly that alice had not a moment to think about stopping herself before she found herself falling down a very deep well.

I'm supposed to separate the text into 26 variables, a-z, and then determine their frequency, which is defined as follows:

[Image: The frequency for the different letters in the given text]
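The exact formula from the assignment is not reproduced in the text. Assuming it is the usual relative frequency (which is also what the first answer below computes with `v/total`), the frequency of a letter c is its count divided by the total number of letters:

```latex
f_c = \frac{n_c}{\sum_{k \in \{a, \dots, z\}} n_k}, \qquad c \in \{a, \dots, z\}
```

Here n_c is the number of times letter c occurs, so the 26 frequencies add up to 1. For example, in "Abba, cab!" there are 7 letters and n_a = 3, so f_a = 3/7 ≈ 0.429.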

This is the code I have written so far:

# Check the current working directory.
import os
os.getcwd()
#print(os.getcwd())

# 1. Change the current working directory to the folder where the file is saved.
os.chdir('C:/Users/Annik/Desktop/DTU/02633 Introduction to programming/Datafiles')
os.getcwd()
#print(os.getcwd())

# 2. List the contents of the current working directory.
os.listdir(os.getcwd())
#print(os.listdir(os.getcwd()))

# Read the file
with open("small_text.txt", "r") as filein:  # opens the file for reading and closes it afterwards
    lines = filein.readlines()  # reads all lines into a list
smalltxt = "".join(lines)  # joins the lines into one big string

import numpy as np

def letterFrequency(filename):
    #counts the frequency of letters in a text
    
    unique_elems, counts = np.unique(separate_words, return_counts=True)

    return unique_elems

I just don't know how to separate the letters in the text, so I can count the unique elements.
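Because a Python string is already a sequence of characters, iterating over it yields the single letters directly; no word-splitting step is needed. A minimal sketch of the missing `separate_words` step, assuming only the letters a-z should be counted and case should be ignored. The function takes the text (for example `smalltxt`) rather than a filename, and its name `letter_frequency` is my own:

```python
import numpy as np

def letter_frequency(text):
    """Return (letters, counts, frequencies) for the letters a-z in text."""
    # Iterating over a string yields its single characters.
    # Lower-case first so 'A' and 'a' are counted together, then keep only a-z.
    separate_letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    letters, counts = np.unique(separate_letters, return_counts=True)
    frequencies = counts / counts.sum()  # relative frequency of each letter
    return letters, counts, frequencies

letters, counts, freqs = letter_frequency("Alice was beginning to get very tired")
for letter, count, freq in zip(letters, counts, freqs):
    print(letter, count, round(freq, 3))
```

Note that `np.unique` only returns letters that actually occur; a letter that never appears is missing from the result rather than listed with a count of 0.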

Answer:

You can use collections.Counter to get the letter counts directly from the text.

Then keep only the 26 keys you are interested in, because the counter will also include whitespace and punctuation.

from collections import Counter
[...]
with open("small_text.txt", "r") as file:
    text = file.read()

keys = "abcdefghijklmnopqrstuvwxyz"

c = Counter(text.lower())
# initialize occurrence with zeros to have all keys present.
occurrence = dict.fromkeys(keys, 0)
occurrence.update({k:v for k,v in c.items() if k in keys})
total = sum(occurrence.values())
frequency = {k:v/total for k,v in occurrence.items()}

[...]

The code lowercases the text with str.lower so that upper-case letters are counted together with their lower-case counterparts.
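For reference, here is a self-contained sketch of the same approach wrapped in a function that takes the text as a string (the name `letter_frequencies` is my own). It uses `string.ascii_lowercase` instead of typing out the alphabet, and every letter a-z is present in the result, with 0.0 for letters that do not occur:

```python
from collections import Counter
from string import ascii_lowercase

def letter_frequencies(text):
    """Map each letter a-z to its share of all letters in text."""
    counts = Counter(text.lower())
    # Start every letter at 0 so letters that never occur are still listed.
    occurrence = {letter: counts.get(letter, 0) for letter in ascii_lowercase}
    total = sum(occurrence.values())
    if total == 0:
        # No letters at all: avoid dividing by zero.
        return dict.fromkeys(ascii_lowercase, 0.0)
    return {letter: n / total for letter, n in occurrence.items()}

freq = letter_frequencies("Oh dear! Oh dear! I shall be late!")
print(freq["e"], freq["z"])
```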

Answer:

Regarding "how I separate the words into single letters": since you want to count the characters, you can use Counter from the collections module. Passing it a string counts each character individually, so there is no need to split the words first.

For example:

import collections
import pprint
...
...
file_input = input('File_Name: ')
with open(file_input, 'r') as info:
    count = collections.Counter(info.read().upper())  # read the file and count every character (upper-cased)
    value = pprint.pformat(count)
print(value)
...
...

This reads your file and prints the count of every character in it, including spaces and punctuation.
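Since that output also counts spaces, newlines and punctuation, you may want to keep only the letters. A minimal sketch, assuming upper-case letters as in the code above and reading from a string instead of a file so it runs on its own (`count_letters` is a hypothetical name):

```python
import collections
from string import ascii_uppercase

def count_letters(text):
    """Count only the letters A-Z in text, ignoring case."""
    count = collections.Counter(text.upper())
    # Drop spaces, punctuation and digits; keep the 26 letters only.
    return collections.Counter({ch: n for ch, n in count.items() if ch in ascii_uppercase})

letters = count_letters("alice was beginning to get very tired")
print(letters.most_common(5))
```

`most_common(5)` returns the five most frequent letters as (letter, count) pairs.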
