Home > Enterprise >  Insertion and Searching of a String in a Trie data structure
Insertion and Searching of a String in a Trie data structure

Time:10-10

I am currently learning about Trie Data Structure. I am looking for sample implementations in which there's a part the gets me confused.

Here is the sample code I found:

#include<bits/stdc  .h>
#include<iostream>
using namespace std;
const int alpha=26;
struct node{
    node* child[alpha];
   bool end;
   int countchild;
};
node* getnode(){
    node* pnode=new node;
    pnode->end=false;
    pnode->countchild=0;
    for(int i=0;i<alpha;i  ){
        pnode->child[i]=NULL;
    }
    return pnode;
}
void insert(node* root,string key){
    node* pcrawl=root;
    int size=key.length();
    for(int i=0;i<size;i  ){
        int index=key[i]-'a';
        if(!pcrawl->child[index]){
            pcrawl->child[index]=getnode();
        }
        pcrawl->countchild =1;
        pcrawl=pcrawl->child[index];
        
    }
    pcrawl->end=true;
}

Now, on the insert function, what does this line mean:

int index=key[i]-'a';

I don't know why is it specifically 'a' that needs to be subtracted. I always see that it's the character 'a'. Sorry for sounding kinda dumb on this but I couldn't just understand.

CodePudding user response:

Why and how does it work?

As you can see, the node uses a fixed-size array of pointers:

node* child[alpha];

where const int alpha=26;

This means that at each node, it stores the next node using an index between 0 and 25 in that array. key[i]-'a' computes this index, keeping in mind that key[i] is a char i.e. an integral type that stores the numeric value corresponding to the encoding of the letter.

Weakness of the program

This program assumes that:

  • all letters of the string parameter are lowercase letters between a and z
  • lowercase letters are contiguous in the encoding convention (fortunately this is true in all main encodings) and the order of the encoding corresponds to the alphabetical order.

Unfortunately, this program is UB is there are non-lowercase characters in the string parameter. A safer approach would be to use

int index=tolower(key[i])-'a';  // case insensitive version

and to ignore non alphabetic characters:

if (index<0 || index>=alpha) 
    continue;  

Here a first improvement (the two remarks above, use of vectors instead of arrays, and use of member functions instead of free functions.
A next improvement would be to use smart pointers to get rid of manual memory management, an d maybe even maps.

CodePudding user response:

I don't know why is it specifically 'a' that needs to be subtracted

Because everyone who copies this code doesn't know it either.

What you got is a trie that works for ASCII characters 'a' to 'z'. Why on Earth would anyone implementing example code add such an unnecessary and confusing restriction is largely beyond my understanding.

The code would be simpler and more understandable without this restriction, and if it were plain C. As it stands, the code is basically "C-with-iostreams". Tries in C wouldn't look this way, because C is not C. Container in C are expected to be iterable, and to reasonably support standard algorithms, etc.

Yes, you can write C like it was C. But you'd do that mainly when you don't really understand C , and at that point it makes much more sense to stick to C and don't pretend otherwise. Writing good C code is a skill in itself. I'd argue that there isn't a simple "transition path" from C to C : they are different languages, and if you apply C idioms to C code, you'll be writing code that's unnecessarily inscrutable.

This code you found is one more case of blind leading the blind when it comes to search terms that are highly sought after for some coding "challenge" sites :/

  • Related