Calculating the length of a string with strlen

Time:09-16

  char arr[]={'a','b','c'};
  int len=strlen(arr);

I know that strlen moves a char pointer forward until it reaches a '\0', then returns the distance between the array's first address and the address of that '\0'. But when I created the string this way, I didn't put a '\0' in it. So the pointer may keep moving to find a '\0', and in the process it may read out of bounds. So why didn't this code give me a warning or an error?

Answer:

strlen() only works correctly for zero-terminated character arrays, and what you have is not one.

What len returns for your program depends entirely on what happens to be in memory after the address arr + 3.

If there's a zero there, you'll get 3. If there's other data before a zero, you'll get a larger number. If you're unlucky and no zero turns up before the read leaves the memory your process is allowed to access, your program will crash with an out-of-bounds read. All of these outcomes are undefined behavior, so none of them is guaranteed.

For instance, the program

#include <stdio.h>
#include <string.h>

int main(void) {
  char blarr[] = {'d', 'e', 'f'};
  char arr[] = {'a', 'b', 'c'};  /* no '\0' terminator */
  size_t len = strlen(arr);      /* undefined behavior: reads past the end of arr */
  printf("%zu\n", len);
  return 0;
}

may print 6 (or some other number), depending on how the compiler lays out arr and blarr on the stack.

Your compiler doesn't warn about anything because, as far as it can tell, your program is valid: you're passing a char* to strlen, and that's fine. But the compiler isn't smart enough to detect that this char* doesn't point to a zero-terminated string.

Answer:

So the pointer may keep moving to find a '\0'. In the process, it may read out of bounds.

Yes, that's exactly what happened.

So why didn't this code give me a warning or an error?

Because the declaration

char arr[] = {'a','b','c'};

is perfectly valid. You haven't given the compiler any indication that you intend to use arr as a string.

A somewhat more interesting case is if you were to write

char arr[3] = "abc";

Due to a historical quirk, this is perfectly legal C, although it creates exactly the same array arr and has exactly the same problem if you pass it to strlen. Here, though, I believe some compilers will warn, and that would certainly be an appropriate warning, since the feature is debatable and rarely used deliberately.

Answer:

Often, it is about managing expectations.

Let's start with a small thought experiment (or a trip back to the early days of computing) in which there are no programming languages, just machine code. There, you would write something like this, using CPU-specific instructions (here x86-64 assembly), to represent a string and compute its length:

arr: db 'a','b','c'             ; note: no terminating 0 byte
strlen:                         ; RDI (pointer to string) -> RAX (length of string)
                                ; RAX length counter and return value
                                ; CL used for null character test
        xor RAX, RAX            ; set RAX to 0
strlen_loop:
        mov cl, [rdi]           ; load CL with the byte pointed to by argument
        test cl,cl              ; is it the terminating zero byte?
        jz strlen_loop_done     ; if so, stop
        inc rdi                 ; look at next byte in argument
        inc rax                 ; increment the length counter
        jmp strlen_loop
strlen_loop_done:
        ret                     ; RAX contains the zero-terminated string's length

Compared to that, writing the same function in C is much simpler.

  • We do not have to care about register allocation (which register does what).
  • We do not rely on the instruction set of a specific CPU.
  • We do not have to look up the calling conventions or ABI of the target system (argument passing and so on).
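
#include <stddef.h>  /* size_t */

/* Counts bytes until the first '\0'; it never checks any array bound. */
size_t strlen(const char* s) {
  size_t l = 0;
  while (*s) {  /* stop at the terminating zero byte */
    l++;
    s++;
  }
  return l;
}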

The convention that "strings" are just pointers to chars (bytes) terminated by a null byte is admittedly quite arbitrary, but it comes with the C programming language. It is just a convention. The compiler itself knows next to nothing about it (well, it does know to add a terminating null to string literals). But when you call strlen(), it cannot distinguish the string case from the just-a-byte-array case. Why? Because there is no dedicated string type.

As such, the C version is just about as clever as the assembly version above: it relies on the C-string convention. The assembler does not check, and neither does the C compiler, because, let's be honest, C's main accomplishments are the bullet items listed above.

So, to manage your expectations about C, think of it as a slightly abstracted, glorified assembly language.

If the C-string convention annoys you (after all, strlen is O(n) in the length of the string), you can still define your own string type, for example:

typedef struct String_tag {
  size_t length;
  char data[];
} String_t;

Then write helpers (to create a string on the heap), perhaps macros (to create one on the stack, e.g. with alloca), and build your own string library around that type.

If you are just getting started with C, instead of tackling something bigger, I think this would be a good exercise for learning the language.
