Home > Enterprise >  Have hard time understanding how to think correctly about 2 dimensional data structures rows and col
Have hard time understanding how to think correctly about 2 dimensional data structures rows and col

Time:10-17

I really have hard time understanding multidimensional data structures.It seems to be omnipresent concept in cs and understanding this right seems very important.So i hope to gain better understanding, here is what causes confusion in my understanding.

Im currently learning about about two dimensional arrays/lists.And for instance outer structures/list is said to be indicated via first dimension most often rows but overall the first dimension represent outer list and its elements.And then for each element of outer array/list since its multidimensional second dimension is used to represent elements of each list in the list making up overall outer list.In other words elements of each subset making up overall outer set.

Therefore if my understanding is right one dimension is needed to organize outer structure/list and second dimension is used to organize each of sub structures/lists making up overall list.

But at this point multidimensional structures already confuse me.

In most of these structures rows representing outer structure are actually just columns already belonging to second dimension. How does then first dimension being used to represent outer structure even apply here?

If we work with one dimensional array since we are working with one dimension its obvious that previous logic works for indicating its structure. Since its one dimensional each of its instances represent same structure.

And since we need to use dimensions to organize multi dimensional arrays and indicate which is outer set and which is subset. When we declare array like this to organize dictionary it would make sense to declare it like this:

string dictionary [26][NUMBER_OF_WORDS];

Here previous logic of one dimension indicating outer structure and second dimension inner structure makes logical sense. First we declare array similar to one dimensional since it should be in one dimension of 26 elements (rows) and then each of elements present at said location (column) represent elements of subarrays/subsets making up overall array/set. Indices are therefore rows and elements present at these indices columns. Which is consistent with previous mental model and logic.

But this is not what happens most of the time. Most of the time multidimensional structure implemented via arrays would be written like this (if for instance we want to write unemployment data for each year ):

int years [YEAR][MONTH]

Here if we follow previous logic each YEAR element represents outer set grouping together inner sets.And each of values present at each MONTH location (columns) represent elements of each subset.

But what actually happens is that from examples i saw written by more experienced programmers that confuse me written each of YEAR elements will be filled up like thisdata is written like this:

years[0][1]= 2006;
years[1][1]= 2007;

If i tried to work on data structures like this with previous mental model it wouldn't be possible.Now everything i learned about one dimension representing elements of outer list/array and second dimension elements of inner lists/arrays doesn't seem consistent.

If look at each one dimensional array making up this data structure it would be like saying that all elements after first belong to second dimension which is really confusing and inconsistent with logic i learned about two dimensional structures/arrays up until this point.

But when it comes to hash tables initial logic does work and is especially important if each location of hash table (if we want to store a dictionary) is linked list.

Due to this i cant seem to build right mental model in order to properly work with complex data structures.What am i missing in order to build consistent mental model and work with them?

CodePudding user response:

You can consider each multidimensional array as an array elements of which are in turn arrays.

For example consider the following declaration

int a[m][n];

where m and n some positive values.

You can rewrite this declaration using typedef the following way

typedef int T[n];

In this case the above declaration will look like

T a[m];

That is you have an array with m elements that have the type int[n].

So expressions like this a[i] yields an element of the type T that is an alias for the type int[n].

Applying the second subscript operator like a[i][j] you will get the i-th element of the two-dimensional array and within this element you will get the j-th element of the i-th one-dimensional array.

CodePudding user response:

You need to separate the higher level concept of some 2D table with columns and rows from the concept of allocating a 2D array in C. There are several ways you could implement a 2D table and a 2D array is one of them.

One thing to keep in mind is that C is a very low level language. When you declare a 2D array you actually declare an array of arrays and the dimensions specify how the arrays are laid out in memory, nothing else.

So for example int arr[2][3] = { {1,2,3}, {4,5,6} }; would allocate the items in memory like this:

Address  Raw data  Array item
      0         1      [0][0]
      1         2      [0][1]
      2         3      [0][2]
      3         4      [1][0]
      4         5      [1][1]
      5         6      [1][2]

The dimensions only dictate how the items should be laid out in memory, not their meaning on the application layer. As we can see, the inner-most dimension (to the far right, [3] in this case) has its items allocated adjacently, while any outer dimensions will have gaps between items.

Then on the application layer we can chose to let these number represent int arr[col][row] or int arr[row][col], as we please. Rows and columns have no special meaning to C other than telling it how many items to make room for. They will be allocated as per the above no matter.

As for mental model, you could think of it as one big spread sheet, where you can (re)name the X and Y axis whatever you like.


(Newbies may stop reading here)

An advanced topic related to this, is that it's more efficient to iterate across the inner-most dimension that the outer ones. So we should always write loops like

// GOOD
for(int i=0; i<rows; i  )
  for(int j=0; j<cols; j  )
    arr[i][j] = something();

And not:

// BAD
for(int j=0; j<cols; j  )
  for(int i=0; i<rows; i  )
    arr[i][j] = something();

The reasons why is efficient data cache usage on high-end CPUs like x86. When accessing data adjacently 1 memory address at a time, the CPU can pre-fetch more relevant items to the cache lines in advance, that it would be able to do if we tell it to only access a multiple of 3 address. In case the pre-fetch array won't fit inside a cache line, we get a so-called "cache" miss where the CPU is forced to read the data from RAM instead of cache, which is slower.

This is a big topic of its own. However, a good rule of thumb is that if you write code as readable as possible, it will also perform the fastest, at least data cache-wise. I think most would agree that the "GOOD" example above is more readable and sensible than the "BAD" one.

  • Related