Data Structure and Algorithm at glance [Part 1]

Author:

Screen Shot 2018-10-25 at 22.08.46

Several times, i interviewed programmer candidates, and mostly they failed in technical test about data structure and algorithm. Although this subject is common or maybe mandatory subject in college, especially in computer related program. but somehow, in fact, as time goes by, some of student will enough when they can create working apps or system, without or less considering algorithm and data structure in their apps or system.

When they asked why use array instead of hash, or why using insertion sort instead of bubble sort, it hard to them to justify their choice when creating the app. Why data structure is matter when creating computer program? because when we create computer program, we will realize that computer program is all about reading, manipulating and returning data. And term data here refers to all types of information. the information could be statement which is contains group of string, or value number of something. And how those data organized, known as data structure.

In programmer daily life we always interact with data structure. This interaction known as operation. There are four basic operations in data structure
1. Read, refers to looking something up in particular position within data structure.
for example, reading a value from specific index within array. we will talk about array latter.
2. Search, refers to looking for particular value within data structure.
for example search a particular value exists within array
3. Insert, refers to adding another value to data structure.
for example, adding new value to additional slot within array
4. Delete, refers to removing value from data structure.
for example, removing of the values from array.

Understanding about data structure and algorithm will help us to create effective, efficient and fast response system, by choosing most efficient algorithm, and data structure when building the system.

We will try to analyze how fast each operation when applied in data structure. Fast here is not referring to how much time needed to complete the operation, but instead in how many step it takes. why this matter, there are other factor involved when measuring the pure time. ie. when running such operation in old computer will take much time compared to time needed when same operation running in newer computer with better specification.

Measuring the speed here known as Measuring time complexity. we will explain more detail about time complexity in its own part about Big-O-Notation.

We will start with most basic data structure: Array
Array is one of the basic data structure. simply a list of data elements with index to identify the location of data inside.
['jaguar', 'tiger', 'lion', 'domba', 'embe'] is sample of an array. `jaguar` lies in index 0, `tiger` in index 1, `lion` in index 2, `domba` in index 3 and `embe` in index 4. Index in Array start from 0.
Let start analyze all four data structure operation againts array.

1. Reading
Problem : What is value at index 2 ?
Reading from array is just need one step. from our example above, if we looked up index 1, it will return `lion`. detail process about how the process happened, why computer can jump directly into particular index, it related with operation in computer memory. we won’t talk about this here, please read about how computer system work , specifically about computer’s memory. Reading from array, is very efficient operation, because only need one step. the less step needed to complete an operation, will more efficient. below is sample code in ruby to read data from array :

def read(arr_data, index)
     step = 1
     arr_data[index] ## only need 1 step to get data from specific index
     p “#{step} Steps for #{arr_data.length} Cells Array”
end

2. Searching
Problem: Is lion exist in array, in what index ?
Searching os looking to see whether a value exists within array, in which index its located at ? To search value within an array, computer will scan entire array start from index 0. and will stop only if the value found. Using array above, here are the big picture how the process of searching in array:

a. computer start from index 0, check the value. since value at index 0 is jaguar, it will move to next index.
b. computer will check index 1. since the value is tiger, it will move to next index
c. computer check index 2. finally we found lion lives at index 2. and computer will stop searching here, since
the value already found.

in this example only need 3 step, to find lion in array. how if we search embe, it will need 5 step to till `embe` found. how if we search ayam? still need 5 step before computer tell no ayam found.

In data structure, we always use worst case scenario to determine the step needed. In this example, worst case, we need 5 step to find a value from array contains 5 cells. it means : For N Cells, we need N step to do search operation. This is known Linear search.
below is sample code in ruby to Search data from array :

def searching(arr_data, value)
    step = 0
    arr_data.each_with_index do |data, index|
        step += 1
        ## lets print step information
        p “#{step} Steps for #{arr_data.length} Cells Array” if data == value
    end
end

3. Insertion
Problem: Add ayam to array above.
There will be 3 scenarios when adding value to an array:
a. Add value to the end of cells. This Insertion operation will only need 1 step. [‘jaguar’, ‘tiger’, ‘lion’, ‘domba’, ’embe’] -> add ayam -> [‘jaguar’, ‘tiger’, ‘lion’, ‘domba’, ’embe’, ‘ayam’]

b. Add value in the middle of array. this will be has different story, and need more step. why ? because we need
to shift the cells to the right, to make room form ayam. here the illustration, ie, add ayam to index 2.
1. move embe to the right
2. move domba to the right
3. move lion to the right
4. insert ayam in index 2

We can see there are 4 steps, 3 steps shifting current array data to right, and 1 step insert new data.

c. Add value into beginning of arrays. this is worst case scenario. using array above, will do 5 steps shifting data,
and 1 step to insert new data. From above scenarios, we conclude that will need 6 steps to insert new data into array consisting 5 cells. using N formula, will result : need N+1 steps to insert new data to array consisting N cells.
below is sample code in ruby to Insert data from array :

def insertion(arr_data, new_value)
    step = 0
    (arr_data.length).downto(0) do |index|
        step += 1
        arr_data[index] = arr_data[index-1]
        if index == 0
            step+=1
            arr_data[index] = new_value
            ## lets print step information and final result
            p arr_data
            p “#{step} Steps for #{arr_data.length} Cells Array”
        end
    end
end

4. Deletion
Problem: Delete lion at index 2 from array, and return the result.
Deletion is process of removing data at specific index. For deletion process itself only need one step. but if we delete data in the middle, it will left a gap in the middle. and array is not allowed to have gaps within. Let see the illustration:
a. delete lion from array at index 2. it will create gap between index 1 and 3
b. shift index 3 to left one step
c. shift index 4 to left one step

in this case, deletion need 3 steps to complete. Worst case scenario in array deletion when delete first index.
it will require N steps to complete from N cells array.
below is sample code in ruby to Delete data from array using above step :

def deletion(arr_data, deleted_at)
    step = 0
    cell = arr_data.length
    (0..arr_data.length-1).each do |index|
        step +=1
        arr_data[index] = arr_data[index+1]
        if index == arr_data.length – 1
            arr_data.delete_at(index)
           p “#{step} Steps for #{cell} Cells Array”
        end
    end
end

For exercise, please analyze all four operation againts array based Set.

Set
Set is data structure that does not allow duplicate values within.
there are several type of Sets, but in this part we will only focus on array based set.
The main difference between ordinary array and array based set is Set does not allow duplication.
it means, never allow insertion of duplicated data.

for example we have set [`jaguar`, `ayam`, `kambing`]. then we try to insert new `jaguar` into this set,
it will not allowed since `jaguar` allready exist in it. Let analyze all four operation above againts array based set.

Reading for array based set is same as reading from array, it only need one step to complete the operation.
Searching a set also will produce same result with searching an array, N step needed to search particular value from
N cells array. Deletion also same as array operation.

But insertion is different. In array we can insert data into the end of array in one single step only. but in Set, we need
to scan and check every value within set to determine that this new value does not exist in it.
inserting value into the end of set will produce N+1 complexity, N step for checking duplication, and 1 step for insertion.
this is best scenario for set.
worts case scenario when inserting value into the beginning of set, it will produce 2N+1 step. N step to check duplication,
then followed by N step to shifting current data and 1 step to insert new data.

although Set has lower performance in insertion than array, does not we should not use set. Set will has advantages over
array when requirement need no duplication. But, if we don’t need this no duplication requirement, of course we should
choose array instead.

From this example, in this context. We should analyze the needs of system will be built, and decide which
decide which data structure that has more advantages.

 

Jakarta, 25 October 2018

 

A. Ahmad Kusumah

Leave a Reply

Your email address will not be published. Required fields are marked *