In implementing any algorithm, there is a need to store data, ranging from a single value in a single variable to more complex data structures.
Data Structure
A data structure is a way of storing data in a computer so that it can be used efficiently. Often a carefully chosen data structure will allow a more efficient algorithm to be used. The choice of a data structure often begins with the choice of an abstract data type. A well-designed data structure allows a variety of critical operations to be performed using as few resources, both execution time and memory space, as possible.
Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to certain tasks. For example, B-trees are particularly well suited to the implementation of databases, while networked computers rely on routing tables to function.
In the design of many types of programs, the choice of data structures is a primary design consideration, as experience in building large systems has shown that the difficulty of implementation and the quality and performance of the final result depend heavily on choosing the best data structure. After the data structures are chosen, the algorithms to be used often become relatively obvious. Sometimes things work in the opposite direction: data structures are chosen because certain key tasks have algorithms that work best with particular data structures. In either case, the choice of appropriate data structures is crucial.
Since data structures are so crucial to professional programs, many of them enjoy extensive support in the standard libraries of modern programming languages and environments, such as the C++ Standard Template Library, the Java API, and the Microsoft .NET Framework.
The fundamental building blocks of most data structures are arrays, records, discriminated unions, and references. For example, a nullable reference, a reference that can be null, is a combination of references and discriminated unions, and the simplest linked data structure, the linked list, is built from records and nullable references.
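As an illustration, the four building blocks look like this in C++ (all of the names here, such as Point and Shape, are invented for the example):

    #include <iostream>

    struct Point { int x, y; };  // record: named fields grouped together

    struct Shape {               // discriminated union: a tag plus a union
        enum { CIRCLE, SQUARE } kind;
        union { double radius; double side; };
    };

    int main() {
        int scores[3] = {7, 9, 4};  // array: contiguous elements of one type
        Point p{1, 2};
        Point* maybe = nullptr;     // nullable reference: refers to nothing...
        maybe = &p;                 // ...or to an actual record
        Shape s;
        s.kind = Shape::CIRCLE;     // the tag records which union member is live
        s.radius = 2.5;
        std::cout << scores[1] << ' ' << maybe->x << ' ' << s.radius << '\n';  // 9 1 2.5
    }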
Importance of Data Structures
Data structures are a very important and interesting part of computer science because they provide various methods by which linked data can be stored and retrieved in an efficient manner.
A good data structure will also reflect reality, and that’s important for several reasons:
It probably makes the program easier to think about.
It probably makes the program easier to write.
The way the data is processed will often mirror the way the corresponding objects in the world interact.
Efficiency
Efficiency is usually measured by two factors: time and space. If a particular application is heavily dependent on manipulating high-level data structures, the speed at which those manipulations can be performed will be the major determinant of the speed of the entire application. Similarly, if a program uses a large number of such structures, an implementation that uses an inordinate amount of space to represent the data structure will be impractical.
Efficiency is also measured in terms of theoretical operations, such as the number of comparisons or data moves, the memory used, the number of messages passed, the number of disk accesses, and so on.
Big O
Big O (also written simply as O) is a theoretical measure of the execution of an algorithm, usually the time or memory needed, given the problem size n, which is usually the number of items.
The importance of this measure can be seen in trying to decide whether an algorithm is adequate and may just need a better implementation, or whether it will always be too slow on a big enough input. For instance, quick sort, which is O(n log n) on average, running on a small desktop computer can beat bubble sort, which is O(n²), running on a supercomputer if there are a lot of numbers to sort. To sort 1,000,000 numbers, quick sort takes about 20,000,000 steps on average, while bubble sort takes 1,000,000,000,000 steps.
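The arithmetic can be checked directly; the short C++ program below (an illustration, not a benchmark) prints the two step estimates for n = 1,000,000:

    #include <cmath>
    #include <cstdio>

    int main() {
        double n = 1e6;
        // n * log2(n): roughly 20 million steps for quick sort on average
        std::printf("n log n ~ %.0f\n", n * std::log2(n));
        // n * n: a trillion steps for bubble sort
        std::printf("n^2     ~ %.0f\n", n * n);
    }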
Run time
Run time refers to the amount of time needed to execute an algorithm. The term also refers to the period when a compiled program is executing, as opposed to compile time.
Specific Data Structures
Linked List
A linked list is one of the fundamental data structures used in computer programming. It consists of a sequence of nodes, each containing arbitrary data fields and one or two references (links) pointing to the next and/or previous nodes. A linked list is called a self-referential data type because it contains a pointer or link to other data of the same type. Linked lists permit insertion and removal of nodes at any point in the list in constant time, but do not allow random access. Several different types of linked list exist: singly linked lists, doubly linked lists, and circularly linked lists.
Linked lists can be implemented in most languages. Languages such as C and C++ rely on pointers and memory addressing to create linked lists.
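As a concrete sketch (the Node, insert_after, and remove_after names are invented for this example, not a library interface), the following C++ fragment shows why the operations in the lists below touch only one or two pointers:

    #include <iostream>

    struct Node {
        int data;    // arbitrary data field
        Node* next;  // link to the next node, or nullptr at the end
    };

    // Insert a new node immediately after pos: one pointer is altered,
    // and the new node is assigned the correct successor.
    void insert_after(Node* pos, int value) {
        pos->next = new Node{value, pos->next};
    }

    // Remove the node immediately after pos: the pointer to the deleted
    // node is simply redirected to the next node in the list.
    void remove_after(Node* pos) {
        Node* doomed = pos->next;
        if (doomed != nullptr) {
            pos->next = doomed->next;
            delete doomed;
        }
    }

    int main() {
        Node head{1, nullptr};
        insert_after(&head, 3);  // list: 1 3
        insert_after(&head, 2);  // list: 1 2 3
        remove_after(&head);     // list: 1 3
        for (Node* p = &head; p != nullptr; p = p->next)
            std::cout << p->data << ' ';
        std::cout << '\n';
    }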
Advantages
Inserting data is quicker and easier. All that is required is to alter one pointer and assign the new node the correct pointer. This is very useful for large lists.
Removing data is quicker and easier. All that is required is that the pointer pointing to the deleted data be redirected to the next item in the list. Again, this is very useful for large lists.
No memory reallocation calls are needed.
No need for a large contiguous memory block.
Disadvantages
Much more intensive to set up.
Indexing and traversing are slower.
Searching a singly linked list is slow, as only a linear search in one direction is possible. For large amounts of data, this is not ideal for finding an item.
Binary Tree
A special kind of multiply-linked list that has wide application in computer science is a data structure called the binary tree. Unlike the linked list, which is a linear data structure, the binary tree is a non-linear data structure.
A binary tree is a finite set of nodes. The set might be empty (no nodes, which is called the empty tree). If the set is not empty, it follows these rules:
1) There is one special node, called the root.
2) Each node may be associated with up to two other different nodes, called its left child and its right child. If a node c is the child of another node p, then we say that “p is c’s parent”.
3) Each node, except the root, has exactly one parent; the root has no parent.
4) If you start at a node and move to the node’s parent (provided there is one), then move again to that node’s parent, and keep moving upward to each node’s parent, the destination will be the root.
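To make the rules concrete, here is a minimal binary tree node in C++ (the TreeNode name and its fields are illustrative, not from any particular library):

    #include <iostream>

    struct TreeNode {
        int data;
        TreeNode* left;   // left child, or nullptr if absent
        TreeNode* right;  // right child, or nullptr if absent
    };

    int main() {
        TreeNode leftChild{2, nullptr, nullptr};
        TreeNode rightChild{3, nullptr, nullptr};
        TreeNode root{1, &leftChild, &rightChild};  // the root has no parent
        std::cout << root.left->data << ' ' << root.right->data << '\n';  // 2 3
    }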
Here are some other terms used with trees:
Parent – the parent of a node is the node linked above it.
Sibling – two nodes are siblings if they have the same parent.
Ancestor – a node’s parent is its first ancestor, the parent of the parent is the next ancestor, and so forth, until the root is reached. The root is an ancestor of every other node.
Descendant – a node’s children are its first descendants, the children’s children are its next descendants, and so forth.
Left and Right Sub-trees of a Node – the node beginning with its left child and everything below it is its left sub-tree; the node beginning with its right child and everything below it is its right sub-tree.
Depth of a Node – the number of steps taken to reach the root. The depth of the root itself is zero; a child of the root has depth one.
Depth of a Tree (Height of a Tree) – the maximum depth (height) of any of its leaves.
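As a small illustration of the depth and height definitions, the sketch below computes the height of a tree recursively (the convention that an empty tree has height -1 is an assumption of this example; conventions vary between textbooks):

    #include <algorithm>
    #include <iostream>

    struct TreeNode {
        int data;
        TreeNode* left;
        TreeNode* right;
    };

    // Height = maximum depth of any leaf. An empty tree is -1 here,
    // so a lone root has height 0, matching the definitions above.
    int height(const TreeNode* t) {
        if (t == nullptr) return -1;
        return 1 + std::max(height(t->left), height(t->right));
    }

    int main() {
        TreeNode leaf{3, nullptr, nullptr};   // depth 2
        TreeNode child{2, &leaf, nullptr};    // depth 1
        TreeNode root{1, &child, nullptr};    // depth 0
        std::cout << height(&root) << '\n';   // prints 2
    }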
Advantages
Easy structure in which to search (see the sketch after this list)
Easy to insert
Easy to delete
Easy to read tree back in from disk after writing out (no recreation of links required)
The maximum tree size is limited only by the total memory available on the system.
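The “easy to search” advantage holds when the tree is kept ordered as a binary search tree, with smaller keys to the left and larger keys to the right; that ordering convention is an assumption of this sketch, not something the definition of a binary tree requires:

    #include <iostream>

    struct TreeNode {
        int key;
        TreeNode* left;
        TreeNode* right;
    };

    // Each comparison discards one whole sub-tree, so a balanced
    // binary search tree is searched in O(log n) steps.
    bool contains(const TreeNode* t, int key) {
        while (t != nullptr) {
            if (key == t->key) return true;
            t = (key < t->key) ? t->left : t->right;
        }
        return false;
    }

    int main() {
        TreeNode small{1, nullptr, nullptr};
        TreeNode large{8, nullptr, nullptr};
        TreeNode root{5, &small, &large};
        std::cout << contains(&root, 8) << ' ' << contains(&root, 4) << '\n';  // 1 0
    }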
Disadvantage
In an array-based implementation, memory allocation is not truly dynamic, and it can be difficult to match the array size to the size range of the tree.
Hash Table
A hash table is an associative array data structure that associates keys with values. The primary operation it supports efficiently is lookup: given a key, an identifier for the information to be found such as a person’s name, it finds the corresponding value. It works by transforming the key, using a hash function, into a hash: a number that the hash table uses to locate the desired value.
Hash tables are often used to implement associative arrays, sets, and caches. Hash tables store data in pseudo-random locations, so accessing the data in a sorted manner is, at best, a very time-consuming operation. Other data structures, such as self-balancing binary search trees, generally operate slightly more slowly and are rather more complex to implement than hash tables, but maintain a sorted data structure at all times.
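To make the lookup path concrete, here is a bare-bones chained hash table in C++ (the HashTable class and its put/get interface are invented for this sketch; in practice the standard library’s std::unordered_map provides this functionality):

    #include <iostream>
    #include <list>
    #include <string>
    #include <vector>

    class HashTable {
        // Each bucket holds a short chain of key/value pairs.
        std::vector<std::list<std::pair<std::string, int>>> buckets_;

        // The hash function transforms a key into a bucket index.
        std::size_t bucket_of(const std::string& key) const {
            return std::hash<std::string>{}(key) % buckets_.size();
        }

    public:
        explicit HashTable(std::size_t n) : buckets_(n) {}

        void put(const std::string& key, int value) {
            // (a fuller version would replace an existing key's value)
            buckets_[bucket_of(key)].push_back({key, value});
        }

        // Lookup scans only the one chain the key hashes to, so the
        // average cost stays constant as the table grows.
        const int* get(const std::string& key) const {
            for (const auto& kv : buckets_[bucket_of(key)])
                if (kv.first == key) return &kv.second;
            return nullptr;  // not found
        }
    };

    int main() {
        HashTable ages(16);
        ages.put("ada", 36);
        ages.put("alan", 41);
        if (const int* v = ages.get("alan")) std::cout << *v << '\n';  // prints 41
    }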
Advantages
A hash table can provide constant-time lookup on average, regardless of the number of items in the table.
Compared to other associative array data structures, hash tables are most useful when a large number of records of data are to be stored.
The hash table size can be unlimited (limited only by available storage space); in that case there is no need to expand the table and recreate the hash function.
Disadvantages
It’s difficult (not efficient) to print all of the elements in a hash table.
It’s not efficient to find the minimum or the maximum element.
The hash table may have a fixed size. At some point all of the slots in the array will be filled, and the only alternative is to expand the table, which means modifying the hash function to accommodate the increased address space.
Hash tables are more difficult and error-prone to write and use.
Hash tables in general exhibit poor locality of reference; that is, the data to be accessed is distributed seemingly at random in memory. Because hash tables cause access patterns that jump around, this can trigger microprocessor cache misses that cause long delays.
References
Augenstein, Moshe J., Yedidyah Langsam, and Aaron Tenenbaum. “Introduction to Data Structures.” Data Structures Using C and C++. United States of America: Prentice-Hall, Inc., 1996. 22-24.
Carlson, David. “Hash Tables.” Saint Vincent College. 2004. Saint Vincent College. 6 July 2005 http://cis.stvincent.edu/swd/hash/hash.html.
Main, Michael, and Walter Savitch. “Trees.” Data Structures and Other Objects Using C++.
Ed. Susan Hartman. Canada: Addison Wesley Longman, 1997. 424-429.
Parlante, Nick. “Binary Trees.” cslibrary.stanford.edu. cslibrary.stanford.edu. 6 July 2005 http://cslibrary.stanford.edu/110/BinaryTrees.html.
Shahidi, Amin, and Dennis Schmidt. “Lecture 10 March 20, 2002.” www.sis.pitt.edu. 6 July 2005 http://www.sis.pitt.edu/~klynch/Spring2002/lecture10/Lecture10_032002.htm#_Linked_List_Pros.
“Algorithmic efficiency – Wikipedia, the free encyclopedia.” Wikipedia. 2005. Wikipedia. 6 July 2005 http://en.wikipedia.org/wiki/Algorithmic_efficiency.
“Big O notation – Wikipedia, the free encyclopedia.” Wikipedia. 2005. Wikipedia. 6 July 2005 http://en.wikipedia.org/wiki/Big_O_notation.
“Programming Tutorial: Linked Lists, Trees, Hash Tables.” vergil.chemistry.gatech.edu. 2001.
The Sherill Group. 6 July 2005 http://vergil.chemistry.gatech.edu/resources/programming/c-tutorial/lists.html.
“Linked list – Wikipedia, the free encyclopedia.” Wikipedia. 2005. Wikipedia. 6 July 2005 http://en.wikipedia.org/wiki/Linked_list.
“Run time.” National Institute of Standards and Technology. 2005. National Institute of Standards and Technology. 6 July 2005 http://www.nist.gov/dads/HTML/runtime.html.