Algorithms and Data Structures Glossary (Starting with "H")
By www.nist.gov, Gaithersburg, Maryland, U.S.A.
paul.black [at] nist.gov
http://www.nist.gov/dads
H
- halting problem
- Hamiltonian cycle
- Hamiltonian path
- Hamming distance
- hard split, easy merge
- hashbelt
- hash function
- hash heap
- hash table
- hash table delete
- Hausdorff distance
- hB-tree
- head
- heap
- heapify
- heap property
- heapsort
- heaviest common subsequence: see longest common subsequence
- height
- height-balanced binary search tree
- height-balanced tree
- heuristic
- hidden Markov model
- highest common factor: see greatest common divisor
- histogram sort
- HMM: see hidden Markov model
- homeomorphic
- horizontal visibility map
- Horner's rule
- Horspool: see Boyer-Moore-Horspool
- hsadelta
- Huffman coding
- huge sparse array
- Hungarian algorithm: see Munkres' assignment algorithm
- hybrid algorithm
- hyperedge
- hypergraph
halting problem
(classic problem)
Definition:
Is there an algorithm to determine whether any arbitrary program halts? Turing proved the answer is no. Since many other questions can be recast as the halting problem, we can conclude that some programs are impossible to write, although heuristic or partial solutions are possible.
Generalization (I am a kind of ...)
undecidable problem.
Note:
In more detail: can there be a program that, when given any arbitrary program, finishes in finite time and correctly answers either "Yes, the program you gave me halts for all inputs" or "No, the program you gave me does not halt for some inputs"? This is an informal wording of what Turing proved. Please do not refer to this page if you claim to refute his proof.
Author: PEB
More information
Wikipedia's extensive entry including related problems and a link to Turing's original paper. Eric W. Weisstein's entry for Halting Problem.
Alan Turing, On computable numbers, with an application to the Entscheidungsproblem, Proceedings of the London Mathematical Society, Series 2, vol 42 (1936-37) pages 230-265.
Hamiltonian cycle
(definition)
Definition:
A path through a graph that starts and ends at the same vertex and includes every other vertex exactly once.
Also known as tour.
Generalization (I am a kind of ...)
cycle.
Specialization (... is a kind of me.)
traveling salesman.
Note:
Named for Sir William Rowan Hamilton (1805-1865) (a longer biography). A Hamiltonian cycle includes each vertex once; an Euler cycle includes each edge once.
Also known as a Hamiltonian circuit.
Author: PEB
More information
Pablo Moscato's Hamiltonian Page with papers on many aspects of Hamiltonian cycles and paths.
Hamiltonian path
(definition)
Definition:
A simple path through a graph that includes every vertex in the graph.
Generalization (I am a kind of ...)
simple path.
Note:
Named for Sir William Rowan Hamilton (1805-1865) (a longer biography).
Author: PEB
More information
Pablo Moscato's Hamiltonian Page with papers on many aspects of Hamiltonian cycles and paths.
Hamming distance
(definition)
Definition:
The number of bits which differ between two binary strings. More formally, the distance between two strings A and B is ∑i |Ai - Bi|, where Ai and Bi are the ith bits of A and B.
Aggregate parent (I am a part of or used in ...)
brute force string search with mismatches.
Note:
After Ralf Rabaetje <rrabaetj@advea.com> 4 February 2000. The Hamming distance can be interpreted as the number of bits which need to be changed (corrupted) to turn one string into the other. Sometimes the number of characters is used instead of the number of bits. Hamming distance can be seen as Manhattan distance between bit vectors.
Author: PEB
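The bitwise formula above can be sketched directly; a minimal version (the function name is my own) treats each string as an integer and counts the set bits of the XOR:

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two equal-length binary strings,
    represented here as integers: the popcount of the XOR."""
    return bin(a ^ b).count("1")

# 0b10110 and 0b00111 differ in bit 4 and bit 0, so the distance is 2.
```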
hard split, easy merge
(algorithmic technique)
Definition:
A recursive algorithm, especially a sort algorithm, where dividing (splitting) into smaller problems is time consuming or complex and combining (merging) the solutions is quick or trivial.
Generalization (I am a kind of ...)
divide and conquer.
Aggregate parent (I am a part of or used in ...)
radix sort, quicksort, bucket sort, selection sort.
Note:
Although the notion is widespread, I first heard this term from Doug Edwards about 1994. Called the "conquer form" of using divide and conquer in [ATCH99, page 3-3].
Author: PEB
hashbelt
(data structure)
Definition:
Use a short list or array of hash tables to implement a hash table with aging to expire items. To expire items, add a new table at the head of the list and drop the oldest table, along with its contents. To find an item, search all the tables.
Generalization (I am a kind of ...)
hash table.
Note:
"The name is a combination of 'hashmap' and 'conveyor belt.'" This data structure may be used for a least recently used (LRU) cache: move an item to the newest table when it is used. Other policies may be implemented by moving items according to other criteria.
Author: PEB
More information
William Grosso, The Hashbelt Data Structure, O'Reilly ONJava.com, 9 January 2002. Available at http://www.onjava.com/pub/a/onjava/2002/01/30/dataexp2.html (Accessed 3 Dec 2007).
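A minimal sketch of the idea (class and method names are my own, not from Grosso's article): a short belt of dictionaries, rotated to expire the oldest generation along with its contents.

```python
from collections import deque

class Hashbelt:
    """Hash table with aging: a short belt of dicts; rotate() drops
    the oldest table, expiring everything it contains."""
    def __init__(self, generations=3):
        self.belt = deque({} for _ in range(generations))

    def put(self, key, value):
        self.belt[0][key] = value        # newest table is at the head

    def get(self, key, default=None):
        for table in self.belt:          # to find an item, search all tables
            if key in table:
                return table[key]
        return default

    def rotate(self):
        """Add a new empty table at the head; drop the oldest table."""
        self.belt.pop()
        self.belt.appendleft({})
```

An LRU policy, as the note describes, would move an item back to `belt[0]` inside `get` whenever it is found.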
hash function
(algorithm)
Definition:
A function that maps keys to integers, usually to get an even distribution on a smaller set of values.
Specialization (... is a kind of me.)
(different kinds) linear hash, perfect hashing, minimal perfect hashing, order-preserving minimal perfect hashing; (specific functions) Pearson's hash, multiplication method.
Aggregate parent (I am a part of or used in ...)
hash table, uniform hashing, universal hashing, Bloom filter, locality-sensitive hashing.
Note:
The range of integers is typically [0... m-1] where m is a prime number or a power of 2.
Author: PEB
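As an illustration of the range note above, here is a sketch of the multiplication method with a power-of-2 range (the function name and word size are my own choices; the constant is the usual floor(2^32/φ)):

```python
def mult_hash(key: int, m_bits: int = 10) -> int:
    """Multiplication-method hash: multiply by a constant derived from
    the golden ratio and keep the top m_bits bits of the low word.
    The range is [0 .. 2**m_bits - 1], a power of 2."""
    w = 32                     # word size in bits (assumed here)
    A = 2654435769             # floor(2**32 / golden ratio)
    return ((key * A) % (1 << w)) >> (w - m_bits)
```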
hash heap
(data structure)
Definition:
An efficient implementation of a priority queue. The linear hash function monotonically maps keys to buckets, and each bucket is a heap.
Note:
This is a bucket sort where the buckets are organized as heaps. The linear hash function maps increasing keys into nondecreasing values, that is, key1 > key2 implies h(key1) is greater than or equal to h(key2). It is not clear what happens if a bucket gets full. Let R be the ratio between the key range and the range of the hash function. If R is so large there is only one bucket, we have a regular heap. If R is one, it is a direct mapped array. This data structure was proposed by Chris L. Kuszmaul <fyodor@nas.nasa.gov> in the news group comp.theory 13 January 1999.
Author: PEB
hash table
(data structure)
Definition:
A dictionary in which keys are mapped to array positions by hash functions. Having the keys of more than one item map to the same position is called a collision. There are many collision resolution schemes, but they may be divided into open addressing, chaining, and keeping one special overflow area. Perfect hashing avoids collisions, but may be time-consuming to create.
Also known as scatter storage.
Specialization (... is a kind of me.)
perfect hashing, dynamic hashing, 2-left hashing, cuckoo hashing, 2-choice hashing, hashbelt.
Aggregate parent (I am a part of or used in ...)
dictionary.
Aggregate child (... is a part of or used in me.)
load factor, hash table delete, collision resolution: coalesced chaining, linear probing, double hashing, quadratic probing, uniform hashing, simple uniform hashing, separate chaining, direct chaining, clustering.
Note:
Complexity depends on the hash function and collision resolution scheme, but may be constant (Θ(1)) if the table is big enough or grows. Some open addressing schemes suffer from clustering more than others. The table may be an array of buckets, to handle some numbers of collisions easily, but some provision must still be made for bucket overflow.
Author: PEB
More information
explanation and example of hashing and various collision resolution techniques.
Historical Note
"The idea of hashing appears to have been originated by H. P. Luhn, who wrote an internal IBM memorandum in January 1953" [Knuth98, 3:547, Sect. 6.4]. He continues with more than a page of history.
hash table delete
(algorithm)
Definition:
If the hash table uses chaining, use the chaining data structure delete. If it uses open addressing, mark the item "deleted" for future reuse.
Author: PEB
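For the open-addressing case, a sketch of the "mark deleted" idea with a tombstone sentinel (the `DELETED` name and table layout are my own, assuming linear probing over a list of key-value pairs):

```python
DELETED = object()   # tombstone: slot reusable, but probing continues past it

def probe_delete(table, key):
    """Delete from a linear-probing table (a list of (key, value) or None).
    Mark the slot DELETED rather than None, so later lookups that probed
    past this slot still find their items."""
    m = len(table)
    i = hash(key) % m
    for _ in range(m):
        slot = table[i]
        if slot is None:
            return False                 # truly empty slot: key absent
        if slot is not DELETED and slot[0] == key:
            table[i] = DELETED           # mark "deleted" for future reuse
            return True
        i = (i + 1) % m
    return False
```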
Hausdorff distance
(definition)
Definition:
A measure of the resemblance of two (fixed) sets of geometric points P and Q, defined as H(P,Q) = max{ max_{a ∈ P} min_{b ∈ Q} d(a,b), max_{a ∈ Q} min_{b ∈ P} d(a,b) }, where d(·,·) is the distance metric, usually the Euclidean distance.
Note:
Adapted from [AS98, page 437].
Author: PEB
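The formula translates almost verbatim into code; a brute-force sketch with Euclidean d (function names mine):

```python
from math import dist  # Euclidean distance between points (Python >= 3.8)

def hausdorff(P, Q):
    """H(P,Q) = max of the two directed Hausdorff distances."""
    def directed(A, B):
        # for each point of A, distance to its nearest point of B; take the worst
        return max(min(dist(a, b) for b in B) for a in A)
    return max(directed(P, Q), directed(Q, P))
```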
hB-tree
(data structure)
Definition:
A point access method which uses k-d trees to organize the space, but partitions as excluded intervals, like the BANG file. Searching is like in a k-d-B-tree.
Note:
After [GG98].
Author: PEB
head
(definition)
Definition:
The first item of a list.
Author: PEB
heap
(data structure)
Definition:
A complete tree where every node has a key more extreme (greater or less) than or equal to the key of its parent. Usually understood to be a binary heap.
Generalization (I am a kind of ...)
complete tree.
Specialization (... is a kind of me.)
binary heap, k-ary heap, binomial heap, Fibonacci heap.
Aggregate parent (I am a part of or used in ...)
heapsort, priority queue.
Aggregate child (... is a part of or used in me.)
heap property.
Note:
Speaking about operating systems, "heap" refers to memory from which chunks can be allocated.
Author: CLK
More information
J. W. J. Williams, Algorithm 232 Heapsort, CACM, 7(6):347-348, June 1964.
heapify
(algorithm)
Definition:
Rearrange a heap to maintain the heap property, that is, the key of the root node is more extreme (greater or less) than or equal to the keys of its children. If the root node's key is not more extreme, swap it with the most extreme child key, then recursively heapify that child's subtree. The child subtrees must be heaps to start.
Note:
For an array implementation, heapify takes O(log n), or equivalently O(h), time under the comparison model, where n is the number of nodes and h is the height of the heap.
Author: PEB
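The swap-and-recurse step above can be sketched for a min-heap in an array, using the standard 0-based child indices (a sketch, not the only layout):

```python
def heapify(a, i, n):
    """Sift-down for a min-heap stored in a[0:n]: if a[i] is not the most
    extreme (here, smallest), swap it with the most extreme child key,
    then recursively heapify that child's subtree. The child subtrees
    must already be heaps."""
    smallest = i
    left, right = 2 * i + 1, 2 * i + 2
    if left < n and a[left] < a[smallest]:
        smallest = left
    if right < n and a[right] < a[smallest]:
        smallest = right
    if smallest != i:
        a[i], a[smallest] = a[smallest], a[i]
        heapify(a, smallest, n)
```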
heap property
(definition)
Definition:
Each node in a tree has a key which is more extreme (greater or less) than or equal to the key of its parent.
Specialization (... is a kind of me.)
max-heap property, min-heap property.
Aggregate parent (I am a part of or used in ...)
binary heap, heap, binary priority queue.
Note:
The root node has the largest or smallest key. After LK. A complete tree with the heap property is a heap.
Author: PEB
heapsort
(algorithm)
Definition:
A sort algorithm that builds a heap, then repeatedly extracts the maximum item. Run time is O(n log n).
Generalization (I am a kind of ...)
in-place sort.
Specialization (... is a kind of me.)
adaptive heap sort, smoothsort.
Aggregate child (... is a part of or used in me.)
build-heap, heap.
Note:
Heapsort can be seen as a variant of selection sort in which sorted items are arranged (in a heap) to quickly find the next item. Sometimes called "tournament sort" by analogy between a heap, with its heap property, and a tree of matches in a sports tournament.
Author: PEB
More information
demonstration. Comparison of quicksort, heapsort, and merge sort on modern processors.
Robert W. Floyd, Algorithm 245 Treesort 3, CACM, 7(12):701, December 1964. J. W. J. Williams, Algorithm 232 Heapsort, CACM, 7(6):347-348, June 1964. Although Williams clearly stated the idea of heapsort, Floyd gave a complete, efficient implementation nearly identical to what we use today.
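The two phases, build a heap and then repeatedly extract the extreme item, can be sketched with the standard library's heap module (note this sketch returns a new list; heapsort proper sorts in place):

```python
import heapq

def heapsort(items):
    """Build a heap, then repeatedly extract the minimum: O(n log n)."""
    heap = list(items)
    heapq.heapify(heap)          # build-heap phase, O(n)
    return [heapq.heappop(heap) for _ in range(len(heap))]
```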
height
(definition)
Definition:
The maximum distance of any leaf from the root of a tree. If a tree has only one node (the root), the height is zero.
Note:
The height of the figure at the definition of tree is two.
The height of a tree is also known as the order.
Author: PEB
height-balanced binary search tree
(data structure)
Definition:
A height-balanced tree which is also a binary search tree. It supports membership, insert, and delete operations in time logarithmic in the number of nodes in the tree.
Generalization (I am a kind of ...)
height-balanced tree, binary search tree, balanced binary tree.
Specialization (... is a kind of me.)
AVL tree, red-black tree.
Author: PEB
height-balanced tree
(data structure)
Definition:
A tree whose subtrees differ in height by no more than one and the subtrees are height-balanced, too. An empty tree is height-balanced.
Generalization (I am a kind of ...)
balanced tree.
Note:
After [HS83, page 442].
Author: PEB
More information
[Knuth98, 3:459, Sect. 6.2.3]
heuristic
(algorithmic technique)
Definition:
An algorithm that usually, but not always, works or that gives nearly the right answer.
Specialization (... is a kind of me.)
move-to-front heuristic, Christofides algorithm, Monte Carlo algorithm.
Note:
Some problems, such as the traveling salesman problem, may take far too long to compute an optimal solution. But some have good heuristics that are fast and find a solution no more than a few percent worse than the optimal.
Author: PEB
hidden Markov model
(data structure)
Definition:
A variant of a finite state machine having a set of states, Q, an output alphabet, O, transition probabilities, A, output probabilities, B, and initial state probabilities, Π. The current state is not observable. Instead, each state produces an output with a certain probability (B). Usually the states, Q, and outputs, O, are understood, so an HMM is said to be a triple, (A, B, Π).
Formal Definition: After Michael Cohen's lectures for CN760.
- A = {aij = P(qj at t+1 | qi at t)}, where P(a | b) is the conditional probability of a given b, t ≥ 1 is time, and qi ∈ Q. Informally, A is the probability that the next state is qj given that the current state is qi.
- B = {bik = P(ok | qi)}, where ok ∈ O. Informally, B is the probability that the output is ok given that the current state is qi.
- Π = {pi = P(qi at t=1)}. Informally, Π is the probability that the initial state (at t=1) is qi.
Also known as HMM.
Generalization (I am a kind of ...)
finite state machine.
Aggregate parent (I am a part of or used in ...)
Baum Welch algorithm, Viterbi algorithm.
Note:
Computing a model given sets of sequences of observed outputs is very difficult, since the states are not directly observable and transitions are probabilistic. One method is the Baum Welch algorithm. Although the states cannot, by definition, be directly observed, the most likely sequence of states for a given sequence of observed outputs can be computed in O(n²t), where n is the number of states and t is the length of the sequence. One method is the Viterbi algorithm.
Thanks to Arvind <uk_arvind@mail.utexas.edu> May 2002. Named after Andrei Andreyevich Markov (1856 - 1922), who studied poetry and other texts as stochastic sequences of characters.
Author: PEB
More information
L. E. Baum, An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes, Inequalities, 3:1-8, 1972.
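To make the triple (A, B, Π) concrete, here is a small Viterbi sketch that recovers the most likely state sequence; the two-state weather model in the usage below is invented for illustration only:

```python
def viterbi(obs, states, Pi, A, B):
    """Most likely state sequence for observed outputs obs.
    Pi[q]: initial probability; A[q][r]: transition q -> r; B[q][o]: output."""
    V = [{q: Pi[q] * B[q][obs[0]] for q in states}]   # probabilities at t = 1
    back = []                                         # backpointers per step
    for o in obs[1:]:
        prev, col, ptr = V[-1], {}, {}
        for r in states:
            # best predecessor state for landing in r
            q_best = max(states, key=lambda q: prev[q] * A[q][r])
            col[r] = prev[q_best] * A[q_best][r] * B[r][o]
            ptr[r] = q_best
        V.append(col)
        back.append(ptr)
    last = max(states, key=lambda q: V[-1][q])        # best final state
    path = [last]
    for ptr in reversed(back):                        # trace back
        path.append(ptr[path[-1]])
    return list(reversed(path))
```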
histogram sort
(algorithm)
Definition:
An efficient 3-pass refinement of a bucket sort algorithm. The first pass counts the number of items for each bucket in an auxiliary array, and then makes a running total so each auxiliary entry is the number of preceding items. The second pass puts each item in its proper bucket according to the auxiliary entry for the key of that item. The last pass sorts each bucket.
Also known as interpolation sort.
Generalization (I am a kind of ...)
bucket sort.
Specialization (... is a kind of me.)
counting sort.
Note:
A bucket sort uses fixed-size buckets. A histogram sort sets up buckets of exactly the right size in a first pass. A counting sort is a histogram sort with one bucket per possible key value. The following note is due to Richard Harter, cri@tiac.net, http://www.tiac.net/users/cri/, 8 January 2001, and is used by permission. The run time is effectively O(n log log n). Let S be the data set to be sorted, where n=|S|. R is an approximate rank function to sort the data into n bins. R has the following properties.
- R is an integer valued function into [0, n-1].
- 0 ≤ R(x) ≤ n-1 for x in S.
- For some x,y in S, R(x)=0 and R(y)=n-1.
- x < y implies R(x) ≤ R(y) for x,y in S.
Each bin then has, on average, 1 entry. Under some rather broad assumptions the number of entries in a bin will be Poisson distributed, whence the observation that the sort is O(n log log n).
Let T be the final array for the sorted data. Allocate an auxiliary integer array H indexed 0 ... n-1. We make one pass through the data to count the number of items in each bin, recording the counts in H. The array H is then converted into a cumulative array so each entry in H specifies the beginning position of the bin contents in T. We then make a second pass through the data, copying each item x in S from S to T at H(R(x)), then incrementing H(R(x)) so the next item in the bin goes in the next location in T. (The bin number R(x) could be saved in still another auxiliary array to trade off memory for computation.)
For numeric data, there is a simple R function that works very well: let min, max be the minimum and maximum of S. Then R(x) = n*(x - min)/(max - min).
This uses quite a bit of extra memory. For large data sets, there could be slowdowns because of page faults. For large n it is more efficient to bound the number of bins.
Author: PEB
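The three passes, with Harter's simple numeric R function, can be sketched as follows (the final pass uses an insertion sort over the nearly-sorted array as a simple way to finish each bin; other bin-sort choices work too):

```python
def histogram_sort(s):
    """3-pass histogram sort for numeric data, using the rank function
    R(x) = n*(x - min)/(max - min) from the note above."""
    n = len(s)
    if n < 2:
        return list(s)
    lo, hi = min(s), max(s)
    R = lambda x: min(n - 1, int(n * (x - lo) / (hi - lo)))
    # pass 1: count items per bin, then running totals of preceding items
    H = [0] * (n + 1)
    for x in s:
        H[R(x) + 1] += 1
    for i in range(1, n + 1):
        H[i] += H[i - 1]
    # pass 2: place each item in its proper bin
    T = [None] * n
    for x in s:
        T[H[R(x)]] = x
        H[R(x)] += 1
    # pass 3: bins average one entry, so T is nearly sorted; an insertion
    # sort finishes each bin with little expected work
    for i in range(1, n):
        x, j = T[i], i - 1
        while j >= 0 and T[j] > x:
            T[j + 1] = T[j]
            j -= 1
        T[j + 1] = x
    return T
```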
homeomorphic
(definition)
Definition:
Two graphs are homeomorphic if they can be made isomorphic by inserting new vertices of degree 2 into edges.
Note:
This corresponds to the topological notion since adding or removing degree 2 vertices doesn't change the graph's topology. This is similar to, but not the same as, homomorphic.
Author: PEB
More information
Eric Weisstein's MathWorld entries for homomorphism and homeomorphism.
horizontal visibility map
(definition)
Definition:
A partition of the plane into regions by drawing a horizontal straight line through each vertex p of a planar straight-line graph until it intersects an edge e of the graph or extends to infinity. The edge e is said to be horizontally visible from p.
Note:
From Algorithms and Theory of Computation Handbook, page 19-27, Copyright © 1999 by CRC Press LLC. Appearing in the Dictionary of Computer Science, Engineering and Technology, Copyright © 2000 CRC Press LLC.
Author: CRC-A
Horner's rule
(algorithm)
Definition:
A polynomial A(x) = a0 + a1x + a2x² + a3x³ + ... may be written as A(x) = a0 + x(a1 + x(a2 + x(a3 + ...))).
Note:
A polynomial may be evaluated at a point x', that is, A(x') computed, in Θ(n) time using Horner's rule: repeated multiplications and additions, rather than the naive method of raising x to each power, multiplying by the coefficient, and accumulating.
Author: PEB
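The nested form above maps directly onto a single loop (the coefficient-list representation is my own convention):

```python
def horner(coeffs, x):
    """Evaluate a0 + a1*x + a2*x**2 + ... at x using Horner's rule:
    one multiplication and one addition per coefficient.
    coeffs = [a0, a1, a2, ...]."""
    result = 0
    for a in reversed(coeffs):   # work outward from the innermost parenthesis
        result = result * x + a
    return result
```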
hsadelta
(algorithm)
Definition:
An algorithm to produce a sequence of insert and copy commands (an edit script) which creates a new version file from a reference file. To begin, hash every block of the reference file and store every hash in a hash value array. Build a suffix array and three other data structures for quick access. Beginning at the first location in the version file, hash a block and look for the longest match in the reference file. Upon a match, encode an insert back to the previous match and a copy of the match. If no match, look at the next location. At the end, encode an insert for remaining unmatched characters.
Aggregate parent (I am a part of or used in ...)
longest common subsequence.
Aggregate child (... is a part of or used in me.)
Bloom filter, hash table, binary search.
Note:
There are many differential compression algorithms, such as vcdiff, xdelta, and zdelta. See Agarwal et. al. I happened to find this one, and it uses neat data structures. The three other data structures for quick access to the suffix array are: the quick index array, a Bloom filter using one hash function, the block hash table, and the pointer array, an array of pointers into the block hash table. The quick index array allows most (97%) version file block hashes to be rejected as having no match. The block hash table has pointers into the suffix array. The pointer array can be viewed as a table of the hash function into the block hash table, which uses open addressing in case of collisions.
When a match is found, the suffix array is searched, via binary search in the block hash table, to find the longest matching sequence of blocks. Then the match is extended character by character as far as possible.
Author: PEB
More information
Ramesh C. Agarwal, Karan Gupta, Shaili Jain, and Suchitra Amalapurapu, An approximation to the greedy algorithm for differential compression, IBM Journal of Research & Development, 50(1):149-166, January 2006.
Huffman coding
(algorithm)
Definition:
A minimal variable-length character coding based on the frequency of each character. First, each character becomes a trivial binary tree, with the character as the only node. The character's frequency is the tree's frequency. Two trees with the least frequencies are joined as the subtrees of a new root that is assigned the sum of their frequencies. Repeat until all characters are in one tree. One code bit represents each level. Thus more frequent characters are near the root and are coded with few bits, and rare characters are far from the root and are coded with many bits.
Also known as static Huffman coding.
Generalization (I am a kind of ...)
greedy algorithm.
Specialization (... is a kind of me.)
adaptive Huffman coding, k-ary Huffman coding.
Aggregate child (... is a part of or used in me.)
coding tree, full binary tree, priority queue.
Note:
The worst case for Huffman coding (or, equivalently, the longest Huffman coding for a set of characters) is when the distribution of frequencies follows the Fibonacci numbers. For this and other relations see Alex Vinokur's note on Fibonacci numbers, Lucas numbers and Huffman codes.
Joining trees by frequency is the same as merging sequences by length in optimal merge. See the example there. Since a node with only one child is not optimal, any Huffman coding corresponds to a full binary tree.
Huffman coding is one of many lossless compression algorithms. This algorithm produces a prefix code.
Shannon-Fano is a minimal prefix code. Huffman is optimal for character coding (one character-one code word) and simple to program. Arithmetic coding is even more compact, since it can allocate fractional bits, but is more complicated and patents cover some uses.
Author: PEB
More information
A survey on data compression, John Morris' explanation, example, and analysis. Wikipedia's encyclopedic article on Huffman coding.
David A. Huffman, A Method for The Construction of Minimum Redundancy Codes, Proceedings of IRE, 40(9):1098-1101, September 1952.
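The joining procedure above can be sketched with a priority queue of trees (nested tuples here stand in for the coding tree; the tiebreak counter is only to keep heap entries comparable):

```python
import heapq
from itertools import count

def huffman_codes(freq):
    """Build a Huffman code from {character: frequency}: repeatedly join
    the two least-frequent trees under a new root whose frequency is
    their sum; one code bit per level gives each character's code."""
    tiebreak = count()
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # two trees with least frequencies
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))
    codes = {}
    def walk(tree, prefix=""):
        if isinstance(tree, tuple):       # internal node: descend both sides
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                             # leaf: a character
            codes[tree] = prefix or "0"   # single-character edge case
    walk(heap[0][2])
    return codes
```

More frequent characters end up nearer the root and so get shorter codes, and since the tree is full the result is a prefix code.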
huge sparse array
(data structure)
Definition:
Let N be the number of items to store and R be the size of the range of key values; R >> N. Allocate, but don't initialize, two arrays: an item array I, where |I| ≥ N, and a location array L, where |L| = R. Initialize a variable, next, the number of items, to zero (with 0-based indexing). To insert an item, put it in the next place in the item array and save where to find it in the location array:
    I[next] = item; L[item.key] = next; next++;
To look up an item by key, get the index from the location array. If the index is invalid or refers to the wrong item, the item is not found:
    index = L[key];
    if (index < 0 OR index >= next) return NOTFOUND;
    if (I[index].key != key) return NOTFOUND;
    return I[index];
Inserting N items takes Θ(N) total time, assuming allocation takes constant time. Retrieving an item by key (or responding "not found") takes constant time. Listing all items takes Θ(N) time using I.
Generalization (I am a kind of ...)
array.
Note:
Keys must be unique. Duplicate keys could be handled with collision resolution schemes. This data structure is usually impractical since the range is enormous.
This effectively allocates a huge array without explicitly initializing it, at the cost of Θ(N) space and a little time for each access. This is what hash tables want to be: constant time for insertion and look up.
Author: PEB
More information
Historical Note
Motivating problem first posed as exercise 2.12 in Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman, The Design and Analysis of Computer Algorithms, page 71, Addison-Wesley, 1974.
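A sketch of the two-array trick (Python always initializes memory, so the location array is deliberately filled with random garbage to mimic an uninitialized allocation; the validity checks make lookups correct anyway):

```python
import random

class HugeSparseArray:
    """Item array I plus location array L mapping key -> index into I.
    L starts full of garbage, as uninitialized memory would; the two
    checks in lookup reject any garbage index."""
    NOTFOUND = object()

    def __init__(self, n_items, key_range):
        self.I = [None] * n_items
        self.L = [random.randrange(key_range) for _ in range(key_range)]
        self.next = 0

    def insert(self, key, value):
        self.I[self.next] = (key, value)
        self.L[key] = self.next
        self.next += 1

    def lookup(self, key):
        index = self.L[key]
        if index < 0 or index >= self.next:   # garbage index out of range
            return self.NOTFOUND
        if self.I[index][0] != key:           # garbage index, wrong item
            return self.NOTFOUND
        return self.I[index][1]
```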
hybrid algorithm
(definition)
Definition:
Any algorithm composed of simpler algorithms.
Author: PEB
hyperedge
(definition)
Definition:
A connection between any number of vertices of a hypergraph.
Formal Definition: A hyperedge is a set of vertices of a hypergraph.
Aggregate parent (I am a part of or used in ...)
hypergraph.
Author: PEB
hypergraph
(data structure)
Definition:
A graph whose hyperedges connect two or more vertices.
Formal Definition: A hypergraph G can be defined as a pair (V, E), where V is a set of vertices, and E is a set of hyperedges between the vertices. Each hyperedge is a set of vertices: E ⊆ {{u, v, ...} ∈ 2V}. (Hyperedges are undirected.)
Generalization (I am a kind of ...)
undirected graph.
Aggregate child (... is a part of or used in me.)
hyperedge, vertex.
Note:
Consider "family," a relation connecting two or more people. If each person is a vertex, a family hyperedge connects the father, the mother, and all of their children. So G = (people, family) is a hypergraph. Contrast this with the binary relations "married to," which connects a man and a woman, or "child of," which is directed from a child to his or her father or mother.
Author: PEB
Published - April 2009