Friday, March 31, 2017

Selection algorithm

A selection algorithm is an algorithm for finding the k-th smallest (or k-th largest) number in a list. Finding the minimum, the maximum, and the median are special cases of selection; such values are called order statistics. Relatively simple algorithms are known that find the minimum, the maximum, or the k-th smallest value in average linear time. The k-th smallest value, and even several order statistics at once, can be found in worst-case linear time. Selection also appears as a subproblem of more complex problems such as nearest neighbor search and shortest path problems.


Selection by sorting

The simplest commonly used method is to sort the list and then pick out the k-th element. This is an example of reducing one problem to another. It is convenient when many selections are to be made from the same list, because after a single initial sort every later selection is trivial. When the list changes substantially between selections, or when only one selection is needed, this method is expensive: it generally requires at least O(n log n) time, where n is the length of the list.
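
For illustration, a minimal C++ sketch of this approach (the function name select_by_sorting is arbitrary and not from the original article):

  #include <algorithm>
  #include <cstdio>
  #include <vector>

  // Select the k-th smallest element (k is 0-based) by sorting a copy.
  // Costs O(n log n), but repeated selections on the sorted copy are then trivial.
  int select_by_sorting(std::vector<int> a, int k) {   // pass by value: sort a copy
      std::sort(a.begin(), a.end());
      return a[k];
  }

  int main() {
      std::vector<int> data = {9, 1, 8, 2, 7, 3};
      std::printf("%d\n", select_by_sorting(data, 2));  // prints 3, the 3rd smallest
  }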

Linear minimum/maximum algorithms

A worst-case linear-time algorithm for the minimum or the maximum is obvious. Using two variables, store the index of the best value found so far in one and the value itself in the other, then scan the list sequentially and update both variables whenever a better value is found.

  function minimum(a[1..n])
      minIndex := 1
      minValue := a[1]
      for i from 2 to n
          if a[i] < minValue
              minIndex := i
              minValue := a[i]
      return minValue

  function maximum(a[1..n])
      maxIndex := 1
      maxValue := a[1]
      for i from 2 to n
          if a[i] > maxValue
              maxIndex := i
              maxValue := a[i]
      return maxValue

This algorithm relies on the following fact: for a finite subset A of a totally ordered set (for example a set of integers, real numbers, or dictionary words) and an element x not contained in A, the minimum of A together with x is the smaller of x and the minimum of A. Note that several minima or maxima may exist. Because the comparisons in the pseudocode above are strict, this algorithm finds the minimum with the smallest index; using non-strict comparisons (≤ and ≥) instead would find the minimum with the largest index.

When the maximum and the minimum are wanted at the same time, one improvement is to compare elements in pairs: compare each odd-indexed element with the following even-indexed element, then compare the larger of the two with the current maximum and the smaller with the current minimum. Another technique is divide and conquer: split the list into halves, find the maximum and minimum of each half, and combine the results to obtain the overall maximum and minimum.
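
A minimal C++ sketch of the pairwise-comparison idea (names are illustrative): each pair costs one comparison, after which only the larger element is tested against the running maximum and the smaller against the running minimum, giving roughly 3n/2 comparisons instead of 2n.

  #include <cstdio>
  #include <utility>
  #include <vector>

  // Find the minimum and maximum together; assumes a is non-empty.
  std::pair<int, int> min_and_max(const std::vector<int>& a) {
      int lo = a[0], hi = a[0];
      std::size_t i = a.size() % 2;                 // start at 1 if the length is odd
      for (; i + 1 < a.size(); i += 2) {
          int small = a[i], big = a[i + 1];
          if (small > big) std::swap(small, big);   // one comparison per pair
          if (small < lo) lo = small;               // smaller element against the minimum
          if (big > hi) hi = big;                   // larger element against the maximum
      }
      return {lo, hi};
  }

  int main() {
      std::vector<int> data = {5, 3, 9, 1, 7, 2, 8};
      auto [lo, hi] = min_and_max(data);
      std::printf("min = %d, max = %d\n", lo, hi);  // min = 1, max = 9
  }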

Nonlinear general selection algorithm

Using only the minimum/maximum algorithm above, it is easy to build an algorithm for the k-th smallest (or k-th largest) value, but it is not very efficient: it takes O(kn) time, which is acceptable only when k is small. The algorithm simply finds the smallest remaining value, moves it to the front of the list, and repeats until the k-th such value has been placed. It is nothing more than an incomplete selection sort. The algorithm for the k-th smallest value is shown below.

  function select(a[1..n], k)
      for i from 1 to k
          minIndex := i
          minValue := a[i]
          for j from i+1 to n
              if a[j] < minValue
                  minIndex := j
                  minValue := a[j]
          swap a[i] and a[minIndex]
      return a[k]

This method has the following advantages.

  • Once the state used to find the j-th smallest value is available, finding the k-th smallest value takes only O(j + (k-j)²) additional time, or O(k) if k ≤ j.
  • It works with a linked-list data structure, whereas the partition-based techniques described below require random access.

Partition-based general selection algorithm

There is at least one worst-case linear-time algorithm for selecting the k-th largest value. Manuel Blum, Robert Floyd, Vaughan Pratt, Ron Rivest, and Robert Tarjan presented it in their 1973 paper "Time bounds for selection". The algorithm combines the technique used in quicksort with an original idea of its own.

Quicksort has a subprocedure called partition that, in linear time, divides a list into the elements smaller than a given value and the elements larger than it. Pseudocode that partitions around the value a[pivotIndex] is shown below.

  function partition(a, left, right, pivotIndex)
      pivotValue := a[pivotIndex]
      swap a[pivotIndex] and a[right]  // move the pivot to the end
      storeIndex := left
      for i from left to right-1
          if a[i] ≤ pivotValue
              swap a[storeIndex] and a[i]
              storeIndex := storeIndex + 1
      swap a[right] and a[storeIndex]  // move the pivot to its final place
      return storeIndex

Quicksort sorts both sublists recursively, which takes Ω(n log n) time even in the best case. In selection, however, we already know which partition contains the desired value: comparing k with the pivot's final index tells us where the k-th value lies. Therefore only the relevant side needs to be processed recursively.

  function select(a, k, left, right)
      select a pivot value a[pivotIndex]
      pivotNewIndex := partition(a, left, right, pivotIndex)
      if k = pivotNewIndex
          return a[k]
      else if k < pivotNewIndex
          return select(a, k, left, pivotNewIndex-1)
      else
          return select(a, k, pivotNewIndex+1, right)

Note the similarity to quicksort. Just as the minimum-based selection algorithm above is an incomplete selection sort, this is an incomplete quicksort: it processes only O(log n) of its O(n) partitions. The procedure has expected linear-time performance and, like quicksort itself, performs well in practice. It is also an in-place algorithm that uses only a constant amount of extra memory, provided the tail recursion is turned into a loop as follows.

  function select(a, k, left, right)
      loop
          select a pivot value a[pivotIndex]
          pivotNewIndex := partition(a, left, right, pivotIndex)
          if k = pivotNewIndex
              return a[k]
          else if k < pivotNewIndex
              right := pivotNewIndex-1
          else
              left := pivotNewIndex+1

As with quicksort, the performance of this algorithm depends on the choice of pivot. If bad pivot values are chosen throughout, performance degrades to the same class as the minimum-based selection described above.
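
For illustration, a runnable C++ sketch of the loop form above, using a uniformly random pivot (an assumption made for this example; the pseudocode leaves the pivot choice open), which gives expected linear time:

  #include <algorithm>
  #include <cstdio>
  #include <random>
  #include <vector>

  // Partition a[left..right] around a[pivotIndex]; return the pivot's final index.
  int partition(std::vector<int>& a, int left, int right, int pivotIndex) {
      int pivotValue = a[pivotIndex];
      std::swap(a[pivotIndex], a[right]);          // move the pivot to the end
      int storeIndex = left;
      for (int i = left; i < right; ++i)
          if (a[i] <= pivotValue)
              std::swap(a[storeIndex++], a[i]);
      std::swap(a[right], a[storeIndex]);          // move the pivot to its final place
      return storeIndex;
  }

  // Return the k-th smallest element of a (k is 0-based), choosing pivots at random.
  int quickselect(std::vector<int>& a, int k) {
      static std::mt19937 gen(std::random_device{}());
      int left = 0, right = (int)a.size() - 1;
      while (true) {
          std::uniform_int_distribution<int> dist(left, right);
          int pivotNewIndex = partition(a, left, right, dist(gen));
          if (k == pivotNewIndex) return a[k];
          if (k < pivotNewIndex) right = pivotNewIndex - 1;
          else                   left = pivotNewIndex + 1;
      }
  }

  int main() {
      std::vector<int> data = {9, 4, 7, 1, 8, 2, 6};
      std::printf("median = %d\n", quickselect(data, 3));  // prints 6
  }

Like the in-place pseudocode, this rearranges the array as a side effect.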

Quickselect using the median of medians

Quickselect can be made to run in worst-case linear time if a good pivot can always be found. To do so, first divide the original sequence into groups of five elements. Next, find the median of each group, and apply the selection algorithm recursively to the sequence consisting of these medians. The median of medians found this way is a good pivot: each partition step is guaranteed to shrink the sequence by a constant fraction, leaving at best 30% and at worst 70% of the elements.

With this method the worst-case time is guaranteed to be linear, but in terms of average time a simpler strategy, such as choosing the pivot at random, performs better.
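
A condensed C++ sketch of the median-of-medians idea (illustrative only: for clarity it copies sub-vectors instead of partitioning in place, unlike the in-place pseudocode above):

  #include <algorithm>
  #include <cstdio>
  #include <vector>

  // Return the k-th smallest element of a (k is 0-based) in worst-case linear time.
  int select_mom(std::vector<int> a, int k) {
      if (a.size() <= 5) {                          // small base case: just sort
          std::sort(a.begin(), a.end());
          return a[k];
      }
      // Median of each group of five elements.
      std::vector<int> medians;
      for (std::size_t i = 0; i < a.size(); i += 5) {
          std::size_t j = std::min(i + 5, a.size());
          std::sort(a.begin() + i, a.begin() + j);
          medians.push_back(a[i + (j - i) / 2]);
      }
      // Recursively take the median of the medians as the pivot.
      int pivot = select_mom(medians, (int)medians.size() / 2);

      // Three-way split around the pivot, then recurse into the relevant part.
      std::vector<int> less, equal, greater;
      for (int x : a) {
          if (x < pivot) less.push_back(x);
          else if (x > pivot) greater.push_back(x);
          else equal.push_back(x);
      }
      if (k < (int)less.size()) return select_mom(less, k);
      if (k < (int)(less.size() + equal.size())) return pivot;
      return select_mom(greater, k - (int)(less.size() + equal.size()));
  }

  int main() {
      std::vector<int> data = {12, 3, 5, 7, 4, 19, 26, 23, 2, 1, 8};
      std::printf("median = %d\n", select_mom(data, 5));  // prints 7
  }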

Selection as incremental sorting

The advantage of the incomplete-selection-sort approach, as noted above, is that it reduces the cost of later selections. How many selections will eventually be made from a given list may not be known in advance, nor whether the smallest or the largest elements will be wanted. In such cases the algorithm can be adapted so that it selects while performing a partial sort, allowing the work to be reused by later selections.

The minimum-based algorithm already performs a partial sort: after it has looked for the k-th smallest element, the elements up to that index are sorted. The partition-based algorithm does not behave this way automatically, but its efficiency for later selections can be improved by remembering the pivots used (especially the first pivot). If every pivot is remembered, repeated selections eventually leave the list sorted, and the stored list of pivots can later be reused by quicksort.

Using data structures to select in sublinear time

Given an unorganized list of data, linear time (Ω(n)) is required to find the minimum, because every element must be examined (otherwise the minimum could be missed). If the data are kept in order as the list is built, selecting the k-th largest value becomes easy, but then inserting an element into the middle of the list requires linear time (as does merging two lists).

The strategy for obtaining order statistics in sublinear time is to store the data in a data structure suited to selection. Such data structures include tree-based (hierarchical) structures and frequency tables.

When only the minimum (or the maximum) is needed, a priority queue is a good choice. It can find the minimum (or maximum) in constant time, and other operations such as insertion run in O(log n) time or better. More generally, using a balanced binary search tree, the k-th largest value can be selected in O(log n) time, with element insertion also taking O(log n). Each node stores a count of how many nodes its subtree contains, and this count is used to decide which path to follow from the root. When a node (that is, a new data item) is added, the counts can be brought up to date in O(log n) time, because only the nodes above it need to be revised; likewise, a tree rotation affects only the counts of the nodes involved.
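
To make the count-guided descent concrete, the following C++ sketch (a simplified, hypothetical structure; insertion and rebalancing, which maintain the counts in O(log n), are omitted) finds the k-th smallest element in a tree whose nodes store subtree sizes:

  #include <cstdio>
  #include <memory>
  #include <vector>

  struct Node {
      int value;
      int size;                       // number of nodes in this subtree
      std::unique_ptr<Node> left, right;
  };

  // Build a balanced BST (with subtree sizes) from sorted values v[lo, hi).
  std::unique_ptr<Node> build(const std::vector<int>& v, int lo, int hi) {
      if (lo >= hi) return nullptr;
      int mid = lo + (hi - lo) / 2;
      auto n = std::make_unique<Node>();
      n->value = v[mid];
      n->left = build(v, lo, mid);
      n->right = build(v, mid + 1, hi);
      n->size = 1 + (n->left ? n->left->size : 0) + (n->right ? n->right->size : 0);
      return n;
  }

  // Return the k-th smallest value (0-based) by descending with the counts.
  int kth(const Node* t, int k) {
      int leftSize = t->left ? t->left->size : 0;
      if (k < leftSize) return kth(t->left.get(), k);
      if (k == leftSize) return t->value;
      return kth(t->right.get(), k - leftSize - 1);
  }

  int main() {
      std::vector<int> sorted = {1, 3, 4, 7, 9, 12, 15};
      auto root = build(sorted, 0, (int)sorted.size());
      std::printf("3rd smallest (k=2): %d\n", kth(root.get(), 2));  // prints 4
  }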

Another simple strategy is based on the same concept as a hash table. If the range of possible values is known in advance, divide it into h intervals and set up h buckets. A newly inserted element is placed in the bucket corresponding to the interval its value falls in. To find the maximum or the minimum, scan from the largest or smallest bucket until a non-empty bucket is found and take the maximum or minimum within it. In general, to find the k-th element, keep a count of the elements stored in each bucket, add those counts up from one end to locate the bucket containing the k-th element, and then find the required element among that bucket's contents with a linear-time algorithm.

If h is chosen to be about sqrt(n) and the data are distributed roughly uniformly, selection with this scheme takes O(sqrt(n)) time. However, if the data are clustered in a narrow range, a single bucket may end up holding a great many elements (clustering can be eliminated with a well-chosen hash function, but searching for the k-th largest value among scattered hashed values is not practical). Furthermore, as with a hash table, the structure must be rebuilt (that is, h must be increased) to keep it efficient when the number of elements n grows beyond h². This approach is well suited to finding order statistics when the number of data items is known, for example when compiling statistics over student grades. The best arrangement is to give each bucket a value range of 1 and simply count the elements in each; such a table resembles the frequency table used to classify data by summary statistics.
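
A C++ sketch of the bucket scheme under the stated assumptions (the value range is known in advance and h is fixed; the function name bucket_select is illustrative):

  #include <algorithm>
  #include <cstdio>
  #include <vector>

  // Find the k-th smallest value (k is 0-based), assuming all values lie in [lo, hi).
  int bucket_select(const std::vector<int>& data, int k, int lo, int hi, int h) {
      double width = double(hi - lo) / h;
      std::vector<std::vector<int>> buckets(h);
      for (int x : data)                                   // distribute into h buckets
          buckets[std::min(h - 1, int((x - lo) / width))].push_back(x);
      for (auto& b : buckets) {                            // walk buckets, summing counts
          if (k < (int)b.size()) {                         // the k-th value is in this bucket
              std::nth_element(b.begin(), b.begin() + k, b.end());
              return b[k];
          }
          k -= (int)b.size();
      }
      return -1;                                           // unreachable if k is valid
  }

  int main() {
      std::vector<int> grades = {55, 72, 91, 68, 84, 77, 60, 99, 73};
      // 5th smallest grade, assuming grades fall in [0, 100) and using 10 buckets.
      std::printf("%d\n", bucket_select(grades, 4, 0, 100, 10));  // prints 73
  }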

Selecting the k smallest or largest elements

Another basic selection problem is to select the k largest (or k smallest) elements, which corresponds, for example, to producing a list of the top 100 companies by sales. Several simple but inefficient approaches exist. Selecting the elements one at a time with the selection algorithm above takes O(kn) time, which is the same as running a selection sort through the k-th position. If log n is much smaller than k, it is more efficient to sort the whole list.

Another simple method is to store the data in an order-maintaining data structure such as a heap or a balanced binary search tree. Whenever the structure holds more than k elements, the element that is not needed is removed (if the k smallest elements are wanted, the largest is removed). Removal takes O(log k) time, as does insertion, so the whole process takes O(n log k) time.
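
A C++ sketch of this heap-based method (illustrative names), keeping a max-heap of at most k elements so that the whole pass costs O(n log k):

  #include <algorithm>
  #include <cstdio>
  #include <queue>
  #include <vector>

  // Return the k smallest elements of data, in ascending order, in O(n log k) time.
  std::vector<int> k_smallest(const std::vector<int>& data, int k) {
      std::priority_queue<int> heap;              // max-heap holding the k smallest so far
      for (int x : data) {
          heap.push(x);                           // O(log k)
          if ((int)heap.size() > k)
              heap.pop();                         // discard the largest, which is not needed
      }
      std::vector<int> result;
      while (!heap.empty()) { result.push_back(heap.top()); heap.pop(); }
      std::reverse(result.begin(), result.end()); // ascending order
      return result;
  }

  int main() {
      std::vector<int> data = {9, 4, 7, 1, 8, 2, 6};
      for (int x : k_smallest(data, 3)) std::printf("%d ", x);   // prints 1 2 4
      std::printf("\n");
  }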

A more efficient approach is a partial sorting algorithm based on merge sort or quicksort. The quicksort-based version is simpler: it does not sort any partition that contains none of the k elements to be selected. If the pivot lands at position k or later, only the left partition needs to be processed recursively.

  function quicksortFirstK(a, left, right, k)
      if right > left
          select a pivot value a[pivotIndex]
          pivotNewIndex := partition(a, left, right, pivotIndex)
          quicksortFirstK(a, left, pivotNewIndex-1, k)
          if pivotNewIndex < k
              quicksortFirstK(a, pivotNewIndex+1, right, k)

This algorithm takes O(n + k log k) time and is in practice very efficient. In particular, if k is sufficiently small compared with n, switching to selection sort improves efficiency further.

If the k selected elements do not need to be in sorted order, the method can be made even more efficient. In that case recursion only needs to be applied to the partition containing the k-th element, and the partitions before and after it never need to be sorted.

  function findFirstK(a, left, right, k)
      if right > left
          select a pivot value a[pivotIndex]
          pivotNewIndex := partition(a, left, right, pivotIndex)
          if pivotNewIndex > k  // new condition
              findFirstK(a, left, pivotNewIndex-1, k)
          if pivotNewIndex < k
              findFirstK(a, pivotNewIndex+1, right, k)

This algorithm takes O(n) time, which places it among the best algorithms for this problem.

Another method is the tournament algorithm. First, matches (comparisons) are played between adjacent pairs, the winners advance to the next round, and eventually a champion is determined; this builds a tournament tree. At that point the second-place element must be one that lost directly to the champion, so it can be found by walking the tree in O(log n) time, producing a new tournament tree. The third-place element must have lost directly to the second-place element, so it is found from the two tournament trees, and the process is repeated until k elements have been found. This algorithm takes O(n + k log n) time.

For k = 2, selection can be done in O(n + log n) time.
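
A C++ sketch of the k = 2 case (illustrative, not from the original article): a knockout tournament finds the largest element in n − 1 comparisons, and the runner-up is then the best of the elements that lost directly to the champion, of which there are about log n.

  #include <cstdio>
  #include <vector>

  // Play a knockout tournament on a[lo, hi) (hi exclusive), recording for each
  // index the indices it defeated. Returns the index of the winner.
  int play(const std::vector<int>& a, int lo, int hi,
           std::vector<std::vector<int>>& beaten) {
      if (hi - lo == 1) return lo;
      int mid = lo + (hi - lo) / 2;
      int w1 = play(a, lo, mid, beaten);
      int w2 = play(a, mid, hi, beaten);
      int winner = (a[w1] >= a[w2]) ? w1 : w2;
      int loser  = (a[w1] >= a[w2]) ? w2 : w1;
      beaten[winner].push_back(loser);
      return winner;
  }

  int main() {
      std::vector<int> a = {7, 2, 9, 4, 15, 1, 8, 11};
      std::vector<std::vector<int>> beaten(a.size());
      int champ = play(a, 0, (int)a.size(), beaten);
      // The runner-up must be among the elements that lost directly to the champion.
      int second = beaten[champ].front();
      for (int idx : beaten[champ])
          if (a[idx] > a[second]) second = idx;
      std::printf("largest = %d, second largest = %d\n", a[champ], a[second]);
  }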

Lower bounds

In The Art of Computer Programming, Donald Knuth discusses the lower bound on the number of comparisons required to select the k-th smallest of n elements (using comparisons only). The lower bound for finding the maximum or the minimum is n − 1 comparisons. To see this, consider a tournament in which each match is one comparison: every player other than the champion must lose at least once before the champion is determined, so at least n − 1 comparisons are needed.

For k other than 1 the situation becomes somewhat more complicated. Finding the k-th smallest value requires at least the following number of comparisons:

  n − k + ∑_{j=n+2−k}^{n} ⌈lg j⌉

This lower bound is achievable for k = 2, but more complex lower bounds are known for larger k.

Language support

Many languages have built-in functions for finding the maximum and minimum of a list, but very few provide general selection as a built-in feature. C++ is an exception: its nth_element function template guarantees expected linear-time selection. The implementation very likely uses the algorithms described above, but this is not mandated. (See section 25.3.2 of ISO/IEC 14882:2003(E) and 14882:1998(E), and nth_element in the SGI STL documentation.)

C++ also provides the partial_sort algorithm, which selects the k smallest elements, in sorted order, in O(n log k) time. No algorithm is provided for selecting the k largest elements, but this is easily achieved by reversing the ordering.
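
For illustration, a short usage example of the two standard algorithms just described (the container contents are arbitrary):

  #include <algorithm>
  #include <cstdio>
  #include <functional>
  #include <vector>

  int main() {
      std::vector<int> v = {9, 4, 7, 1, 8, 2, 6, 3, 5};

      // nth_element: place the element of rank 4 (0-based) at position 4, with smaller
      // elements before it and larger ones after it; expected linear time.
      std::nth_element(v.begin(), v.begin() + 4, v.end());
      std::printf("5th smallest: %d\n", v[4]);                 // prints 5

      // partial_sort: the 3 smallest elements, in sorted order, in O(n log k).
      std::partial_sort(v.begin(), v.begin() + 3, v.end());
      std::printf("3 smallest: %d %d %d\n", v[0], v[1], v[2]); // prints 1 2 3

      // The k largest elements are obtained by reversing the comparison.
      std::partial_sort(v.begin(), v.begin() + 3, v.end(), std::greater<int>());
      std::printf("3 largest: %d %d %d\n", v[0], v[1], v[2]);  // prints 9 8 7
  }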

For Perl, the Sort::Key::Top module, available from CPAN, provides a set of functions for selecting the top n elements of a list.

Because language support for sorting is far more widespread, in practice the simple approach of sorting and then selecting is often used, despite its performance disadvantage.


References

  • M. Blum, R.W. Floyd, V. Pratt, R. Rivest and R. Tarjan, "Time bounds for selection," J. Comput. System Sci. 7 (1973) 448–461.
  • Donald Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching, Third Edition. Addison-Wesley, 1997. ISBN 0-201-89685-0. Section 5.3.3: Minimum-Comparison Selection, pp.207–219.
  • Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Chapter 9: Medians and Order Statistics, pp.183–196. Section 14.1: Dynamic order statistics, pp.302–308.

This article is taken from the Japanese Wikipedia article "Selection algorithm".

This article is distributed under the CC BY-SA or GFDL license in accordance with the provisions of Wikipedia.

Wikipedia and Tranpedia do not guarantee the accuracy of this document. See our disclaimer for more information.

In addition, Tranpedia only translates writings published under licenses compatible with the CC BY-SA license and is not responsible for their content.
