1.0 Descriptions of Bucket Sort

      Unlike the cocktail sort, the bucket sort algorithm sort utilizes a different approach and sometimes multiple ones to solve its problem. Using this algorithm by itself would represent its divide and conquer approach to sorting a sequence. However, the algorithm is more diverse, representing a transform and conquer approach, as after division into buckets, the algorithm may continue its bucket division or utilize another algorithm to sort the bucket before merging all buckets to garner the initial sequence. The main characteristic of the algorithm is its initial division of the problem into buckets in order to solve the sub problems before merging into a solution. How these sub problems are solved is dependent on the particular implementation.

1.1 How bucket Sort Works

Typically the bucket sort algorithm poses many issues where efficiency is concerned. Depending on its implementation it may increase to polynomial time or utilize extensive amounts of memory. A common issue includes the range of its data set and determining suitable bucket sizes before applying its particular algorithm. In addition, using the correct amount of buckets is also an issue as larger buckets will have greater runtime and smaller buckets will also multiply the workload.

A typical implementation of the bucket sort algorithm would include first splitting the data set into several buckets. This would be followed by sorting each bucket utilizing the bucket sort algorithm or another and finally merging these buckets to obtain a sorted sequence.

The implementation of the bucket sort utilized in this project aims to combat the issues mentioned above dynamically by choosing suitable bucket sizes and capitalizing on the best case scenario of the insertion sort with the aid of several data structures. This implementation, unlike initial research has proved competitive to the cocktail sort algorithm it is being compared with.

 

1.1 Difference with Bubble sort and Cocktail Sort

     Cocktail sort is a slight variation of bubble sort. It differs in that instead of repeatedly passing through the list from bottom to top, it passes alternately from bottom to top and then from top to bottom. It can achieve slightly better performance than a standard bubble sort. The reason for this is that bubble sort only passes through the list in one direction and therefore can only move items backward one step each iteration.

1.2 Other names of Bucket Sort

  According to Black (2009), the bucket sort is also called the bin sort, generalized as a distribution sort and similar to the range, radix sorts and hash heap.

1.5 Pseudo code of Bucket Sort

  Asymptotic Notation (Worst case)
n=number of values in array
b=number of buckets
s=bucket size
 
BucketSort ( numbers : array of values, count : size of numbers)  
   min <- 100000000000000000 O(1)
   max <- 0 O(1)
   count <- 0 O(1)
   for i <- 0 to numbers.length O(n)
          if min > numbers[i] then O(1)
               min <- numbers[i] O(1)
          if  max < numbers[i]  then O(1)
               max <- numbers[i] O(1)
    end for  
    bucketsize <-  (max-min)*2 / count O(1)
    noOfBuckets <- max / bucketsize O(1)
    Buckets : array of Buckets[noOfBuckets+1] O(1)

    for i <- 0 to numbers.length
O(n)
        bucketIndex <- floor ( numbers[i] / bucketsize) O(1)
        //use insertion sort to insert number in bucket  
        for b <- 0 in Buckets[bucketIndex].numbers.length O(b)
                      if  numbers[i] < Buckets[index].numbers[b] O(1)
                        insert numbers[i] before Buckets[index].numbers[b] O(s)
                      end if  
       end for  
     end for   
     sequenceIndex <- 0 O(1)
     for i <-0 to noOfBuckets O(b)
            for b <- 0 to Buckets[i].numbers.length O(s)
                   numbers[sequenceIndex]= Buckets[i].numbers[b] O(1)
                   sequenceIndex <- sequenceIndex + 1 O(1)
   
    return numbers  

 Therefore, worst case for the above algorithm is

O ( max⁡( n ×b ×s,b ×s,1 ))=O(n ×b ×s)

1.7 C++ Code of Bucket Sort

#ifndef BNODE_H                            

#define BNODE_H

//Linked List Node

class BNode{

  public:

        int value;

        BNode *next;

        BNode(int v, BNode*n){

               value=v;

               next=n;

        }

        BNode(BNode &n){

               value = n.value;

               next = n.next;

        }

        ~BNode(){}

};

#endif

#ifndef BDOMAIN

#define BDOMAIN

#include <iostream>

//represesnts one bucket

class BDomain{

    private:

              int domainStart; //start of bucket or min value in range

              int size;        // size of bucket

              BNode* values;   //linked list of bucket values

              int count;       //count of values

     public:

               BDomain()

               {

                      domainStart=size=count=0;

                      values=NULL;

               }

               BDomain(int DomainStart, int Size,BNode* Values, int Count){

                      domainStart = DomainStart;

                      size=Size;

                      values = Values;

                      count=Count;

               }

               BDomain(BDomain &b){

                      domainStart=b.domainStart;

                      size=b.size;

                      values=b.values;

                      count=b.count;

               }

               ~BDomain(){}

               void init(int DomainStart, int Size,BNode* Values, int Count){

                      domainStart = DomainStart;

                      size=Size;

                      values = Values;

                      count=Count;

               }

               void init(int DomainStart, int Size){

                      domainStart = DomainStart;

                      size=Size;

                      values=NULL;

                      count=0;

               }

               int getDomainStart(){ return domainStart; }

               int getCount(){return count;}

              int getSize(){return size;}

               //return int array instead of linked list of nodes

               int* getValues(){

                      int* vals=new int[count];

                      int iterator=0;

                      BNode *temp = values;

                      while(temp !=NULL){

                              vals[iterator++]=temp->value;

                              temp=temp->next;

                      }                        

                      return vals;

               }

               void setDomainStart(int s){

                      domainStart=s;

               }

               void setSize(int s){

                      size=s;

               }

               //add each value to the bucket in order

               void addValue(int s){

                      BNode *head = values;

                      if(head ==NULL){

                            head= new BNode(s,NULL);                       

                      }else{

                            BNode*temp = head;

                            BNode*prev=NULL;

                            bool added=false;

                            if(s<temp->value){

                                   prev = new BNode(s,temp);

                                   values = prev;

                                   count++;

                                   added=true;

                                   return;

                            }

                            else                            

                               while(temp->next != NULL){                              

                                    if(s < temp->value){

                                          prev->next=new BNode(s,temp);

                                          added=true;

                                          break;

                                    }

                                    prev = temp;

                                    temp=temp->next;

                               }

                            if(!added){

                                   temp->next=new BNode(s,NULL);

                            }

                      }

                      count++;

                      values=head;

               }

};

#endif

#ifndef BUCKETSORT_H

#define BUCKETSORT_H

class BucketSort

{

   private:

          int*sequence;

          int length;

   public:

          BucketSort(int * Sequence, int Length){

                   sequence=Sequence;

                      length=Length;

          }

          BucketSort(BucketSort &b){

                 sequence = b.sequence;

                 length = b.length;

          }

          ~BucketSort(){}

          int * getSequence(){  return sequence;  }

          int* Sort(){

                int min=sequence[0], max=sequence[0], count=length;

                //determine max and min to attempt a suitable guess of bucket size

                for(int i=1;i<length;i++){

                       if(min > sequence[i])

                             min = sequence[i];

                       if(max < sequence[i])

                             max = sequence[i];

                }

                int bucketSize = (max - min)*2/count; //guess bucket size

                int noOfBuckets = max/bucketSize; //determine amount of buckets

                BDomain *buckets = new BDomain[noOfBuckets+1]; //create buckets

                for(int i=0;i<noOfBuckets;i++) //initialize buckets and ranges

                       buckets[i].init(i*bucketSize,bucketSize);

                int bIndex=0;

                for(int i=0;i<length;i++){ //fill buckets with data set

                       bIndex = sequence[i]/bucketSize; //determine correct bucket for data

                       buckets[bIndex].addValue(sequence[i]);

                }

                int *tempNumbers;

                int tempCounter=0;

                for(int i=0;i<noOfBuckets+1;i++){  //for each bucket

                       tempNumbers =buckets[i].getValues(); //get all bucket values

                       for(int b=0;b<buckets[i].getCount();b++){ //add values to sequence in that orcer

                             sequence[tempCounter++]=tempNumbers[b];

                       }

                }

                return sequence;

          }  

};

#endif

1.9 Limitation of cocktail Sort

  • They are better sort algorithm such as merge sort and heap sort that run better
  • The average time increases almost exponentially as the number of table elements increase.
  • The simplicity of algorithm make’s poor choice to deal with large data set
  • Even due it’s an improvement over bubble sort, insertion sort still runs faster than it 

2.0 Bucket Sort Illustration

http://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Merge_sort_algorithm_diagram.svg/300px-Merge_sort_algorithm_diagram.svg.png

2.1 Stable Sort Algorithm

  Stability means that equivalent elements retain their relative positions, after sorting. (If the elements have class/structure type and are ordered according to their value on one field the key field then equivalent elements are elements with equal keys). Unlike the cocktail sort, the bucket sort is not a stable algorithm.

2.2 Comparison

Algorithms Cocktail Sort Bucket Sort
Asymptotic comparison (Worst) O(n^2) O(n^2)
Implementation level Fairly Simple Complex
Real life applications    
Asymptotic comparison (Best) O(n) O(1)
Asymptotic comparison (Average) O(n^2) O(n+K)
Worst Case Space Complexity O(1) O(n.k)

References

Black, P. (2009). Bucket Sort. Obtained from the Dictionary of Algorithms and Data Structures on the

National Insititute of Standards and Technology website  on November 5, 2013 from

http://xlinux.nist.gov/dads//HTML/bucketsort.html

Paul E. Black and Bob Bockholt, "bidirectional bubble sort", in Dictionary of Algorithms and Data        

                Structures (online), Paul E. Black, ed., U.S. National Institute of Standards and Technology. 24

                August 2009. (accessed: 20 nov 2013)

Read More