Summary: I/O-algorithms Spring 2008
Project 3 -- Theoretical Homework
The goal of this project is to use the theoretical techniques we have discussed in the first part
of the course to design new algorithms and data structures. The project should be done in the
same groups as projects 1 and 2. A report with the solutions is due on Thursday May 29, 2008.
Remember to argue for correctness and complexity of each of your solutions. The evaluation of the
project will be part of the final grade.
1. Design an I/O-efficient algorithm for removing duplicates from a multiset of N elements (you
cannot assume that the range of the elements is known). The output of the algorithm should
be the K distinct elements among the N input elements in sorted order, and the algorithm
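should run in $O\left(\max\left\{\frac{N}{B}\log_{M/B}\frac{N}{B} - \sum_{i=1}^{K}\frac{N_i}{B}\log_{M/B} N_i,\ \frac{N}{B}\right\}\right)$ I/Os, where $N_i$ is the number
of copies of the i'th distinct element in the input multiset.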
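As a quick sanity check of this bound (a sketch only, reading the first argument of the max as the sorting bound minus a saving for each group of duplicates, and assuming N >= B and M/B > 1), consider the two extreme inputs:

```latex
% Case 1: all elements distinct, so K = N and N_i = 1 for every i.
\sum_{i=1}^{K} \frac{N_i}{B}\log_{M/B} N_i = 0
\quad\Longrightarrow\quad
O\!\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right) \text{ I/Os (the sorting bound).}

% Case 2: all elements equal, so K = 1 and N_1 = N.
\frac{N}{B}\log_{M/B}\frac{N}{B} - \frac{N}{B}\log_{M/B} N
= -\frac{N}{B}\log_{M/B} B \le 0
\quad\Longrightarrow\quad
O\!\left(\frac{N}{B}\right) \text{ I/Os (the scanning bound).}
```

The bound therefore interpolates between scanning and sorting, depending on how many duplicates the input contains.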
(Hint: Use merge-sort and remove duplicates as soon as they are found. Analyze the algorithm