Submitted by mb on
A .NET class library that implements two closely related data mining algorithms: Apriori and Apriori All. The library contains a single-threaded version of Apriori written exclusively in C# and a parallel version that combines C# with OpenCL. The project also includes a performance analysis and comparison of the two Apriori implementations. Apriori All, on the other hand, is currently implemented only in a single-threaded version. A parallel version of Apriori All is planned, but I honestly don't know when I'll get down to it.
The library is accompanied by two programs: a benchmark tool that measures running time for various inputs and other constraints, and a simple console application linked against the library that shows how it can be used in practice.
Apriori
The serial version of the algorithm was implemented by my colleague; I implemented the parallel version. The parallelization uses a divide-and-conquer approach and relies mainly on general-purpose kernels such as logical operators, maximum, minimum and sum. In general, the implementation launches a three-phase parallel scan, one run after another, with barriers between the phases, until the final results are obtained.
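To give a rough idea of what Apriori computes, here is a minimal sequential sketch in Python. It is not the library's code and does not reproduce the OpenCL kernels. It only assumes, for illustration, that each item is stored as a bit vector over transactions, so that support counting reduces to a logical AND followed by a sum, the kind of general-purpose operation mentioned above. The function name `apriori` and the sample data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    # One bitmask per item: bit t is set when transaction t contains the item.
    item_bits = {}
    for t, items in enumerate(transactions):
        for item in items:
            item_bits[item] = item_bits.get(item, 0) | (1 << t)

    all_transactions = (1 << len(transactions)) - 1

    def support(itemset):
        # Logical AND over the item masks, then a sum (population count).
        mask = all_transactions
        for item in itemset:
            mask &= item_bits[item]
        return bin(mask).count("1")

    frequent = {}
    level = [frozenset([item]) for item in sorted(item_bits)]
    k = 1
    while level:
        # Count every candidate of size k and keep the frequent ones.
        survivors = {}
        for candidate in level:
            s = support(candidate)
            if s >= min_support:
                survivors[candidate] = s
        frequent.update(survivors)

        # Join step: merge frequent k-itemsets into (k+1)-candidates.
        # Prune step: every k-subset of a candidate must itself be frequent.
        k += 1
        candidates = set()
        for a, b in combinations(list(survivors), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in survivors
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        level = list(candidates)
    return frequent


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    for itemset, count in apriori(baskets, min_support=2).items():
        print(sorted(itemset), count)
```

In a parallel setting, the AND and the population count are the parts that map naturally onto data-parallel kernels, while the join and prune steps stay on the host in this sketch.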
Apriori All
My implementation of this algorithm is based on prefix trees, a data structure well suited to it. A very early version without prefix trees slowed down very quickly as the database size increased. With prefix trees, the slowdown is much less pronounced.
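The sketch below illustrates why a prefix tree helps, under simplifying assumptions: it is written in Python rather than C#, every element of a sequence is a single item instead of an itemset, and all names (`Node`, `insert`, `count_customer`, `supports`) are hypothetical rather than taken from the library. Candidate sequences that share a prefix share a path in the tree, so one walk over a customer's sequence checks all of them at once instead of testing each candidate separately.

```python
class Node:
    """One node of the prefix tree; the path from the root spells a sequence."""

    def __init__(self):
        self.children = {}         # item -> child Node
        self.is_candidate = False  # True if the path to this node is a candidate
        self.support = 0           # number of customers containing the candidate


def insert(root, sequence):
    """Add a candidate sequence, reusing nodes for any shared prefix."""
    node = root
    for item in sequence:
        node = node.children.setdefault(item, Node())
    node.is_candidate = True


def count_customer(root, customer_sequence):
    """Increment support of every candidate that is a subsequence of the input."""

    def walk(node, start):
        for item, child in node.children.items():
            # Greedy match: earliest occurrence of the item at or after `start`.
            try:
                position = customer_sequence.index(item, start)
            except ValueError:
                continue  # no candidate below this child can match
            if child.is_candidate:
                child.support += 1
            walk(child, position + 1)

    walk(root, 0)


def supports(root):
    """Collect {candidate tuple: support} from the tree."""
    result = {}

    def collect(node, prefix):
        if node.is_candidate:
            result[prefix] = node.support
        for item, child in node.children.items():
            collect(child, prefix + (item,))

    collect(root, ())
    return result


if __name__ == "__main__":
    root = Node()
    for candidate in [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b", "c")]:
        insert(root, candidate)
    for customer in [["a", "b", "c"], ["a", "c"], ["b", "a", "c"], ["c", "b"]]:
        count_customer(root, customer)
    print(supports(root))
```

Each tree node is reached at most once per customer, because the path to a node is unique, so no candidate is counted twice for the same customer. When an item is missing from the customer's sequence, the whole subtree below it is skipped.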
The library can read input from XML files, from C# arrays, and from any objects that implement its specially designed interfaces. Implementing those interfaces is the recommended way of using the library in your project. The library depends on AbstractOpenCL - one of my largest ongoing projects.
Apriori All Lib is open-source, and of course all required binaries are provided in the repository.