Concurrent file processing with Java

Imagine you want to compare a large number of binary files (e.g. audio, video or image files). For n files that is n(n-1)/2, or roughly n²/2, pairwise comparisons. If the files are large or there are many of them, this is quite costly. If you have multiple processors available, it is prudent to split the workload across several threads.
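
The exact number of unordered pairs among n files is n(n-1)/2, which is where the rough n²/2 comes from. The minimal Java sketch below counts the pairs and enumerates each one exactly once by pairing index i only with indices j > i. The class and method names are illustrative, not taken from any library.

```java
import java.util.ArrayList;
import java.util.List;

class PairCount {

    /** Number of unordered pairs among n files: n(n-1)/2, roughly n²/2 for large n. */
    static long pairCount(long n) {
        return n * (n - 1) / 2;
    }

    /** Enumerates every unordered pair (i, j) with i < j exactly once. */
    static List<int[]> pairs(int n) {
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                result.add(new int[] {i, j});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairCount(1000));   // 499500
        System.out.println(pairs(5).size());   // 10
    }
}
```

For 1,000 files that is already 499,500 comparisons, each of which may read two large files.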

You make three lists:

  1. List of all files that you want to compare
  2. List of files that are equal or similar
  3. List of files that were processed
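
One possible way to hold the three lists in Java, assuming the files are identified by `java.nio.file.Path`. Lists 2 and 3 are modelled here as concurrent sets rather than lists, since their order does not matter and several threads write to them at the same time. `ComparisonState` and its methods are hypothetical names for this sketch.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical holder for the three lists; class and method names are illustrative. */
class ComparisonState {
    // 1. All files to compare; immutable, so every thread can read it without locking
    final List<Path> allFiles;
    // 2. Pairs found equal or similar; a concurrent set accepts writes from many threads
    final Set<List<Path>> similarPairs = ConcurrentHashMap.newKeySet();
    // 3. Files that have been compared with all others
    final Set<Path> processed = ConcurrentHashMap.newKeySet();

    ComparisonState(List<Path> files) {
        this.allFiles = List.copyOf(files);
    }

    /** Stores the pair in a fixed order so (a, b) and (b, a) are not recorded twice. */
    void recordSimilar(Path a, Path b) {
        similarPairs.add(a.compareTo(b) <= 0 ? List.of(a, b) : List.of(b, a));
    }

    void markProcessed(Path file) {
        processed.add(file);
    }
}
```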

The first thread takes the first entry of the list and compares it with all others. The second thread takes the second entry and compares it with all others. But there is a catch: the second thread must not compare the second file with the first one, because the first thread has already done that comparison. Furthermore, you will want to persist the data in case the program crashes (typically with an OutOfMemoryError). Here you must ensure that the persistence is transactional, so that a crash cannot leave partially written data behind.
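
For the transactional persistence, one common technique that needs no database is to write a complete snapshot to a temporary file and then rename it over the previous checkpoint in a single atomic step, so a reader sees either the old snapshot or the new one. The sketch below uses a hypothetical class `Checkpoint` that stores the names of processed files one per line. It assumes the file system supports atomic moves; otherwise `Files.move` with `ATOMIC_MOVE` throws `AtomicMoveNotSupportedException`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical all-or-nothing checkpoint of the "processed" list. */
class Checkpoint {

    /**
     * Writes the processed file names to a temporary file in the same directory,
     * then renames it over the target in one atomic step. Synchronized so that
     * two worker threads never interleave their checkpoints.
     */
    static synchronized void save(Set<String> processed, Path target) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "checkpoint", ".tmp");
        // Sorted copy gives a stable, human-readable file
        Files.write(tmp, new TreeSet<>(processed));
        // Per the Javadoc, whether an existing target is replaced by an atomic move
        // is implementation specific; common platforms replace it
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Restores the processed set after a restart; a missing file means nothing was done yet. */
    static Set<String> load(Path target) throws IOException {
        if (Files.notExists(target)) {
            return new TreeSet<>();
        }
        return new TreeSet<>(Files.readAllLines(target));
    }
}
```

After a restart, `load` tells the program which files it can skip. This guards against the JVM dying mid-write; to also survive a power loss you would additionally force the temporary file to disk (for example with `FileChannel.force(true)`) before the move.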
To achieve this, you need a slightly more sophisticated data structure for the first list:

  1. A file must stay in the list as long as it has not been compared to all others.
  2. When a file is taken out to be compared with all others, it is marked so that no other thread can process it.
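
A minimal sketch of such a list, assuming the files sit in a fixed order so that "compare with all others" reduces to "compare with all later files", which avoids the duplicate first/second comparison described above. Each entry carries a state (pending, claimed, done) in an `AtomicIntegerArray`, and `compareAndSet` guarantees that only one thread can move a file from pending to claimed. `WorkList` and its methods are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicIntegerArray;

/**
 * Hypothetical work list: every file stays in it until it has been compared
 * with all later files; a file being worked on is marked as claimed so that
 * no other thread picks it up.
 */
class WorkList<T> {
    private static final int PENDING = 0, CLAIMED = 1, DONE = 2;

    private final List<T> files;
    private final AtomicIntegerArray state; // all entries start at 0 = PENDING

    WorkList(List<T> files) {
        this.files = List.copyOf(files);
        this.state = new AtomicIntegerArray(this.files.size());
    }

    /** Atomically claims the first pending file; returns its index, or -1 if none is left. */
    int claimNext() {
        for (int i = 0; i < files.size(); i++) {
            if (state.compareAndSet(i, PENDING, CLAIMED)) {
                return i;
            }
        }
        return -1;
    }

    T get(int i) {
        return files.get(i);
    }

    /** The files that claimed file i must be compared with: only those after it. */
    List<T> partnersOf(int i) {
        return files.subList(i + 1, files.size());
    }

    /** Called once file i has been compared with all its partners. */
    void complete(int i) {
        if (!state.compareAndSet(i, CLAIMED, DONE)) {
            throw new IllegalStateException("file " + i + " was not claimed");
        }
    }

    /** Number of files not yet finished; after a crash these are the ones to redo. */
    int remaining() {
        int n = 0;
        for (int i = 0; i < state.length(); i++) {
            if (state.get(i) != DONE) {
                n++;
            }
        }
        return n;
    }
}
```

A worker thread would loop on `claimNext()`, compare `get(i)` with every file in `partnersOf(i)`, and then call `complete(i)`. Early entries carry more comparisons than late ones, but since each thread keeps claiming the next pending entry until the list is exhausted, no thread sits idle while work remains.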
