Counting Colours in an Image
Counting the number of distinct colours in an image doesn't sound like a particularly hard thing to do until you try it on a large 24-bit image. This article demonstrates one technique for counting the colours quickly.
One basic technique for counting the colours in an image would be to create an array or collection to hold the colours and then start counting from the start of the image. However, consider what happens then if you load a 1280 x 1024 pixel image. Firstly, you have 1.3 million pixels to check, and secondly for every single one you need to check whether that colour has been counted before. With an array, that would potentially mean you could have to perform somewhere around 400 billion checks, and with a collection you'll probably find it dies if there are more than 10,000 unique colours.
Clearly you need a quicker method. One method is to use indexes to divide the problem down and some basic algorithmic techniques. For this sample I do the following:
Sorted Lists and Binary Search
If a list is sorted, you can find an item in it much more quickly that you can in an unsorted one. This technique is called "Binary Search" and works as follows:
Obviously to use this technique you need the list to be sorted in the first place. You can perform sorting as you add items to a list using the same Binary Search technique to work out where the insertion point is; the only thing that prevents this normally in VB is that it is slow to insert an item into an array. Luckily if you're creating an array of Longs, you can make inserting quicker by using CopyMemory. This technique is described in the article A Fast Index-Based Collection.
Enhancing the code in that article to provide a binary search method is straightforward. Here's the code you need:
Public Function BinarySearch( _ ByVal lFor As Long, _ ByRef lInsertIndex As Long _ ) As Long lInsertIndex = 0 BinarySearch = plBinSearch(lFor, lInsertIndex, 1, m_lCount) End Function Private Function plBinSearch( _ ByVal lFor As Long, _ ByRef lInsertIndex As Long, _ ByVal lStart As Long, _ ByVal lEnd As Long _ ) As Long If lEnd - lStart > 1 Then Dim iP As Long iP = lStart + (lEnd - lStart) \ 2 If m_lItem(iP) = lFor Then ' Success: plBinSearch = iP ElseIf m_lItem(iP) > lFor Then ' the centre element is greater than the ' item we're searching for. Set the end ' to the centre element & repeat: lEnd = iP - 1 plBinSearch = plBinSearch(lFor, lInsertIndex, lStart, lEnd) ElseIf m_lItem(iP) < lFor Then ' the centre element is less than the ' item we're searching for. Set the start ' to the centre element & repeat: lStart = iP + 1 plBinSearch = plBinSearch(lFor, lInsertIndex, lStart, lEnd) End If Else ' 1 or 2 items left. Check if either ' match the search item. If (lEnd < lStart) Then lInsertIndex = lStart Else If m_lItem(lEnd) = lFor Then plBinSearch = lEnd ElseIf m_lItem(lStart) = lFor Then plBinSearch = lStart ElseIf lFor > m_lItem(lEnd) Then lInsertIndex = lEnd + 1 ElseIf lFor > m_lItem(lStart) Then lInsertIndex = lEnd Else lInsertIndex = lStart End If End If End If End Function
The function returns the index of the item if found, otherwise it returns 0 (the collection is implemented with 1-based indexes). Adding the second lInsertIndex parameter lets you know where you should insert an item in the array if it isn't found.
So with this in place, counting colours can be implemented more quickly.
Counting by sub-division
Even using this technique, using a single array to store all possible colours results in too slow a count. To speed it up, you can subdivide the colour set, in effect adding an index to the colours. In the sample I've chosen the green component of the colour as the basis for the index, because the green component of any image is the brightest of the RGB components in an image. Creating 255 sorted sub arrays using a class may sound like madness but as you will see if you try modifying the code to use a single array for an image with a large number of colours its much, much quicker.
This technique is much better than any of the naive implementations. A 1280x1024 image with 264,000 distinct colours can be counted in just over a second on my current Athlon XP 2200 machine (which is at the time of writing having a number of performance difficulties all of its own). However, I note with envy that Paint Shop Pro can do the same task more quickly (less than 0.3s). How do they do it? If you know, or have an ideas then I'd love to know!