
Potato Segmentation using Background Subtraction

Desktop Applications December 2014

The task presented with this assignment was to segment the potatoes from the background. As the background is a roller table, its colours closely match those of the potatoes, making segmentation more difficult. The task provided a data set consisting of a video file and a series of background images that could be used to produce a model. Segmentation simply involves making every pixel classed as “not potato” black and leaving every pixel classed as “potato” untouched. The video above shows an example of the vision system in progress, including the original video file, some of the stages the system goes through and the final segmented output.

The dataset for this task provides images of an empty roller table, from which a background model was built. Background subtraction was then used to detect the changes between the images, thereby segmenting the potatoes. Although in theory this method should work well, in practice it did not perform as expected. As the roller table moves slightly between images, the segmentation is greatly affected, leaving lines and a lot of noise within the image. In addition, the colours of parts of the potato and the roller table were so similar that parts of the potato would often be mistaken for the table and removed. As a result, it was concluded that the HSV colour space would give a more useful colour representation, with the saturation channel providing the clearest difference between potato and table, as shown in figure 3.

To begin, the background model was created by calculating the mean of each pixel across the 6 empty roller table images (the mean of pixel 1, 1 over all 6 images, then of pixel 1, 2, and so on). Once created, the model was converted into the HSV colour space, along with the input image containing the potatoes. Next, the contrast was stretched to amplify the difference between the foreground (potatoes) and background (roller table), providing cleaner segmentation between the two.
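The per-pixel mean can be sketched as follows. This is a Python/NumPy stand-in for the original MATLAB code, and the tiny random arrays are hypothetical placeholders for the 6 dataset images:

```python
import numpy as np

# Hypothetical stand-ins for the 6 empty roller-table images
# (the real work loads the dataset frames from disk).
rng = np.random.default_rng(0)
backgrounds = [rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
               for _ in range(6)]

# Per-pixel mean across the stack: pixel (1, 1) averaged over all
# 6 images, then pixel (1, 2), and so on.
model = np.mean(np.stack(backgrounds, axis=0), axis=0)

print(model.shape)  # same shape as one input image: (4, 4, 3)
```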

Now that the preparation of the images is complete, the background subtraction can take place. The system uses nested for loops to iterate through the image, subtracting each input image pixel from the background model pixel at the corresponding location (pixel 1, 1 of the input image is subtracted from pixel 1, 1 of the background model). This process removes large amounts of the background and produces a new image.
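The per-pixel subtraction can be sketched on a toy single-channel example. The nested loops of the original become a single array operation in NumPy; the values below are hypothetical:

```python
import numpy as np

# Toy 3x3 frame and background model (hypothetical values); the
# bright middle column plays the role of a potato on the table.
frame = np.array([[50, 200, 55],
                  [48, 210, 52],
                  [51, 205, 50]], dtype=np.int16)
model = np.array([[50,  52, 53],
                  [49,  50, 51],
                  [50,  51, 50]], dtype=np.int16)

# Subtract the model from the frame at each corresponding location
# and keep the absolute change; background pixels fall near zero.
diff = np.abs(frame - model)
print(diff)
```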

Next, the saturation channel is extracted from the background subtraction result. This channel was chosen as it provides the clearest difference between the foreground and background elements. Once extracted, it is converted into greyscale format, which also normalises the matrix and makes later computation simpler. This matrix is then interrogated using a fixed multi-threshold of > 0.1 and < 0.62, which removes much of the noise and leftover background that the background subtraction was unable to remove. Unfortunately this process is not perfect and leaves some elements, such as the lines between the rollers, within the image, as shown in figure 4. Next, a median filter is applied to smooth the image and remove small amounts of noise.
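The fixed multi-threshold step can be sketched like this, again in Python/NumPy rather than the original MATLAB, with a hypothetical normalised saturation channel:

```python
import numpy as np

# Hypothetical saturation channel, already normalised to [0, 1].
sat = np.array([[0.05, 0.30, 0.95],
                [0.12, 0.61, 0.70],
                [0.08, 0.45, 0.63]])

# Fixed multi-threshold: keep only pixels with 0.1 < s < 0.62.
# Very low values are table/noise, very high values are glare.
mask = (sat > 0.1) & (sat < 0.62)
print(mask.astype(int))
```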
Unfortunately the median filter is not enough to fully remove all of the noise and unwanted elements (such as roller lines), so a morphological opening is used to remove the remaining elements. An opening consists of a morphological erosion followed by a dilation: the erosion aggressively removes elements from the image, and the dilation then restores whatever is left. To achieve this, a disk structuring element of size 8 is used.
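A minimal sketch of opening, using a 3×3 square structuring element for brevity instead of the size-8 disk used in the real system, shows how a thin roller line is erased while a solid blob survives:

```python
import numpy as np

def erode(img):
    # 3x3 square erosion: a pixel survives only if its entire
    # neighbourhood is foreground.
    p = np.pad(img, 1, constant_values=False)
    out = np.ones_like(img)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out &= p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def dilate(img):
    # 3x3 square dilation: a pixel turns on if any neighbour is foreground.
    p = np.pad(img, 1, constant_values=False)
    out = np.zeros_like(img)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out |= p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

# A solid 3x3 blob (a "potato") next to a 1-pixel-wide vertical
# line (a "roller line" artefact).
mask = np.array([
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
], dtype=bool)

# Opening = erosion then dilation: the line vanishes, the blob returns.
opened = dilate(erode(mask))
```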

Although dilation does successfully restore most of the potato information, some sections towards the centre of the objects are not restored, leaving holes. To resolve this problem, the image is filled using the Matlab function imfill. This process fills any region that has a complete boundary. However, this is not the complete answer, as the function also fills the gaps between the potatoes, which is not desired. As a result, the newly filled image is compared with the original, and any filled gaps larger than 300 pixels are ignored. This means large gaps, such as those between potatoes, are not restored.
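The combined "fill, then ignore large gaps" behaviour can be sketched in one pass. This is a Python stand-in for the MATLAB imfill-plus-comparison approach, not the original code: background is flood-filled from the border, unreached background pixels are holes, and only holes at or below an area limit are filled:

```python
import numpy as np
from collections import deque

def fill_small_holes(mask, max_area):
    # Flood-fill the background from the image border; any background
    # pixel the flood cannot reach is inside a hole.
    h, w = mask.shape
    reached = np.zeros((h, w), dtype=bool)
    q = deque()
    for y in range(h):
        for x in range(w):
            if (y in (0, h - 1) or x in (0, w - 1)) and not mask[y, x]:
                reached[y, x] = True
                q.append((y, x))
    while q:
        y, x = q.popleft()
        for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] and not reached[ny, nx]:
                reached[ny, nx] = True
                q.append((ny, nx))
    holes = ~mask & ~reached
    # Fill each connected hole, but skip any hole above max_area,
    # mirroring the rule that large gaps between potatoes stay open.
    out = mask.copy()
    seen = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            if holes[y, x] and not seen[y, x]:
                comp, q2 = [(y, x)], deque([(y, x)])
                seen[y, x] = True
                while q2:
                    cy, cx = q2.popleft()
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and holes[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            comp.append((ny, nx))
                            q2.append((ny, nx))
                if len(comp) <= max_area:
                    for cy, cx in comp:
                        out[cy, cx] = True
    return out

# A ring with a 1-pixel hole: the hole is filled, while the open
# background outside the ring is left alone.
ring = np.array([
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
], dtype=bool)

filled = fill_small_holes(ring, max_area=4)
```

The real system uses 300 pixels as the area limit; `max_area=4` is just scaled down for the toy example.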

Once complete, the system has successfully created a clean binary mask, which can then be applied to the image to remove the background. This is done using multiplication: the mask is multiplied with each of the three RGB channels of the original image. Anything with value 0 in the mask (removed items) becomes 0 in the final image, because 0 multiplied by any number is 0. Where the mask value is 1, the final image keeps its original value; for example, if the original pixel value was 234, then 1 × 234 = 234, leaving the value unchanged.
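Applying the mask by multiplication looks like this in NumPy (the 2×2 image is a hypothetical stand-in; broadcasting applies the one-channel mask to all three RGB channels at once):

```python
import numpy as np

# Tiny 2x2 RGB image and a binary mask: 1 keeps a pixel, 0 removes it.
image = np.array([[[234,  10,  40], [100,  90,  80]],
                  [[ 12,  13,  14], [200, 150, 120]]], dtype=np.uint8)
mask = np.array([[1, 0],
                 [0, 1]], dtype=np.uint8)

# Multiply the mask into each colour channel: 0 * v = 0 blacks out
# removed pixels, 1 * v = v leaves kept pixels untouched.
segmented = image * mask[:, :, np.newaxis]
print(segmented[0, 0])  # kept pixel, unchanged: [234 10 40]
```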

The same segmentation method is also used for the video data, with one small difference: the multi-threshold values are changed from > 0.1 and < 0.62 to > 0.1 and < 0.7. This change gives a much cleaner segmentation within the video. The likely reason is a light above the potatoes causing subtle reflections on them, and therefore shifting their colour values.

To take this work one step further, separation of the potatoes after the segmentation process was attempted. This challenge proves quite difficult, as some of the potatoes sit so close together that they form large blobs within the binary mask. One method that has been partially implemented to solve this problem is watershed segmentation. It works by treating the greyscale image as a topographic surface and using a flooding process to discover the contours of the objects, thus allowing separation. Unfortunately this method could not be fully implemented due to time constraints.

This content has been taken from the assessment documentation created as part of this module.

This was part of my Computer Vision and Robotics module in year 4 of my Master of Computing (MComp) degree at the University of Lincoln. If you would like a copy of any documentation (including the corresponding report), please contact me using the facilities on this website.
