Creating New Haarcascade Files

In my last blog post I said I probably needed to add more smiles to the haarcascade_smile.xml file to get more smile hits. I thought about what was happening when I didn't restrict smile checking to just the eyes and mouth: hits occurred outside of the face as well as on it. I think the person who created haarcascade_smile.xml didn't bother cleaning up the smile images he or she used. For example, say one of the images used to create haarcascade_smile.xml was a smiling girl standing right next to a Chevrolet Silverado. Somebody might then check a picture of a frowning man standing right next to a Chevrolet Silverado, and see a positive hit for a smile because there is a Chevrolet Silverado in the picture. If the person who created the haarcascade_smile.xml file did so by searching for images that had "smile" assigned to the alt tag or title tag, and didn't visually inspect the pictures, then there might not even be that many smiles in the file, even though it's advertised as containing the data from 7,000 images. haarcascade_smile.xml is catching some smiles, but not enough.
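To make the "restrict smile checking to the face" idea concrete, here is a minimal sketch in plain Python. It assumes rectangles are (x, y, w, h) tuples, which is the form OpenCV's detectMultiScale returns. The function names are hypothetical, and using the lower half of the face as a stand-in for the mouth region is my assumption for illustration, not necessarily how the earlier code did it.

```python
def inside(inner, outer):
    """Return True if rectangle `inner` lies entirely within `outer`.

    Both rectangles are (x, y, w, h) tuples.
    """
    ix, iy, iw, ih = inner
    ox, oy, ow, oh = outer
    return (ix >= ox and iy >= oy
            and ix + iw <= ox + ow
            and iy + ih <= oy + oh)


def smiles_on_faces(smile_hits, face_hits):
    """Keep only the smile hits that fall in the lower half of some face.

    Assumption: the mouth sits in the lower half of the face rectangle.
    """
    kept = []
    for smile in smile_hits:
        for (fx, fy, fw, fh) in face_hits:
            # Lower half of the face rectangle.
            lower_half = (fx, fy + fh // 2, fw, fh - fh // 2)
            if inside(smile, lower_half):
                kept.append(smile)
                break  # one matching face is enough
    return kept


# Made-up coordinates: one face, three candidate smile hits.
faces = [(100, 100, 200, 200)]
smiles = [
    (150, 220, 100, 50),  # on the mouth area of the face
    (400, 50, 80, 40),    # off to the side (the "Silverado" kind of hit)
    (150, 120, 100, 50),  # on the face, but up near the eyes
]
print(smiles_on_faces(smiles, faces))
```

With these made-up numbers only the first hit survives; the other two are thrown out because they are not in the lower half of the face.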

The following images are the first 100 positive images I've cleaned up. As you can see, they consist of just faces, in grayscale. (Note added 09/14/2016: Some of the pictures look blurry or pixelated. I'm not keeping those pictures, because I don't want bad data. When I get 2,000 smiling images I'll look through all of the images I processed and replace the blurry, pixelated ones. At the end of each 100 images I added a link to the next 100 images. As of this writing there are 400 images on this page. I already have 1,000 negative images. After getting 2,000 smiles I'll create a haarcascade file out of them.)
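Once the positive and negative images are collected, they have to be listed in text files before training. Here is a rough sketch of building those lists. It assumes the OpenCV command-line tools (opencv_createsamples and opencv_traincascade) are used, which expect one line per positive image in the form "path count x y w h" and one path per line for negatives. Because each cleaned-up image is just a face, the sketch treats the whole image as the object. The file names and the 64x64 size are made up for illustration.

```python
def positive_entry(path, width, height):
    """One line of the positives list: a single object covering the whole image."""
    return f"{path} 1 0 0 {width} {height}"


def build_lists(positives, negatives):
    """Return (positives_text, negatives_text) ready to be written to files.

    positives: iterable of (path, width, height) tuples
    negatives: iterable of paths
    """
    pos_text = "".join(positive_entry(p, w, h) + "\n" for (p, w, h) in positives)
    neg_text = "".join(str(p) + "\n" for p in negatives)
    return pos_text, neg_text


# Hypothetical file names and sizes.
pos_text, neg_text = build_lists(
    [("pos/smile_0001.jpg", 64, 64), ("pos/smile_0002.jpg", 64, 64)],
    ["neg/bg_0001.jpg", "neg/bg_0002.jpg"],
)
print(pos_text, end="")
print(neg_text, end="")
```

The two strings can then be saved to whatever file names the training commands are pointed at.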

Here are the links I promised yesterday (as of this writing) to the code I used to get the above images. Hover your mouse over a link to see a short description of the file.

SearchCrawlerIMG.java       getFacesFromImages.py       ResizeGD.pl       getFiles.pl
