Oregon State University

The most popular insect group is perhaps the butterflies and moths (Lepidoptera), because of their striking wing colors and patterns. Lepidopteran wing patterns are not only elaborate but also very different from the color patterns found on other animals, such as birds and mammals. In those animals, color patterns form either randomly or evenly, and their positions often differ from individual to individual. In Lepidoptera, by contrast, the same spot or stripe appears in the same relative location in most individuals of a species. The size, shape, color, and position of Lepidopteran wing patterns are therefore very useful attributes for identifying the correct species. In practice, many amateur butterfly and moth specialists can identify species simply by browsing through images. Although we all know the phrase “A picture is worth a thousand words”, searching for and identifying an object based on its image remains a challenge for the computational sciences.

In this project, we propose to develop an online image-processing and pattern-recognition tool to help biology students identify Lepidoptera species. Our system will provide a solution that can automatically detect a unique wing pattern in an image stored in a database. This will be a web-based application, with the web browser as the user interface. A user will upload digital images and drag a stretchable rubber-band crop tool over an image to define the signature pattern. The cropped image will be treated as a query and sent to a Lepidoptera image database to be matched against any species with a similar pattern. The output will consist of an array of images that correspond to the location and scale of the detected pattern. Finding a specific signature in an image requires two steps. First, we will develop a hidden semantic concept discovery method to support effective, semantics-intensive image classification. Each image in the database will be segmented into regions with homogeneous color, texture, and shape features. A probabilistic model based on statistical hidden-class assumptions about the image database will be developed to analyze the semantic concepts hidden in the database. Second, we will use statistical correlation to compare a reference image that defines the signature pattern with a test image that may contain that signature. The resultant image holds the correlation coefficient computed at each position, and the position with the highest value is the most probable location of the signature pattern.
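The second step, correlating a reference pattern against a test image, can be illustrated with a minimal sketch. It assumes grayscale images held as 2-D NumPy arrays and uses the Pearson correlation coefficient as the statistic; the function names `correlation_map` and `best_match` are hypothetical, and the method actually used in the project may differ. The sketch searches at a single scale only and does not cover the segmentation or hidden-concept step.

```python
import numpy as np


def correlation_map(image: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Correlate the template with every same-sized window of the image.

    Returns a 2-D array whose entry (r, c) is the Pearson correlation
    coefficient between the template and the window whose top-left
    corner is at row r, column c.
    """
    ih, iw = image.shape
    th, tw = template.shape
    t = template.astype(float) - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    scores = np.zeros((ih - th + 1, iw - tw + 1))
    for r in range(scores.shape[0]):
        for c in range(scores.shape[1]):
            w = image[r:r + th, c:c + tw].astype(float)
            w = w - w.mean()
            denom = np.sqrt((w ** 2).sum()) * t_norm
            # A flat window has zero variance, so its correlation is
            # undefined; score it as 0 instead of dividing by zero.
            scores[r, c] = (w * t).sum() / denom if denom > 0 else 0.0
    return scores


def best_match(image: np.ndarray, template: np.ndarray):
    """Return the (row, col) of the highest correlation and its value."""
    scores = correlation_map(image, template)
    r, c = np.unravel_index(np.argmax(scores), scores.shape)
    return (int(r), int(c)), float(scores[r, c])


if __name__ == "__main__":
    # Synthetic stand-in for a wing image: crop a patch, then find it again.
    rng = np.random.default_rng(0)
    image = rng.random((20, 30))
    template = image[5:11, 7:15].copy()
    location, score = best_match(image, template)
    print(location, round(score, 3))
```

The explicit double loop keeps the logic readable but is slow on large images; a production system would more likely rely on an FFT-based or library implementation of normalized cross-correlation.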

The images for this project will be collected from two sources. First, the primary PI, Jeff Miller, has taken more than 1,000 images of butterfly and moth species, mainly from Oregon and Costa Rica. Second, we will ask permission to access several Lepidoptera image databases, such as the collections of Daniel H. Janzen, Professor of Conservation Biology at the University of Pennsylvania (an estimated 15,000 images of adult Lepidoptera), and the collections of the Insect Museum at the Taiwan Forest Research Institute (an estimated 5,000 images). These images will be stored on the image database server hosted by the College of Forestry at OSU.