Note: this article is a translation of the Japanese version.
Hi, this is chang. Today I considered how to train a deep learning model using only a small number of defect images.
Previously, I introduced defect detection using an AutoEncoder*2 *3 *4. In that article*5, I wrote that excluding non-defect images from training improved the inference accuracy. That means defect detection with an AutoEncoder requires an enormous number of defect images.
But in reality, it is sometimes difficult to gather a large number of defect images, because defects do not occur very frequently. I guess "only once a year" is not rare at many manufacturing sites.
Improved algorithms*6 and data augmentation*7 have been suggested, and generation using GANs is a recent trend. The lack of defect images is a fundamental problem in image processing. In this article, I try to train a deep learning model from only one defect image.
Similar images are found if you search for "flying saucer" or "UFO" on Google. Many of them share the same layout: a saucer-like object flying against the sky. I tried to replicate this layout.
First, I cropped out the saucer part. I painted the background red to distinguish the saucer from the background.
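The cropping and painting can of course be done by hand in any image editor, but if you already have a rough mask of the saucer, a few lines of NumPy do the same job. Below is a minimal sketch under my own assumptions; the function name and the mask input are mine, not part of the original pipeline:

```python
import numpy as np

def paint_background_red(rgb, mask):
    """Set every pixel outside the saucer mask to pure red (255, 0, 0).

    rgb  -- (H, W, 3) uint8 array of the cropped saucer image
    mask -- (H, W) boolean array, True where the saucer is
    """
    out = rgb.copy()
    out[~mask] = [255, 0, 0]  # red marks "background" for later compositing
    return out

# toy example: 4x4 gray image, saucer occupies the center 2x2 block
img = np.full((4, 4, 3), 128, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
painted = paint_background_red(img, mask)
```

Pure red works as a background marker here because a real saucer photo is unlikely to contain exactly (255, 0, 0) pixels.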
Next, I pasted the saucer onto a sky. I borrowed sky images introduced on personal blogs and elsewhere. The point is to choose images that include not only the sky but also buildings and mountains, which makes the created images resemble existing ones. In addition, I picked large images so that they could be cropped.
Below is the source code for cropping a patch from the sky images. The crop size was chosen randomly between 0.5 and 2.0 times 256, the input image size of U-Net.
size = random.randint(int(c.IMG_SIZE/2.0), int(c.IMG_SIZE*2))  # crop size
if max_x > size:
    org_x = random.randint(0, max_x - size)
    size_x = size
else:
    org_x = 0
    size_x = max_x
if max_y > size:
    org_y = random.randint(0, max_y - size)
    size_y = size
else:
    org_y = 0
    size_y = max_y
cropped512 = sky.crop([org_x, org_y, org_x + size_x, org_y + size_y])
cropped512 = cropped512.resize([c.IMG_SIZE*2, c.IMG_SIZE*2])
cropped512 = np.array(cropped512)
Below is the source code for pasting a saucer onto the sky. The size, location, and rotation angle of the saucer are given randomly. The brightness of the saucer is adjusted slightly to smooth the boundary between the saucer and the background. Down-sizing the image is another technique for smoothing.
# add ufo
ufo = ufo_org.copy()  # copy the original saucer image
rate = random.uniform(0.2, 1.0)  # scale factor of 0.2 to 1.0
ufo = ufo.resize([int(ufo.size[0]*rate), int(ufo.size[1]*rate)])
ufo_gray = np.array(ufo.convert("L"))  # gray scale
ufo = np.array(ufo)  # to array
bright = np.mean(cropped512)/np.mean(ufo)
ufo_gray = ufo_gray*bright  # adjust brightness to the background
x = random.randint(0, c.IMG_SIZE*2)
y = random.randint(0, c.IMG_SIZE*2)
rot = deg2rad(random.uniform(-30, 30))  # rotation angle
label512 = np.zeros([c.IMG_SIZE*2, c.IMG_SIZE*2])
for i in range(ufo.shape[0]):
    for j in range(ufo.shape[1]):
        # skip the red (255, 0, 0) background pixels
        if ufo[i, j, 0] != 255 or ufo[i, j, 1] != 0 or ufo[i, j, 2] != 0:
            rot_i = int(i*math.cos(rot) - j*math.sin(rot) + 0.5)
            rot_j = int(i*math.sin(rot) + j*math.cos(rot) + 0.5)
            if 0 <= x + rot_i < c.IMG_SIZE*2 and 0 <= y + rot_j < c.IMG_SIZE*2:
                cropped512[x + rot_i, y + rot_j] = ufo_gray[i, j]
                label512[x + rot_i, y + rot_j] = 255
image = np.zeros([c.IMG_SIZE, c.IMG_SIZE])
label = np.zeros([c.IMG_SIZE, c.IMG_SIZE])
# 2x2 average pooling: halve the size to smooth the pasted boundary
for i in range(c.IMG_SIZE):
    for j in range(c.IMG_SIZE):
        image[i, j] = (float(cropped512[2*i, 2*j]) + float(cropped512[2*i + 1, 2*j])
                       + float(cropped512[2*i, 2*j + 1]) + float(cropped512[2*i + 1, 2*j + 1]))/4.0
        label[i, j] = (label512[2*i, 2*j] + label512[2*i + 1, 2*j]
                       + label512[2*i, 2*j + 1] + label512[2*i + 1, 2*j + 1])/4.0
The following are examples of the created images and labels. This time, I generated 1000 pairs for training; infinitely many patterns can be generated if you want. I think the images with large saucers look quite artificial.
Left: training image, Right: label
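Putting the crop and paste steps together, the 1000-pair generation loop can be sketched roughly as below. This is a simplified stand-in, not the original script: I crop directly at the U-Net input size and paste a dummy square "saucer" instead of the real cut-out, just to show the loop structure.

```python
import random
import numpy as np

IMG_SIZE = 256  # U-Net input size (c.IMG_SIZE in the article's code)

def make_pair(sky, rng):
    """Create one (image, label) training pair from a large sky image.

    Simplified: random crop at the network input size, then paste a
    dummy bright square as the "saucer" and record its mask as the label.
    """
    h, w = sky.shape
    oy = rng.randint(0, h - IMG_SIZE)
    ox = rng.randint(0, w - IMG_SIZE)
    image = sky[oy:oy + IMG_SIZE, ox:ox + IMG_SIZE].copy()
    label = np.zeros_like(image)
    s = rng.randint(8, 64)                  # saucer size in pixels
    y = rng.randint(0, IMG_SIZE - s)
    x = rng.randint(0, IMG_SIZE - s)
    image[y:y + s, x:x + s] = 200.0         # dummy bright saucer
    label[y:y + s, x:x + s] = 255.0
    return image, label

rng = random.Random(0)                      # seeded for reproducibility
sky = np.full((1024, 1024), 90.0)           # stand-in for a real sky photo
pairs = [make_pair(sky, rng) for _ in range(1000)]
```

In the real pipeline each iteration would additionally pick a random sky photo, apply the random resize, and paste the rotated, brightness-adjusted saucer cut-out.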
I used U-Net. Below are the inference results on existing UFO images obtained from Google. The red rectangles show the areas with a high probability of defect (= flying saucer).
Left: input, Center: inference, Right: input with detection (red rectangle)
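For reference, turning the network's per-pixel output into a red rectangle is straightforward. The helper below is my own sketch, not the article's code: it thresholds a probability map and returns a single bounding box over all hot pixels (per-saucer boxes would additionally need connected-component labelling).

```python
import numpy as np

def defect_bbox(prob, thresh=0.5):
    """Return the bounding box (x0, y0, x1, y1) of all pixels whose
    predicted defect probability exceeds `thresh`, or None if none do.

    One box over all hot pixels is a simplification; multiple saucers
    would be merged into a single rectangle.
    """
    ys, xs = np.where(prob > thresh)
    if len(ys) == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# toy probability map with one hot region at rows 2-4, columns 3-5
prob = np.zeros((8, 8))
prob[2:5, 3:6] = 0.9
box = defect_bbox(prob)  # (3, 2, 5, 4)
```

The returned box can then be drawn on the input image with any drawing API, e.g. PIL's `ImageDraw.rectangle`.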
In many cases, the trained network worked well. But it tended to be weak on (1) big saucers that cover the whole area of the image and (2) white saucers with little contrast against the sky.
I guess many people will think, "Isn't this fake???" Actually, this experiment amounts to training on composite photographs. I myself had some doubt: "Is deep learning even necessary if you can create composite photographs?"
One purpose of deep learning is to learn the skills of human beings. In the case of defect detection, annotation by humans lets networks imitate human judgement. If you train on composite photographs, that purpose is lost.
On the other hand, deep learning has a big impact as an image processing algorithm. My impression is that deep learning is more flexible and robust than traditional pattern matching. Making use of these properties (flexibility and robustness in image processing) is, I think, very effective.
In addition, creating the images and labels yourself opens the possibility of controlling the behavior of the neural network. Generally, you cannot control the training process of a neural network, which means you cannot predict or modify the behavior of inference. Today, we obtained the result that the trained network was weak on large or white saucers. The reason is obvious: such saucers are not included in the training images. It is highly possible that widening the variation in saucer size and brightness would improve the accuracy.
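As a concrete (hypothetical) example of that widening, the random paste parameters could be sampled from broader ranges than the original script used. The function name and the exact ranges below are illustrative guesses, not tuned values:

```python
import random

def sample_saucer_params(rng, wide=True):
    """Sample (scale, brightness factor, rotation angle) for one paste.

    wide=True broadens the ranges to also cover large and bright (white)
    saucers, which the trained network was weak on.
    """
    if wide:
        rate = rng.uniform(0.2, 2.5)    # allow saucers larger than the frame
        bright = rng.uniform(0.5, 1.8)  # allow near-white brightness
    else:
        rate = rng.uniform(0.2, 1.0)    # original size range
        bright = 1.0                    # original: brightness only matched to sky
    angle = rng.uniform(-30, 30)        # rotation in degrees, as in the original
    return rate, bright, angle

rng = random.Random(42)  # seeded for reproducibility
rate, bright, angle = sample_saucer_params(rng)
```

Whether these ranges actually improve accuracy would need to be verified by retraining; the point is only that the distribution of synthetic defects is fully under the engineer's control.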
Engineers who want to detect defects know the size and brightness of the defects. In some cases, I guess you can create training images that cover the whole variation of real defects. If so, training on composite photographs is more effective than on real photographs: real data always has bias and never equals the statistical population.
Enabling deep learning from small data would have a big impact. Imagine a "WANTED" poster showing a criminal's face. In many cases, surveillance cameras capture the criminal only from limited angles. Currently, deep learning from such limited images is impossible. When you set up an iPhone X, it makes you move your face laterally and vertically in front of the camera; in reality (= from surveillance cameras), such ideal shots cannot be obtained.
Deep learning from small data would solve this problem, and it could make WANTED posters unnecessary. Learning from a limited number of images, then deploying the trained network to cameras all over the world... it is not a distant future.
Privacy is the negative side of this convenience. Many shops will use cameras to recognize customers, build purchase records, and show recommendations. That is not comfortable for me.
There was a time when I felt frustrated by the lack of training data. Today's experiment helped me regain motivation. If I have a chance, I will try to improve the UFO detection using the approach I wrote about in the consideration above.
By the way, many existing UFO images have a very similar layout. I guess this helped the learning, ha, ha, ha...
The source code is here*16.