オッサンはDesktopが好き

I write, with no particular thread, about home-built PCs, machine learning, and cycling

Deep learning: multi-channel is very useful for multi-kind defect detection

This article is a translation of the Japanese version.
The original Japanese article is here*1

 Hi, this is chang.

 Previously, I trained a 6-channel U-Net on defects of multiple kinds, and I wrote that "one 1-channel network per kind seemed to be better"*2. Today, I tested this in detail.

 The conclusion, contrary to my expectation, is that 6 channels is better.

f:id:changlikesdesktop:20200719110403p:plain:w600

0. What's a channel?

 A channel of a neural network is a layer stacked perpendicular to the image plane. For simplicity, I explain it in image-processing terms. Channels are often used to represent the RGB planes of a color image. I used them this way for a GAN previously*3*4.

f:id:changlikesdesktop:20200730050803p:plain:w150

 To make a U-Net learn multiple kinds of defect, the channels are used to output an inference for each kind. If there are 6 kinds of defect, the network needs 6 channels and outputs 6 inference maps.

f:id:changlikesdesktop:20200730051444p:plain:w600
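As a minimal numpy sketch (my illustration, not the author's code) of what these tensor shapes look like: an RGB input carries 3 channels, while a 6-kind U-Net output carries one probability map per defect class, which can be thresholded channel by channel.

```python
import numpy as np

# A color image: spatial dims H x W plus 3 channels (R, G, B).
rgb = np.zeros((512, 512, 3), dtype=np.float32)

# A 6-channel U-Net output for the same image: one probability
# map per defect class instead of one per color.
pred = np.random.default_rng(0).random((512, 512, 6), dtype=np.float32)

# Per-class defect masks: threshold each channel independently.
masks = pred > 0.5                       # shape (512, 512, 6), boolean
per_class_area = masks.sum(axis=(0, 1))  # defect pixel count per class
print(rgb.shape, pred.shape, per_class_area.shape)
```

The point is only the layout: the class index lives on the channel axis, so one forward pass yields all 6 inferences at once.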

 Strong operations like convolution or max pooling do not act along the channel direction, but the network can still connect channels to each other. Although the network has to detect the different kinds of defect independently, the channels depend on each other at first. That means the connections among the channels have to be weakened during learning, so that in the end the network works independently for each kind of defect. So I predicted that multi-kind defect detection using channels would require a very long training time.
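How channels get connected can be sketched with a 1×1 convolution, which in a standard CNN is simply a per-pixel weighted sum over all input channels. This plain-numpy sketch (an assumption-level illustration, not the U-Net's actual layers) shows that each output channel mixes every input channel:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random((8, 8, 6))   # feature map with 6 channels
w = rng.random((6, 6))      # 1x1 conv kernel: in_channels x out_channels

# A 1x1 convolution is a per-pixel linear mix of ALL input channels:
y = x @ w                   # shape (8, 8, 6)

# Output channel 0 at pixel (0, 0) depends on every input channel:
manual = sum(x[0, 0, c] * w[c, 0] for c in range(6))
print(np.isclose(y[0, 0, 0], manual))
```

For the channels to behave independently, training would have to drive the off-diagonal entries of such mixing weights toward zero, which is the intuition behind the prediction above.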

1. Developing apps.

 Imagine that you build a deep learning model and integrate the network into an application. In many cases, training is executed on a Linux machine with a large GPU, while the application for users runs on iOS, Android, or Windows. So you have to transfer the trained network to the target environment and write inference code there. Today, we assume a Windows application written in C#.

 For the convenience of the application, we consider the following four points.

(1) Training time

 Training time has nothing to do with the convenience of the application itself, but if it is too long, you cannot release the application. So you have to design the network architecture around a realistic training time.

(2) Size of trained network

 A trained network is a binary file output by the library (TensorFlow and Keras in my case). The application loads this file. If the network is too large, it consumes too much memory or GPU. In my view, using a GPU inside the application is not realistic, so today we use the CPU for inference and discuss memory consumption.
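A minimal sketch of how the file size could be checked, using a throwaway file in place of a real trained-network binary (the helper name is mine, not the author's):

```python
import os
import tempfile

def size_kb(path):
    """Size of a file in KB, matching the units used in the tables below."""
    return os.path.getsize(path) / 1024.0

# Demo on a throwaway 2048-byte file standing in for a network binary:
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 2048)
kb = size_kb(f.name)
os.remove(f.name)
print(kb)  # 2.0
```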

(3) Inference time

 You will feel frustrated if the application takes a long time to show the result. So we try to decrease the inference time as much as possible.
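Measuring this is straightforward; here is a sketch using a wall-clock timer, with a dummy function standing in for the real network (both the helper and the dummy are my illustrations, not the author's code):

```python
import time
import numpy as np

def timed_inference(infer, image, runs=5):
    """Average wall-clock time per call of an inference function."""
    start = time.perf_counter()
    for _ in range(runs):
        out = infer(image)
    elapsed = (time.perf_counter() - start) / runs
    return out, elapsed

# Stand-in for the real network: broadcast one channel to six.
dummy_net = lambda img: np.repeat(img[..., :1], 6, axis=-1)
out, sec = timed_inference(dummy_net, np.zeros((512, 512, 3), dtype=np.float32))
print(out.shape, sec)
```

Averaging over several runs smooths out one-off costs like first-call initialization.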

(4) Accuracy of defect detection

 The most important point: we consider the accuracy of defect detection.

2. Dataset

 We use DAGM images with defects*5.

f:id:changlikesdesktop:20200719095620p:plain:w400
DAGM
From top left: defect images of Class_1, Class_2, Class_3, Class_4, Class_5, and Class_6.
There are 150 defected images for each class.

3. Program

 On the deep learning side, there were no big changes from the previous development. The new part was the viewer written in C#.

f:id:changlikesdesktop:20200719085500p:plain:w400
Viewer for inference result
Drag & drop an image onto the PictureBox and push the button; the inference result is then shown.
The processing time is shown in the ListBox.

4. Result

(1) Training time

 I compared the training times under the same conditions: batch size 8 and 200 epochs.

Class | Training time [s]
1     | 794
2     | 798
3     | 800
4     | 801
5     | 789
6     | 797
all   | 5488

 Training a 1-channel network took about 800 sec. per class. Training the 6-channel network took about 5500 sec. (91.6 min.). The 6-channel time is greater than six times the 1-channel time, but the difference was small (about 12 min.).

 The progress of training also showed no big difference, although I had predicted that the accuracy of the 1-channel networks would rise in a shorter time. As shown below, the accuracy on the test data stabilized at about 50 epochs for both the 1-channel and 6-channel networks.

f:id:changlikesdesktop:20200719115348p:plain:w400
Transition of accuracy for test data

(2) Size of trained network

 I show the sizes of the trained networks below. There was no big difference between 1 channel and 6 channels. The 6-channel network is a little larger (7 KB)... but I think you can ignore it.

Class | Size of trained network [KB]
1     | 404.469
2     | 404.469
3     | 404.469
4     | 404.469
5     | 404.469
6     | 404.469
all   | 404.506

 Although I could not measure the memory consumption precisely, the Windows task manager showed that both the 1-channel and 6-channel networks consumed about 600 MB. It is very interesting that increasing the channels does not increase memory consumption. The 6-channel network has a big advantage here, because loading six 1-channel networks consumes 3.6 GB.

(3) Inference time

 Although I did not make a statistical comparison, inference took about 6 sec. with every network. The 6-channel network took a little longer (about 10 ms), but the difference was very small.

(4) Accuracy of defect detection

 Below is the inference result for a defect image of Class_1, inferred using the networks trained with 1 channel. As you can see, the networks for Class_5 and Class_6 misjudged it.

f:id:changlikesdesktop:20200719093912p:plain:w600
Inference result of 1 channel U-Net
From top left: the results of the trained networks for Class_1, Class_2, Class_3, Class_4, Class_5, and Class_6.

 On the other hand, the 6-channel network correctly judged that only Class_1 had a defect.

f:id:changlikesdesktop:20200719093932p:plain:w400
Inference result of 6 channel U-Net

 Considering the accuracy over all the classes, 6 channels was superior.

5. Consideration

Channel | Training time    | Size     | Inference time | Accuracy
1       | ~800 s per class | ~404 KB  | ~6 s           | misjudges other classes
6       | ~5500 s total    | ~404 KB  | ~6 s           | correct for all classes

 The results were far from my expectations. I had predicted that 1 channel would be superior in accuracy, overlooking that 1-channel networks easily misjudge images of the other classes. There is an option for 1 channel of training the images of the other classes as non-defect... but I think it would not work well, because it requires complex processing like weighting*6.
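The weighting mentioned above could, for example, take the form of a class-weighted binary cross-entropy that up-weights the rare defect pixels; this numpy sketch is my illustration of the idea, not the author's implementation:

```python
import numpy as np

def weighted_bce(y_true, y_pred, pos_weight=10.0, eps=1e-7):
    """Binary cross-entropy that up-weights defect (positive) pixels.

    pos_weight > 1 compensates for the imbalance that appears when
    images of other classes are all trained as 'non-defect'.
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    loss = -(pos_weight * y_true * np.log(y_pred)
             + (1.0 - y_true) * np.log(1.0 - y_pred))
    return loss.mean()

# Toy labels/predictions: one defect pixel among three background pixels.
y_true = np.array([1.0, 0.0, 0.0, 0.0])
y_pred = np.array([0.6, 0.1, 0.2, 0.1])
print(weighted_bce(y_true, y_pred), weighted_bce(y_true, y_pred, pos_weight=1.0))
```

Choosing a good `pos_weight` per class is exactly the kind of extra tuning that makes this approach complex in practice.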

 The one point where 1 channel is superior is, I think, flexibility. If you want to add another kind of defect after training and the number of channels equals the current number of defect kinds, you have to construct the network from scratch. If you have spare channels, I think additional training on the new defect data may be possible, though this requires investigation. In my view, based on the current experiment, multi-channel is better if the system spec and the number of defect kinds are fixed, while single channel is good for basic investigation before the actual development.

6. Afterword

 It was fun, with new discoveries. I wonder why the inference time shows no difference between single and multiple channels. Sorry for my poor knowledge; I will investigate in the future.

 Today's source code is here*7.