Description
Despite recent advances and state-of-the-art performance in many fields, deep neural networks remain susceptible to bias and can malfunction in unseen situations. The complex computations behind their decisions are not sufficiently understandable to humans to build trust. As these networks enter critical applications, the need to interpret their decisions and understand how they work has grown. In recent years, much research has been devoted to explaining neural networks; such methods help develop a better understanding of the reasons behind a model's individual decisions and of its general decision-making process. Unfortunately, the proposed methods have shortcomings. External explainer methods rely on assumptions and simplifications that can lead to fallacious explanations and may miss the real reasons behind a decision. Inherently self-interpretable models, on the other hand, build interpretability into the model's architecture, which places limitations on the architectures that can be used and does not apply to already trained models.

Interpretation methods make it possible to diagnose a model and find its problems, which can be used in training and in amending trained models. Unfortunately, no research has systematically analyzed interpretations to find the causes of these problems. The goal of this project is to propose a method for improving models using interpretations. Toward this goal, a module has been introduced that equips convolutional neural networks with self-interpretability. The module is usable with any architecture, even already trained models. In addition, a new weakly supervised training method has been proposed that enables the designer to control the behavior of the network.

Various experiments have been conducted on this module and the proposed training method, demonstrating their effectiveness with respect to both the model's performance and its interpretability. The resulting interpretations have also been compared with several well-known and commonly used explainer methods, and the superior performance of the proposed method has been shown. In the final part of this work, several directions are proposed for improving the introduced module and for completing the procedure of using interpretations in training networks.
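For context, the sketch below illustrates the kind of external, post-hoc explainer that the proposed method is compared against: a minimal Grad-CAM-style saliency map computed with PyTorch hooks. The backbone (a torchvision ResNet-18), the choice of target layer, and the random input are assumptions made purely for illustration and are not taken from this work.

```python
# Minimal Grad-CAM-style sketch of an external, post-hoc explainer.
# Assumptions for illustration only: a torchvision ResNet-18 backbone and its
# last convolutional block ("layer4") as the target layer.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None).eval()  # random weights suffice to show the mechanics
target_layer = model.layer4

activations, gradients = {}, {}

def save_activation(module, inp, out):
    # Cache the target layer's feature maps on the forward pass.
    activations["value"] = out.detach()

def save_gradient(module, grad_in, grad_out):
    # Cache the gradient of the score w.r.t. those feature maps on the backward pass.
    gradients["value"] = grad_out[0].detach()

target_layer.register_forward_hook(save_activation)
target_layer.register_full_backward_hook(save_gradient)

def grad_cam(x, class_idx=None):
    """Return an [H, W] heat map for an input image tensor x of shape [1, 3, H, W]."""
    logits = model(x)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()

    acts = activations["value"]                      # [1, C, h, w]
    grads = gradients["value"]                       # [1, C, h, w]
    weights = grads.mean(dim=(2, 3), keepdim=True)   # per-channel importance
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
    return cam[0, 0]

# Example: a random tensor stands in for a real, preprocessed image.
heatmap = grad_cam(torch.randn(1, 3, 224, 224))
print(heatmap.shape)  # torch.Size([224, 224])
```

Note that such an explainer is entirely external to the network: it only observes activations and gradients through hooks and never changes the trained weights, which is the setting the abstract refers to when discussing post-hoc explanation methods.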