Publication · Conference object · 2021

Unsupervised post-tuning of deep neural networks

Cerisara, Christophe; Caillon, Paul; Le Berre, Guillaume
Language: English
  • Published: 18 Jul 2021
  • Publisher: HAL CCSD
  • Country: France
Abstract
We propose in this work a new unsupervised training procedure that is most effective when it is applied after supervised training and fine-tuning of deep neural network classifiers. While standard regularization techniques combat overfitting by means that are unrelated to the target classification loss, such as by minimizing the L2 norm or by adding noise to the data, the model, or the training process, the proposed unsupervised training loss reduces overfitting by optimizing the true classifier risk. The proposed approach is evaluated on several tasks of increasing difficulty and varying conditions: unsupervised training, post-tuning and anomaly detection...
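To make the idea in the abstract concrete, here is a minimal, purely illustrative sketch of what unsupervised post-tuning of a binary classifier could look like. It assumes, in the spirit of the label-free risk estimation line of work by Balasubramanian et al. (2011), that the classifier's scalar scores on unlabeled data follow a two-component Gaussian mixture with a known positive-class prior. The function names (unsupervised_hinge_risk, post_tune), the EM-style fit, and the Monte Carlo estimate of the hinge risk are all assumptions made for this sketch; the exact loss and optimization procedure are the ones defined in the paper itself, which may differ.

```python
# Purely illustrative sketch: unsupervised post-tuning of an already
# fine-tuned binary classifier on unlabeled data. The label-free risk
# estimate below (two-component Gaussian model of the 1-D classifier
# scores, with an assumed known positive-class prior) follows the spirit
# of Balasubramanian et al. (2011); it is not necessarily the paper's loss.
import torch


def unsupervised_hinge_risk(scores, prior_pos=0.5, n_em=10, n_samples=1024):
    """Differentiable, label-free estimate of the expected hinge risk.

    scores: 1-D tensor of classifier outputs f(x) on an unlabeled batch.
    prior_pos: assumed prior probability of the positive class.
    """
    prior = torch.tensor([1.0 - prior_pos, prior_pos])
    # Initialize one Gaussian component per class around the batch statistics.
    mu = torch.stack([scores.mean() - scores.std(), scores.mean() + scores.std()])
    sigma = torch.stack([scores.std(), scores.std()]) + 1e-3

    for _ in range(n_em):
        # E-step: soft responsibility of each component for each score.
        logp = -0.5 * ((scores[:, None] - mu) / sigma) ** 2 - torch.log(sigma)
        resp = torch.softmax(logp + torch.log(prior), dim=1)
        # M-step: re-estimate component means and standard deviations.
        nk = resp.sum(0) + 1e-6
        mu = (resp * scores[:, None]).sum(0) / nk
        sigma = torch.sqrt((resp * (scores[:, None] - mu) ** 2).sum(0) / nk) + 1e-3

    # Expected hinge loss E[max(0, 1 - y*s)] per class, estimated by sampling
    # from the fitted class-conditional score distributions (y = -1, +1).
    eps = torch.randn(n_samples, 2)
    samples = mu + eps * sigma  # reparameterized, so gradients reach the scores
    y = torch.tensor([-1.0, 1.0])
    risk_per_class = torch.clamp(1.0 - y * samples, min=0.0).mean(0)
    return (prior * risk_per_class).sum()


def post_tune(model, unlabeled_loader, epochs=1, lr=1e-5, prior_pos=0.5):
    """Continue training a fine-tuned model using only unlabeled batches."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x in unlabeled_loader:
            loss = unsupervised_hinge_risk(model(x).squeeze(-1), prior_pos)
            opt.zero_grad()
            loss.backward()
            opt.step()
```

The point of the sketch is the training-loop structure: post-tuning runs after supervised fine-tuning and touches only unlabeled batches, with the reparameterized sampling keeping the risk estimate differentiable so that gradients flow back into the classifier.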
Subjects
ACM Computing Classification System: Computing Methodologies / Pattern Recognition
free text keywords: deep learning, unsupervised training, regularization, natural language processing, [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]