M. Tschannen, O. Bachem, and M. Lucic, “Recent Advances in Autoencoder-Based Representation Learning,” arXiv:1812.05069 [cs, stat], Dec. 2018. [Online]. Available: http://arxiv.org/abs/1812.05069
 Q. Liu, M. J. Kusner, and P. Blunsom, “A Survey on Contextual Embeddings,” arXiv:2003.07278 [cs], Apr. 2020. [Online]. Available: http://arxiv.org/abs/2003.07278
 R. Wang, S. Si, G. Wang, L. Zhang, L. Carin, and R. Henao, “Integrating Task Specific Information into Pretrained Language Models for Low Resource Fine Tuning,” in Findings of the Association for Computational Linguistics: EMNLP 2020. Online: Association for Computational Linguistics, Nov. 2020, pp. 3181-3186. [Online]. Available: https://www.aclweb.org/anthology/2020.findings-emnlp.285
 E. Min, X. Guo, Q. Liu, G. Zhang, J. Cui, and J. Long, “A Survey of Clustering With Deep Learning: From the Perspective of Network Architecture,” IEEE Access, vol. 6, pp. 39501-39514, 2018.
 D. Erhan, A. Courville, Y. Bengio, and P. Vincent, “Why does unsupervised pre-training help deep learning?” in Proceedings of the thirteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, 2010, pp. 201-208.
 K. Balasubramanian, P. Donmez, and G. Lebanon, “Unsupervised supervised learning II: Margin-based classification without labels,” Journal of Machine Learning Research, vol. 12, pp. 3119-3145, 2011.
 Z. Lu, H. Pu, F. Wang, Z. Hu, and L. Wang, “The expressive power of neural networks: A view from the width,” in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc., 2017. [Online]. Available: https://proceedings.neurips.cc/paper/2017/file/32cbf687880eb1674a07bf717761dd3a-Paper.pdf
 A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” in Proc. of the NIPS Workshop Autodiff, 2017.
 W. H. Wolberg, W. N. Street, and O. L. Mangasarian, “Breast cancer wisconsin (diagnostic) data set,” UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/], 1992.
 A. Conneau and D. Kiela, “SentEval: An evaluation toolkit for universal sentence representations,” in Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japan: European Language Resources Association (ELRA), May 2018. [Online]. Available: https://www.aclweb.org/anthology/L18-1269
 A. Conneau, D. Kiela, H. Schwenk, L. Barrault, and A. Bordes, “Supervised learning of universal sentence representations from natural language inference data,” in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark: Association for Computational Linguistics, Sep. 2017, pp. 670-680. [Online]. Available: https://www.aclweb.org/anthology/D17-1070
 S. Ruder, “NLP-progress,” 2020. [Online]. Available: http://nlpprogress.com/english/semantic_textual_similarity.html
 A. Warstadt, A. Singh, and S. R. Bowman, “Neural network acceptability judgments,” CoRR, vol. abs/1805.12471, 2018. [Online]. Available: http://arxiv.org/abs/1805.12471
 Hugging Face, “Transformers results,” 2020. [Online]. Available: https://huggingface.co/transformers/v2.3.0/examples.html
 C. Apté, F. Damerau, and S. M. Weiss, “Automated learning of decision rules for text categorization,” ACM Transactions on Information Systems (TOIS), vol. 12, no. 3, pp. 233-251, 1994.