Evaluating Word Similarity Measure of Embeddings Through Binary Classification

A. Aziz Altowayan (Computer Science Department, Pace University, New York, United States)
Lixin Tao (Computer Science Department, Pace University, New York, United States)


We consider the following problem: given several neural language models (embeddings), each trained on an unknown data set, how can we determine which model would give better results when used for feature representation in a downstream task such as text classification or entity recognition? In this paper, we assess the word similarity measure by analyzing word embeddings learned from various datasets and how they perform in a simple classification task. All word representations were learned and assessed under the same conditions. For training word vectors, we used the implementation of Continuous Bag of Words described in [1]. To assess the quality of the vectors, we applied the analogy questions test for word similarity described in the same paper. Further, to measure the retrieval rate of an embedding model, we introduce a new metric, Average Retrieval Error, which measures the percentage of words missing from the model. We observe that high accuracy on syntactic and semantic similarities between word pairs is not an indicator of better classification results. This observation can be explained by the fact that a domain-specific corpus contributes more to performance than a general-purpose corpus. For reproducibility, we release our experiment scripts and results.
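As a rough illustration of the Average Retrieval Error idea, the sketch below computes the percentage of tokens that an embedding model cannot retrieve. The abstract only says that the metric measures the percentage of missing words, so the exact formula here is an assumption: the share of out-of-vocabulary tokens is computed per document and then averaged over documents. The function name `average_retrieval_error`, the toy vocabulary, and the toy documents are all hypothetical.

```python
from typing import Iterable, List, Set


def average_retrieval_error(documents: Iterable[List[str]], vocabulary: Set[str]) -> float:
    """Mean percentage of tokens per document that are absent from the vocabulary.

    Assumed definition (not taken from the paper): per-document miss rate,
    averaged over all non-empty documents and expressed as a percentage.
    """
    rates = []
    for tokens in documents:
        if not tokens:
            continue  # skip empty documents to avoid division by zero
        missing = sum(1 for token in tokens if token not in vocabulary)
        rates.append(missing / len(tokens))
    return 100.0 * sum(rates) / len(rates) if rates else 0.0


# Toy data: the vocabulary stands in for the word list of a trained embedding model.
vocab = {"the", "movie", "was", "great"}
docs = [
    ["the", "movie", "was", "great"],  # every token is in the vocabulary
    ["the", "plot", "was", "dull"],    # "plot" and "dull" are missing
]
are = average_retrieval_error(docs, vocab)
print(f"Average Retrieval Error: {are:.1f}%")
```

Under this assumed definition, a lower value means the model covers more of the task vocabulary, which is one way a domain-specific corpus could help a downstream classifier even when its analogy-test accuracy is lower.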


Word embeddings; Embeddings evaluation; Binary classification; Word2vec

[1] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv.org, 2013.

[2] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org

[3] Marco Baroni, Georgiana Dinu, and Germán Kruszewski. Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In ACL, 2014, (1): 238–247.

[4] Jeffrey Pennington, Richard Socher, and Christopher D Manning. Glove: Global vectors for word representation. EMNLP, 2014: 1532–1543.

[5] Christopher D Manning. Computational linguistics and deep learning. Computational Linguistics, 2015, 41(4): 701–707.

[6] Wang Ling, Chris Dyer, Alan W Black, and Isabel Trancoso. Two/too simple adaptations of word2vec for syntax problems. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015: 1299–1304.

[7] Amir Bakarov. A survey of word embeddings evaluation methods. arXiv preprint arXiv:1801.09536, 2018.

[8] Manaal Faruqui, Yulia Tsvetkov, Pushpendre Rastogi, and Chris Dyer. Problems with evaluation of word embeddings using word similarity tasks. arXiv preprint arXiv:1605.02276, 2016.

[9] Siwei Lai, Kang Liu, Shizhu He, and Jun Zhao. How to generate a good word embedding. IEEE Intelligent Systems, 2016, 31(6): 5–14.

[10] Tobias Schnabel, Igor Labutov, David Mimno, and Thorsten Joachims. Evaluation methods for unsupervised word embeddings. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015: 298–307.

[11] Duyu Tang, Furu Wei, Nan Yang, Ming Zhou, Ting Liu, and Bing Qin. Learning sentiment-specific word embedding for twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014, 1: Long Papers: 1555–1565.

[12] Andrew L Maas, Raymond E Daly, Peter T Pham, Dan Huang, Andrew Y Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2011, 1: 142–150.

[13] Yoon Kim. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882, 2014.

[14] Grégoire Mesnil, Tomas Mikolov, Marc’Aurelio Ranzato, and Yoshua Bengio. Ensemble of generative and discriminative techniques for sentiment analysis of movie reviews. AAAI Spring Symposium AI Technologies for Homeland Security 200591-98, cs.CL, 2014.

[15] Rémi Lebret and Ronan Collobert. The sum of its parts: Joint learning of word and phrase representations with autoencoders. arXiv preprint arXiv:1506.05703, 2015.

[16] Omer Levy, Yoav Goldberg, and Ido Dagan. Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 2015, 3(0): 211–225.

[17] Edward Grefenstette, Phil Blunsom, Nando de Freitas, and Karl Moritz Hermann. A deep architecture for semantic parsing. arXiv preprint arXiv:1404.7296, 2014.

DOI: https://doi.org/10.30564/jcsr.v1i3.1268


Copyright © 2019 Author(s)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.