Researching the Research: Applying Machine Learning Techniques to Dissertation Classification

Suzanna Schmeelk (St. John's University, United States)
Tonya L. Fields (Pace University, United States)
Lisa R. Ellrodt (Pace University, United States)
Ion C. Freeman (Pace University, United States)
Ashley J. Haigler (Pace University, United States)


This research examines industry-based dissertation research in a doctoral computing program through the lens of machine learning algorithms to determine whether natural language processing-based categorization of abstracts alone is adequate for classification. We categorize dissertations by both their abstracts and their full text using the GraphLab Create library from Apple’s Turi to determine whether abstract analysis is an adequate measure of content categorization; we found that it is not. We also compare the dissertation categorizations using IBM’s Watson Discovery deep machine learning tool. Our research provides perspectives on the practicality of manually classifying technical documents, and it provides insights into (1) the categories of academic work created by experienced full-time working professionals in a computing doctoral program, (2) the viability and performance of automated categorization based on abstract analysis compared with full-text dissertation analysis, and (3) natural language processing versus manual human text classification abstraction.
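
As an illustration of how abstract-based and full-text-based categorizations might be compared, the sketch below computes the Rand index, the fraction of document pairs on which two labelings agree. This is an assumption made for illustration only: the measure, the function name `rand_index`, and the toy cluster ids are hypothetical and are not taken from the study or from the GraphLab Create or Watson Discovery APIs.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Return the fraction of document pairs on which two labelings agree.

    A pair agrees when both labelings place the two documents in the same
    cluster, or both place them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same documents")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two documents")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical cluster ids for six dissertations (illustrative only,
# not data from the study).
abstract_clusters = [0, 0, 1, 1, 2, 2]
fulltext_clusters = [0, 0, 1, 2, 2, 2]

score = rand_index(abstract_clusters, fulltext_clusters)
```

A score of 1.0 would mean the abstract-based clusters group the dissertations exactly as the full-text clusters do; lower values indicate that the abstracts alone lead to a different grouping.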


Machine learning; Natural language processing (NLP); Abstract vs full-text dissertation analysis; Industry-based; Dissertation research classification; GraphLab Create library; IBM Watson Discovery







Copyright © 2020 Author(s)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.