Research and Application on Spark Clustering Algorithm in Campus Big Data Analysis

Qing Hou (Nanjing Xiao Zhuang University, Jiangsu, Nanjing, 210017, China)
Guangjian Wang (Nanjing Xiao Zhuang University, Jiangsu, Nanjing, 210017, China)
Xiaozheng Wang (Nanjing Xiao Zhuang University, Jiangsu, Nanjing, 210017, China)
Jiaxi Xu (Nanjing Xiao Zhuang University, Jiangsu, Nanjing, 210017, China)
Yang Xin (Nanjing Xiao Zhuang University, Jiangsu, Nanjing, 210017, China)

Article ID: 1808


Big data analysis has penetrated all fields of society and brought about profound changes. However, relatively little research has examined how the big data held by colleges and universities can support student management. This paper takes student card information as the research sample and analyzes it with Spark big data mining technology and the K-Means clustering algorithm, using scholarship evaluation as an example. The analysis covers students’ daily behavior from multiple dimensions and can help prevent unreasonable scholarship decisions caused by unfair factors such as plagiarism or voting by teachers and students. It can also predict students’ absenteeism, physical health, and psychological status in advance, making student management more proactive, accurate, and effective.
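
As an illustration of the K-Means step named in the abstract, here is a minimal plain-Python sketch. It is not the authors' Spark MLlib pipeline. The two features (canteen meals per week, library entries per week), the sample records, and the function name `kmeans` are all hypothetical, chosen only to show how card-swipe behavior might separate into groups.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's K-Means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]

    def assign(cents):
        # Assignment step: index of the nearest centroid for each point.
        return [
            min(range(k), key=lambda c: math.dist(p, cents[c]))
            for p in points
        ]

    for _ in range(iterations):
        labels = assign(centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Update step: centroid moves to the mean of its members.
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids

    # Final assignment so labels always match the returned centroids.
    return centroids, assign(centroids)


# Hypothetical per-student features derived from card swipes:
# (canteen meals per week, library entries per week).
records = [
    (18, 6), (20, 7), (19, 5),   # frequent campus activity
    (4, 0), (5, 1), (3, 1),      # sparse campus activity
]

centroids, labels = kmeans(records, k=2)
```

In a real analysis the features would need to be normalized to comparable scales before clustering, and on Spark the same two steps would be distributed across the cluster rather than run in a single loop.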


Keywords: Spark; Clustering algorithm; Big data; Data analysis; MLlib







Copyright © 2020 Author(s)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.