LI Peng-fei1,2, HE Ming-xia1,2, XU Zhe1,2, LAI Hui-bin1,2, LIU Yue1,2
(1. School of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin 300072, China; 2. National Key Laboratory of Precision Testing Techniques and Instrument, Tianjin University, Tianjin 300072, China)
Abstract: Overlapping absorption peaks in traditional Chinese medicine (TCM) and other mixtures often leave their terahertz spectra without distinct absorption peaks. To address this, three unsupervised clustering algorithms, K-means, K-medoids and fuzzy C-means (FCM), are combined with the first-derivative characteristics of the terahertz absorption spectrum to cluster the terahertz spectra of Sanchi and three other kinds of TCM against those of their easily-confused products (ECPs). These unsupervised clustering methods complement supervised classification methods by extending their scope of application. The first derivative of the spectrum amplifies small overall or local differences in the absorption coefficients of different substances, so that otherwise obscured absorption features are revealed. Experiments show that all three clustering algorithms achieve good results when the original absorption coefficient is combined with its first derivative as the feature data; K-means performs best, with an accuracy of 95.32%. Compared with clustering on the absorption coefficient alone, accuracy improves significantly, especially for TCMs without absorption peaks, and the accuracy of the K-means algorithm improves by 5.38%. In addition, the clustering algorithms in this study are strongly resistant to interference from erroneous data.
Key words: terahertz time-domain spectroscopy (THz-TDS); traditional Chinese medicine (TCM); clustering; first derivative spectrophotometry
CLD number: O433.4
Document code: A
Article ID: 1674-8042(2017)04-0371-07
doi: 10.3969/j.issn.1674-8042-2017-04-010
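The feature construction described in the abstract (absorption coefficient concatenated with its first derivative, then clustered) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic peak-free spectra (two slopes with a random baseline offset), the frequency grid, the finite-difference derivative, and the farthest-point seeding of K-means. It is not the authors' implementation and does not reproduce their data or their reported accuracy.

```python
import random


def first_derivative(alpha, step):
    """Finite-difference d(alpha)/df on a uniform grid (central inside, one-sided at the ends)."""
    n = len(alpha)
    inner = [(alpha[i + 1] - alpha[i - 1]) / (2 * step) for i in range(1, n - 1)]
    return [(alpha[1] - alpha[0]) / step] + inner + [(alpha[-1] - alpha[-2]) / step]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50):
    """Plain Lloyd's K-means; farthest-point seeding is an illustrative assumption."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def synthetic_spectrum(slope, rng, freqs):
    """Hypothetical peak-free absorption curve: linear trend + baseline offset + noise."""
    offset = rng.uniform(-2.0, 2.0)
    return [slope * f + offset + rng.gauss(0.0, 0.02) for f in freqs]


STEP = 0.05                                   # assumed frequency step in THz
freqs = [0.2 + STEP * i for i in range(27)]   # assumed 0.2 to 1.5 THz window
rng = random.Random(0)
truth = [0] * 20 + [1] * 20                   # two hypothetical substances
spectra = [synthetic_spectrum(20.0 if t == 0 else 26.0, rng, freqs) for t in truth]

# Feature vector = original absorption coefficient followed by its first derivative.
features = [s + first_derivative(s, STEP) for s in spectra]
labels = kmeans(features, k=2)

# Cluster indices are arbitrary, so score the better of the two label mappings.
agree = sum(lab == t for lab, t in zip(labels, truth))
accuracy = max(agree, len(truth) - agree) / len(truth)
print(f"clustering accuracy on synthetic data: {accuracy:.2%}")
```

In this toy setup the baseline offset blurs the raw absorption curves of the two classes, while the derivative block is insensitive to a constant offset, which is the intuition behind adding it to the feature vector. Passing `spectra` instead of `features` to `kmeans` gives a raw-only baseline for comparison.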
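Terahertz spectrum clustering of traditional Chinese medicine based on first derivative characteristics
LI Peng-fei1,2, HE Ming-xia1,2, XU Zhe1,2, LAI Hui-bin1,2, LIU Yue1,2
(1. School of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin 300072, China; 2. National Key Laboratory of Precision Testing Techniques and Instrument, Tianjin University, Tianjin 300072, China)
Abstract: To address cases in which overlapping absorption peaks in traditional Chinese medicine (TCM) and other mixtures leave no distinct absorption peak, three unsupervised clustering algorithms, K-means, K-medoids and FCM, are combined with the first-derivative features of the terahertz absorption spectrum to cluster the terahertz spectra of four TCMs, including Sanchi and Angelica, against the terahertz spectra of their easily-confused products. The three unsupervised clustering methods supplement the range of application of supervised classification methods. The first-derivative features of the spectrum amplify small overall or local differences in the absorption coefficients of different substances. Experiments show that, with the original absorption coefficient combined with its first derivative as the classification data, all three clustering algorithms achieve good results; K-means has the highest accuracy, 95.32%. Compared with using the original absorption coefficient as the classification data, clustering accuracy improves markedly, especially for easily-confused TCM products without absorption peaks, and the accuracy of the K-means algorithm improves by 5.38%. All three clustering algorithms are strongly resistant to interference from erroneous data.
Key words: terahertz time-domain spectroscopy; traditional Chinese medicine; clustering; first derivative method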
Citation format: LI Peng-fei, HE Ming-xia, XU Zhe, et al. Terahertz spectrum clustering of traditional Chinese medicine based on first derivative characteristics. Journal of Measurement Science and Instrumentation, 2017, 8(4): 371-377. [doi: 10.3969/j.issn.1674-8042.2017-04-010]