SONG Xiaoyu1,2, FENG Xiaobei1, ZHU Lin1, LIU Tong1, WU Hongyang1, LI Yifan1
(1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China; 2. Lanzhou Blue Whale Information Technology Co., Ltd., Lanzhou 730070, China)
Abstract: Single nucleotide polymorphism (SNP) is an important factor in the study of genetic variation in human families and in animal and plant strains, and it is therefore widely used in population genetics and in the study of disease-related genes. In pharmacogenomics research, identifying associations between SNP sites and drugs is the key to precise clinical medication. A prediction model of SNP site-drug association based on a denoising variational auto-encoder and a support vector machine (DVAE-SVM) is therefore proposed. First, the k-mer algorithm is used to construct the initial feature vectors of the SNP sites, and MACCS molecular fingerprints are introduced to generate the feature vectors of the drugs. Then, the DVAE is used to extract effective features from the initial SNP feature vectors. Finally, the effective SNP feature vectors and the drug feature vectors are fused and input into a support vector machine (SVM) to predict SNP site-drug associations. The results of five-fold cross-validation experiments show that the proposed algorithm performs better than random forest (RF) and logistic regression (LR) classifiers. Further experiments show that the proposed algorithm also achieves better prediction results than the principal component analysis (PCA), denoising auto-encoder (DAE), and variational auto-encoder (VAE) feature extraction algorithms.
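The first step of the pipeline turns each SNP site into a numeric vector with the k-mer algorithm. The abstract does not state the value of k, the sequence window taken around each SNP, or whether counts are normalised, so the following is only a minimal sketch under assumptions: each SNP site is represented by a hypothetical flanking DNA string, k defaults to 3, and counts are converted to frequencies over a fixed, lexicographically ordered vocabulary of all 4^k k-mers. The helper name `kmer_vector` is hypothetical.

```python
from collections import Counter
from itertools import product
from typing import List


def kmer_vector(seq: str, k: int = 3) -> List[float]:
    """Return normalised k-mer frequencies of a DNA sequence over all 4**k k-mers.

    Assumption: k-mers containing characters other than A/C/G/T (e.g. N) are skipped.
    """
    seq = seq.upper()
    # Fixed vocabulary in lexicographic order, so every SNP gets the same layout.
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[kmer] for kmer in vocab)
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[kmer] / total for kmer in vocab]
```

For example, `kmer_vector("ACGTAC", k=2)` has 16 entries; the 2-mer `AC` occurs twice among the five 2-mers, so its entry is 0.4. The vector length grows as 4^k (64 for k = 3), which is worth keeping in mind when choosing k.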
Key words: association prediction; k-mer; molecular fingerprinting; support vector machine (SVM); denoising variational auto-encoder (DVAE)
SNP site-drug association prediction algorithm based on denoising variational auto-encoder
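Abstract: Single nucleotide polymorphism (SNP) is an important basis for studying genetic variation in human families and in animal and plant strains. It is widely used in population genetics and in the study of disease-related genes, and it plays an important role in pharmacogenomics, diagnostics, and biomedicine. In pharmacogenomics research, identifying associations between SNP sites and drugs is the key to precise clinical medication. To this end, an SNP site-drug association prediction model based on a denoising variational auto-encoder and a support vector machine (DVAE-SVM) is proposed. First, the k-mer algorithm and MACCS (Molecular ACCess System) molecular fingerprints are introduced to represent SNP sites and drugs numerically, generating the initial feature vectors. Next, a denoising variational auto-encoder is applied to the initial feature vectors of the SNP sites to extract effective features. Finally, the effective feature vectors of the SNP sites are fused with the initial feature vectors of the drugs and input into the support vector machine to predict the associations. Five-fold cross-validation results show that the prediction performance of this algorithm is better than that of random forest (RF) and logistic regression (LR) classifiers, and better than that obtained with the principal component analysis (PCA), denoising auto-encoder (DAE), and variational auto-encoder (VAE) feature extraction algorithms.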
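The second step, DVAE feature extraction, differs from a plain VAE in one place: the encoder sees a corrupted copy x̃ of the input, while the reconstruction term still scores the clean x, so the objective averages the usual lower bound E[log p(x|z)] - KL(q(z|x̃) || p(z)) over the corruption distribution p(x̃|x). The sketch below computes one Monte Carlo estimate of the resulting loss in plain Python. Gaussian input noise, a unit-variance Gaussian decoder, a standard-normal prior, and the caller-supplied `encode`/`decode` functions are all assumptions; the abstract gives neither the network architecture, the noise type, nor the latent size. All function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Vector = List[float]
Encoder = Callable[[Vector], Tuple[Vector, Vector]]  # x_tilde -> (mu, log_var)
Decoder = Callable[[Vector], Vector]                 # z -> reconstruction of x


def corrupt(x: Vector, sigma: float, rng: random.Random) -> Vector:
    """Denoising step (assumed Gaussian): x_tilde ~ N(x, sigma^2 I)."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]


def reparameterize(mu: Vector, log_var: Vector, rng: random.Random) -> Vector:
    """z = mu + exp(log_var / 2) * eps with eps ~ N(0, I); the noise is kept
    outside the parameters, which is what lets an autodiff framework backpropagate."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def gaussian_kl(mu: Vector, log_var: Vector) -> float:
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def dvae_loss(x: Vector, encode: Encoder, decode: Decoder,
              sigma: float = 0.1, rng: Optional[random.Random] = None) -> float:
    """One-sample estimate of the negative denoising variational lower bound."""
    if rng is None:
        rng = random.Random(0)
    x_tilde = corrupt(x, sigma, rng)  # the encoder only ever sees the noisy copy
    mu, log_var = encode(x_tilde)
    z = reparameterize(mu, log_var, rng)
    x_hat = decode(z)
    # -log p(x | z) for a unit-variance Gaussian decoder, up to an additive constant;
    # it compares the reconstruction with the CLEAN input x, not with x_tilde.
    recon = 0.5 * sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
    return recon + gaussian_kl(mu, log_var)
```

In a real model, `encode` and `decode` would be neural networks trained by minimising the average of this loss over the SNP k-mer vectors. Taking the encoder mean `mu` as the extracted SNP feature afterwards is a common choice, but it is an assumption here, not something the abstract specifies.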
Key words: association prediction; k-mer; molecular fingerprint; support vector machine; denoising variational auto-encoder
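The final step of the pipeline described in the abstract fuses each SNP feature vector with a drug fingerprint, classifies the pair with an SVM, and scores the model by five-fold cross-validation. The abstract fixes neither the kernel, the regularisation constant, nor how "fusion" is performed, so this sketch assumes plain concatenation and swaps in a linear SVM (no bias term) trained with the Pegasos stochastic sub-gradient method instead of a kernel SVM solver. The drug part stands in for a MACCS fingerprint (the public MACCS key set has 166 structural keys); the toy vectors and the names `fuse`, `train_linear_svm`, `k_fold_splits`, and `cross_val_accuracy` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def fuse(snp_features: Sequence[float], drug_bits: Sequence[int]) -> Vector:
    """Assumed fusion: concatenate SNP features and drug fingerprint bits."""
    return [float(v) for v in snp_features] + [float(b) for b in drug_bits]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     epochs: int = 20, seed: int = 0) -> Vector:
    """Pegasos sub-gradient training of a linear SVM without bias; labels are +1/-1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):  # one shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1.0      # hinge loss is active
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrinkage from the L2 term
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    return 1 if dot(w, x) >= 0.0 else -1


def k_fold_splits(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices once, deal them into k disjoint test folds, return (train, test) pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    return [([i for g in range(k) if g != f for i in folds[g]], folds[f]) for f in range(k)]


def cross_val_accuracy(X: List[Vector], y: List[int], k: int = 5, seed: int = 0) -> float:
    """Fraction of samples classified correctly when each is held out exactly once."""
    correct = 0
    for train, test in k_fold_splits(len(X), k, seed):
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train], seed=seed)
        correct += sum(predict(w, X[i]) == y[i] for i in test)
    return correct / len(X)
```

Because every sample is held out exactly once, the returned accuracy covers the whole data set. A library SVM with a nonlinear kernel and stratified folds would be the more usual choice in practice; the hand-written version only makes the fuse, train, and validate loop explicit.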
Citation: SONG Xiaoyu, FENG Xiaobei, ZHU Lin, et al. SNP site-drug association prediction algorithm based on denoising variational auto-encoder. Journal of Measurement Science and Instrumentation, 2022, 13(3): 300-308. DOI: 10.3969/j.issn.1674-8042.2022.03.006