In this study, a data mining approach was used to derive decision rules for predicting average flexibility from derived sequence and structural features. Twenty-one parameters were calculated for 101 sequences of the CaMK kinase family from mouse and human, and the importance of each variable was estimated using Classification and Regression Tree (CART) analysis. Coils were found to have the greatest influence on average flexibility, whereas parallel beta strands had the least. Knowledge of variable importance should be useful as a simple predictor of flexibility from an amino acid sequence. It should also improve understanding of the phenomena underlying average flexibility, and thereby support the rational design of therapeutics and the development of appropriate parametric weight distributions for existing molecular dynamics and protein folding algorithms.
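The idea of CART variable importance mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the feature names (`coil_fraction`, `helix_fraction`, `parallel_beta_fraction`), the tree depth, the minimum node size, and the data are all assumptions made for illustration. The data are synthetic, and the target is constructed to depend on the coil feature, so the resulting ranking holds by construction and does not reproduce the study's result. Importance is taken here as the total reduction in squared error credited to each feature across all splits, normalized to sum to 1.

```python
import random


def sse(ys):
    """Sum of squared errors around the mean (regression impurity in CART)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((v - mean) ** 2 for v in ys)


def best_split(X, y):
    """Return (gain, feature_index, threshold) for the best binary split, or None."""
    parent = sse(y)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best


def grow(X, y, importance, depth=0, max_depth=3, min_samples=10):
    """Grow a regression tree, adding each split's error reduction to its feature."""
    if depth >= max_depth or len(y) < min_samples:
        return
    split = best_split(X, y)
    if split is None or split[0] <= 1e-12:
        return
    gain, j, t = split
    importance[j] += gain
    left = [(row, yi) for row, yi in zip(X, y) if row[j] <= t]
    right = [(row, yi) for row, yi in zip(X, y) if row[j] > t]
    for part in (left, right):
        grow([r for r, _ in part], [v for _, v in part],
             importance, depth + 1, max_depth, min_samples)


def variable_importance(X, y, names):
    """Normalized importance per feature (values sum to 1 if any split was made)."""
    raw = [0.0] * len(names)
    grow(X, y, raw)
    total = sum(raw) or 1.0
    return {name: value / total for name, value in zip(names, raw)}


# Synthetic data only: hypothetical features, target built to depend on coil content.
random.seed(0)
names = ["coil_fraction", "helix_fraction", "parallel_beta_fraction"]
X = [[random.random() for _ in names] for _ in range(101)]
y = [0.4 + 0.5 * row[0] + random.gauss(0.0, 0.01) for row in X]

importance = variable_importance(X, y, names)
for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

With real data, each row would hold the calculated parameters for one sequence and `y` its average flexibility. A library implementation such as scikit-learn's `DecisionTreeRegressor`, which exposes `feature_importances_`, would normally be preferred over this hand-rolled version.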