Introduction
This is the official website of YMLL, a machine learning library, and PHAKISO,
a Windows application built on YMLL. YMLL contains algorithms that are essential for performing a
Quantitative Structure Pharmacokinetics Relationship (QSPkR) experiment. PHAKISO provides a graphical user interface to the algorithms in YMLL so that a QSPkR
model can be developed and validated with just a few mouse
clicks.
YMLL contains different modules which interact with one another to
help develop a QSPkR model. The modules in YMLL are Dataset, DataLoad,
DataSave, DatasetSplit, DatasetCluster, DiversityMetric, Outlier,
Machine, DescriptorFilter, DescriptorSelection, Scale, DistanceMeasurer,
PerformanceMeasurer, Reporter, ObjectiveFunction and Trainer. Each
module defines a standard interface to interact with other modules. The
standardization of a module’s interface enables different algorithms in
the same module to work seamlessly with those in other modules and allows
new algorithms to be easily added. For example, to conduct a simple
QSPkR experiment, we simply link the Dataset, DataLoad, Machine,
PerformanceMeasurer and Reporter modules together. These modules will
load a dataset into memory and pass it to a machine learning algorithm to
develop a QSPkR model. The prediction capability of the QSPkR model is
then gauged and reported to the user. The programmer can choose
different algorithms from each of these modules, and the chosen
algorithms are guaranteed to work with one another because they must
conform to the standard interface defined by their module.
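The linking of modules described above can be sketched in C++ as follows. This is only an illustration of the idea of one standard interface per module: every type and function name here (`Dataset`, `DataLoad`, `Machine`, `PerformanceMeasurer`, `InMemoryLoad`, `MeanMachine`, `RmseMeasurer`, `runExperiment`) is hypothetical and is not taken from YMLL's actual API, which is not shown on this page. The Reporter step is reduced to returning the measured score.

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; these are NOT YMLL's real classes.
struct Dataset {
    std::vector<std::vector<double>> descriptors;  // one row per compound
    std::vector<double> responses;                 // observed property values
};

// Each "module" is an abstract interface; algorithms are interchangeable subclasses.
struct DataLoad {
    virtual ~DataLoad() = default;
    virtual Dataset load() const = 0;
};

struct Machine {
    virtual ~Machine() = default;
    virtual void train(const Dataset& data) = 0;
    virtual double predict(const std::vector<double>& row) const = 0;
};

struct PerformanceMeasurer {
    virtual ~PerformanceMeasurer() = default;
    virtual double measure(const std::vector<double>& observed,
                           const std::vector<double>& predicted) const = 0;
};

// Trivial example algorithms, one per module.
struct InMemoryLoad : DataLoad {
    Dataset data;
    explicit InMemoryLoad(Dataset d) : data(std::move(d)) {}
    Dataset load() const override { return data; }
};

struct MeanMachine : Machine {  // predicts the training-set mean for every compound
    double mean = 0.0;
    void train(const Dataset& data) override {
        if (data.responses.empty()) { mean = 0.0; return; }
        mean = std::accumulate(data.responses.begin(), data.responses.end(), 0.0) /
               static_cast<double>(data.responses.size());
    }
    double predict(const std::vector<double>&) const override { return mean; }
};

struct RmseMeasurer : PerformanceMeasurer {  // root mean square error
    double measure(const std::vector<double>& observed,
                   const std::vector<double>& predicted) const override {
        double sum = 0.0;
        for (std::size_t i = 0; i < observed.size(); ++i) {
            const double diff = observed[i] - predicted[i];
            sum += diff * diff;
        }
        return observed.empty() ? 0.0
                                : std::sqrt(sum / static_cast<double>(observed.size()));
    }
};

// Links the modules: load -> train -> predict -> measure.
// Any DataLoad, Machine, or PerformanceMeasurer implementation can be swapped in.
double runExperiment(const DataLoad& loader, Machine& machine,
                     const PerformanceMeasurer& measurer) {
    const Dataset data = loader.load();
    machine.train(data);
    std::vector<double> predicted;
    for (const auto& row : data.descriptors) predicted.push_back(machine.predict(row));
    return measurer.measure(data.responses, predicted);
}
```

Because `runExperiment` depends only on the abstract interfaces, replacing `MeanMachine` with another `Machine` subclass would require no change to the loader or the performance measurer, which is the property the paragraph above describes.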
Both YMLL and PHAKISO are coded in C++. The source code is currently not available because it includes proprietary algorithms developed by the
BIDD group. However, precompiled
libraries of YMLL for various systems and the
executable for PHAKISO are freely available
on this website for noncommercial use.
YMLL Features
Machine Learning Methods 
Classification 
 
Multiple linear regression 
 
Logistic regression 
 
Partial least squares 
 
Linear discriminant analysis 
 
C4.5 decision tree 
 
C4.5 decision rules 
 
k nearest neighbour 
 
Feedforward backpropagation neural network (own implementation, AnnieNN, TorchMLP)
 
Probabilistic neural network 
 
Support vector machine (SVMStar, SVM^{light}, LibSVM, SVMTorch)
 
Sphere discriminant (experimental) 
 
Regression 
 
Multiple linear regression 
 
Principal component regression
 
Partial least squares
 
Continuum power regression
 
Continuum regression
 
Feedforward backpropagation neural network
 
General regression neural network
 
Support vector regression (SVM^{light}, LibSVM, SVMTorch)
Dataset Clustering Algorithms 
Hierarchical 
 
Nearest neighbour 
 
Furthest neighbour 
 
Centroid 
 
Group average 
 
Median 
 
Ward 
 
Flexible 
 
Nonhierarchical 
 
Method proposed by Darko Butina 
 
K means 
 
Flexible K means 
Dataset Outliers Detection Algorithms 
 
Hadi 
 
Iterative R 
 
Iterative Z 
 
Median 
Statistical Molecular Design Algorithms 
 
CADEX 
 
Removal-until-done
 
D-optimal
 
Sphere exclusion 
 
Maximum dissimilarity 
 
Every N datum 
 
Random 
Dataset Diversity Measurement Methods 
 
Average nearest neighbour
 
Mean intermolecular dissimilarity
 
Cumulative property distribution
Descriptors Scaling Algorithms 
 
Autoscale 
 
Range scale (normalization) (0 to 1, -1 to 1)
 
Log scale (natural log, base 10)
 
Mean scale 
 
Variance scale 
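As a rough illustration of two of the scaling methods listed above, the sketch below applies autoscaling and range scaling to a single descriptor column. It uses the textbook definitions and is not YMLL's implementation: the function names `autoscale` and `rangeScale` are hypothetical, the sample standard deviation (n - 1 denominator) is an assumption, and the two target ranges are read here as 0 to 1 and -1 to 1.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Autoscale: subtract the column mean, then divide by the sample standard deviation.
// Assumptions: (n - 1) denominator; a constant or too-short column maps to zeros.
std::vector<double> autoscale(const std::vector<double>& column) {
    const std::size_t n = column.size();
    std::vector<double> scaled(n, 0.0);
    if (n < 2) return scaled;
    double mean = 0.0;
    for (double v : column) mean += v;
    mean /= static_cast<double>(n);
    double ss = 0.0;
    for (double v : column) ss += (v - mean) * (v - mean);
    const double sd = std::sqrt(ss / static_cast<double>(n - 1));
    if (sd == 0.0) return scaled;
    for (std::size_t i = 0; i < n; ++i) scaled[i] = (column[i] - mean) / sd;
    return scaled;
}

// Range scale: map the column's [min, max] linearly onto [lo, hi].
// Pass (0, 1) or (-1, 1) for the two ranges; a constant column maps to lo.
std::vector<double> rangeScale(const std::vector<double>& column, double lo, double hi) {
    std::vector<double> scaled(column.size(), lo);
    if (column.empty()) return scaled;
    const auto mm = std::minmax_element(column.begin(), column.end());
    const double minV = *mm.first;
    const double maxV = *mm.second;
    if (maxV == minV) return scaled;
    for (std::size_t i = 0; i < column.size(); ++i)
        scaled[i] = lo + (hi - lo) * (column[i] - minV) / (maxV - minV);
    return scaled;
}
```

In practice the mean, standard deviation, minimum and maximum would normally be computed on the training set and then reused to scale any testing set, so that both are transformed identically.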
Descriptor Selection Algorithms 
Filter methods 
 
CORCHOP 
 
RELIEFF 
 
Discrimination score 
 
Wrapper methods 
 
Forward selection 
 
Backward elimination 
 
Stepwise regression 
 
Sequential floating forward selection
 
Generalized simulated annealing
 
Genetic algorithm
 
Reverse elimination tabu search
 
Recursive feature elimination
Validation Methods 
 
Training set 
 
Testing set 
 
Leave-one-out
 
k-fold cross-validation
 
Bootstrap 
 
Y-randomization
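To make the cross-validation entries in this list concrete, here is a minimal sketch of how a dataset of n compounds might be partitioned into k folds. It is a generic illustration, not YMLL code: the names `kFoldIndices` and `trainingIndices` are hypothetical, and assigning sample i to fold i % k without shuffling is a simplifying assumption made so the result is deterministic. Setting k equal to n gives the leave-one-out scheme.

```cpp
#include <cstddef>
#include <vector>

// Returns k folds; fold f holds the indices used as the test set in round f.
// Assumption: sample i goes to fold i % k (no shuffling).
std::vector<std::vector<std::size_t>> kFoldIndices(std::size_t n, std::size_t k) {
    std::vector<std::vector<std::size_t>> folds;
    if (n == 0 || k == 0) return folds;
    if (k > n) k = n;  // cannot have more folds than samples
    folds.resize(k);
    for (std::size_t i = 0; i < n; ++i) folds[i % k].push_back(i);
    return folds;
}

// Training indices for a round are all indices outside that round's test fold.
std::vector<std::size_t> trainingIndices(
        const std::vector<std::vector<std::size_t>>& folds, std::size_t testFold) {
    std::vector<std::size_t> train;
    for (std::size_t f = 0; f < folds.size(); ++f) {
        if (f != testFold) train.insert(train.end(), folds[f].begin(), folds[f].end());
    }
    return train;
}
```

A model would be trained once per fold on `trainingIndices(folds, f)` and evaluated on `folds[f]`, with the k performance values then averaged.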
Model Performance Measurement Methods 
Classification 
 
Sensitivity 
 
Specificity 
 
Concordance 
 
Matthews correlation coefficient
 
Cohen Kappa coefficient
 
Error rate 
 
Absolute error rate 
 
Relative error rate 
 
Regression 
 
Correlation coefficient (r)
 
Coefficient of determination (r^{2})
 
Adjusted coefficient of determination (r^{2}_{adj})
 
Mean absolute error (MAE)
 
Mean square error (MSE)
 
Root mean square error (RMSE)
 
Pearson correlation coefficient
 
Pearson r^{2} 
 
Spearman rho 
 
Average fold error 
 
Standard deviation 
 
F ratio 
 
F statistics 
 
Model sum of squares 
 
Residual sum of squares
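Several of the classification measures listed above are simple functions of a two-class confusion matrix. The sketch below uses the standard definitions and is not taken from YMLL: the struct and function names are hypothetical, concordance is assumed here to mean overall accuracy, and returning 0 when a denominator is zero is a convention chosen for this example.

```cpp
#include <cmath>

// Counts for a two-class problem (hypothetical struct, not a YMLL type).
struct ConfusionMatrix {
    double tp;  // true positives
    double fn;  // false negatives
    double tn;  // true negatives
    double fp;  // false positives
};

// Convention for this sketch: an undefined ratio is reported as 0.
double safeDivide(double num, double den) { return den == 0.0 ? 0.0 : num / den; }

// Fraction of actual positives predicted as positive.
double sensitivity(const ConfusionMatrix& c) { return safeDivide(c.tp, c.tp + c.fn); }

// Fraction of actual negatives predicted as negative.
double specificity(const ConfusionMatrix& c) { return safeDivide(c.tn, c.tn + c.fp); }

// Assumed here to be overall accuracy: correct predictions over all predictions.
double concordance(const ConfusionMatrix& c) {
    return safeDivide(c.tp + c.tn, c.tp + c.fn + c.tn + c.fp);
}

// Matthews correlation coefficient, ranging from -1 to 1.
double matthews(const ConfusionMatrix& c) {
    const double den = std::sqrt((c.tp + c.fp) * (c.tp + c.fn) *
                                 (c.tn + c.fp) * (c.tn + c.fn));
    return safeDivide(c.tp * c.tn - c.fp * c.fn, den);
}
```

For example, with 40 true positives, 10 false negatives, 45 true negatives and 5 false positives, sensitivity is 40/50 and specificity is 45/50.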
Miscellaneous 
Activation functions for feedforward backpropagation neural network
 
Linear 
 
Logistic 
 
Hyperbolic tangent 
 
Gaussian 
 
Sigmoid (0 to 1, -1 to 1)
 
Logarithm 
 
Data distance/similarity measurement
 
Euclidean distance 
 
Manhattan distance 
 
Soergel distance 
 
Gaussian distance 
 
Quadratic distance 
 
Top-hat distance
 
Triangular distance 
 
Tanimoto coefficient 
 
Dice coefficient 
 
Cosine coefficient 
 
Pearson correlation coefficient
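A few of the distance and similarity measures listed above can be written compactly for real-valued descriptor vectors. The sketch below uses the common continuous-variable forms of these measures and is not YMLL's implementation: the function names are hypothetical, both vectors are assumed to have the same length, and similarity coefficients return 0 when their denominator is zero.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Dot product; assumes a and b have the same length.
double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Euclidean distance: square root of the summed squared differences.
double euclidean(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

// Manhattan distance: sum of absolute differences.
double manhattan(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += std::fabs(a[i] - b[i]);
    return s;
}

// Tanimoto coefficient: a.b / (a.a + b.b - a.b).
double tanimoto(const Vec& a, const Vec& b) {
    const double ab = dot(a, b);
    const double den = dot(a, a) + dot(b, b) - ab;
    return den == 0.0 ? 0.0 : ab / den;
}

// Dice coefficient: 2 a.b / (a.a + b.b).
double dice(const Vec& a, const Vec& b) {
    const double den = dot(a, a) + dot(b, b);
    return den == 0.0 ? 0.0 : 2.0 * dot(a, b) / den;
}

// Cosine coefficient: a.b / (|a| |b|).
double cosine(const Vec& a, const Vec& b) {
    const double den = std::sqrt(dot(a, a) * dot(b, b));
    return den == 0.0 ? 0.0 : dot(a, b) / den;
}
```

For binary fingerprints encoded as 0/1 values, these Tanimoto and Dice forms reduce to the familiar bit-count expressions.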
PHAKISO Features
Standard Features 
 
Measurement of dataset diversity
 
Determination of compound clusters in dataset
 
Determination of outliers in dataset
 
Statistical molecular design
 
Y-randomization of dataset
 
Scaling of descriptors
 
Objective descriptor selection (filter methods)
 
Subjective descriptor selection (wrapper methods)
 
Construction of a QSPkR model
 
Optimization of parameters for machine learning methods
 
Assessment of prediction capability of QSPkR models on other datasets
 
Validation of QSPkR model
Additional Features (Not available in YMLL) 
 
Display information on descriptors (mean, standard deviation, minimum and maximum values, etc.)
 
Automatic filling in of values for descriptors with missing values
 
Principal component analysis

What's New
 2006 October 12 - Bug fix for YMLL
 2006 October 12 - Bug fix for PHAKISO
 2006 August 25 - Bug fix for YMLL
 2006 August 25 - Bug fix for PHAKISO
 2006 April 24 - YMLL version 1.0 released
 2006 April 24 - PHAKISO version 0.5 released
