Introduction
This is the official website of YMLL, a machine learning library, and PHAKISO,
a Windows application built on YMLL. YMLL contains the algorithms that are essential for performing a
Quantitative Structure Pharmacokinetics Relationship (QSPkR) experiment. PHAKISO provides a graphical user interface to the algorithms in YMLL so that a QSPkR
model can be developed and validated with just a few mouse
clicks.
YMLL is organized into modules that interact with one another to
help develop a QSPkR model. The modules in YMLL are Dataset, DataLoad,
DataSave, DatasetSplit, DatasetCluster, DiversityMetric, Outlier,
Machine, DescriptorFilter, DescriptorSelection, Scale, DistanceMeasurer,
PerformanceMeasurer, Reporter, ObjectiveFunction and Trainer. Each
module defines a standard interface for interacting with other modules.
This standardization enables different algorithms in
the same module to work seamlessly with those in other modules and allows
new algorithms to be added easily. For example, to conduct a simple
QSPkR experiment, we simply link the Dataset, DataLoad, Machine,
PerformanceMeasurer and Reporter modules together. These modules
load a dataset into memory and pass it to a machine learning algorithm to
develop a QSPkR model. The prediction capability of the QSPkR model is
then gauged and reported to the user. The programmer can choose
different algorithms from each of these modules, and the chosen
algorithms are guaranteed to work with one another because they must
conform to the standard interface defined by their module.
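The linking of modules described above can be illustrated with a minimal C++ sketch. It is not the YMLL API, which is not shown on this page: the class names are borrowed from the module names, while every method signature and the toy algorithms (InMemoryLoad, MeanMachine, MaeMeasurer, ConsoleReporter, runExperiment) are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-ins for YMLL modules; the real interfaces may differ.
struct Dataset {
    std::vector<std::vector<double>> x;  // descriptors, one row per compound
    std::vector<double> y;               // observed property values
};

// Each module is an abstract interface; algorithms are interchangeable subclasses.
class DataLoad {
public:
    virtual ~DataLoad() = default;
    virtual Dataset load() const = 0;
};

class Machine {
public:
    virtual ~Machine() = default;
    virtual void train(const Dataset& d) = 0;
    virtual double predict(const std::vector<double>& row) const = 0;
};

class PerformanceMeasurer {
public:
    virtual ~PerformanceMeasurer() = default;
    virtual std::string name() const = 0;
    virtual double measure(const std::vector<double>& observed,
                           const std::vector<double>& predicted) const = 0;
};

class Reporter {
public:
    virtual ~Reporter() = default;
    virtual void report(const std::string& metric, double value) const = 0;
};

// Toy algorithm: a three-compound dataset held in memory (illustrative data only).
class InMemoryLoad : public DataLoad {
public:
    Dataset load() const override {
        Dataset d;
        for (double v : {1.0, 2.0, 3.0}) {
            d.x.push_back(std::vector<double>(1, v));
            d.y.push_back(2.0 * v);
        }
        return d;
    }
};

// Toy algorithm: always predicts the mean of the training values.
class MeanMachine : public Machine {
    double mean_ = 0.0;
public:
    void train(const Dataset& d) override {
        assert(!d.y.empty());
        double sum = 0.0;
        for (double v : d.y) sum += v;
        mean_ = sum / static_cast<double>(d.y.size());
    }
    double predict(const std::vector<double>&) const override { return mean_; }
};

// Toy algorithm: mean absolute error.
class MaeMeasurer : public PerformanceMeasurer {
public:
    std::string name() const override { return "MAE"; }
    double measure(const std::vector<double>& observed,
                   const std::vector<double>& predicted) const override {
        assert(!observed.empty() && observed.size() == predicted.size());
        double sum = 0.0;
        for (std::size_t i = 0; i < observed.size(); ++i)
            sum += std::fabs(observed[i] - predicted[i]);
        return sum / static_cast<double>(observed.size());
    }
};

class ConsoleReporter : public Reporter {
public:
    void report(const std::string& metric, double value) const override {
        std::cout << metric << " = " << value << '\n';
    }
};

// The experiment depends only on the interfaces, so any algorithm can be swapped in.
double runExperiment(const DataLoad& loader, Machine& machine,
                     const PerformanceMeasurer& measurer, const Reporter& reporter) {
    const Dataset data = loader.load();
    machine.train(data);
    std::vector<double> predicted;
    for (const auto& row : data.x) predicted.push_back(machine.predict(row));
    const double score = measurer.measure(data.y, predicted);
    reporter.report(measurer.name(), score);
    return score;
}
```

Because runExperiment only refers to the abstract classes, replacing MeanMachine with another Machine subclass requires no change to the loading, measuring or reporting code, which is the property the standard interfaces are meant to provide.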
Both YMLL and PHAKISO are written in C++. The source code is currently not available because it includes certain proprietary algorithms that were developed by the
BIDD group. However, precompiled
libraries of YMLL for various systems and the
PHAKISO executable are freely available
on this website for non-commercial use.
YMLL Features
Machine Learning Methods
Classification
- Multiple linear regression
- Logistic regression
- Partial least squares
- Linear discriminant analysis
- C4.5 decision tree
- C4.5 decision rules
- k nearest neighbour
- Feedforward backpropagation neural network (own implementation, AnnieNN, TorchMLP)
- Probabilistic neural network
- Support vector machine (SVMStar, SVMlight, LibSVM, SVMTorch)
- Sphere discriminant (experimental)
Regression
- Multiple linear regression
- Principal component regression
- Partial least squares
- Continuum power regression
- Continuum regression
- Feedforward backpropagation neural network
- General regression neural network
- Support vector regression (SVMlight, LibSVM, SVMTorch)
Dataset Clustering Algorithms
Hierarchical
- Nearest neighbour
- Furthest neighbour
- Centroid
- Group average
- Median
- Ward
- Flexible
Non-hierarchical
- Method proposed by Darko Butina
- K means
- Flexible K means
Dataset Outlier Detection Algorithms
- Hadi
- Iterative R
- Iterative Z
- Median
Statistical Molecular Design Algorithms
- CADEX
- Removal-until-done
- D-optimal
- Sphere exclusion
- Maximum dissimilarity
- Every N datum
- Random
Dataset Diversity Measurement Methods
- Average nearest neighbour
- Mean intermolecular dissimilarity
- Cumulative property distribution
Descriptor Scaling Algorithms
- Autoscale
- Range scale (normalization) (0 to 1, -1 to 1)
- Log scale (natural log, base 10)
- Mean scale
- Variance scale
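As an illustration of two of the scaling methods listed above, the following sketch applies autoscaling and range scaling to a single descriptor column. It assumes the common definitions (autoscale subtracts the mean and divides by the sample standard deviation with an n - 1 denominator; range scale maps the minimum and maximum onto a chosen interval). The exact formulas used by YMLL are not stated on this page, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Autoscale: (x - mean) / sample standard deviation (n - 1 denominator assumed).
std::vector<double> autoscale(const std::vector<double>& v) {
    assert(v.size() >= 2);
    const double n = static_cast<double>(v.size());
    double mean = 0.0;
    for (double x : v) mean += x;
    mean /= n;
    double ss = 0.0;
    for (double x : v) ss += (x - mean) * (x - mean);
    const double sd = std::sqrt(ss / (n - 1.0));
    assert(sd > 0.0);  // a constant descriptor cannot be autoscaled
    std::vector<double> out;
    out.reserve(v.size());
    for (double x : v) out.push_back((x - mean) / sd);
    return out;
}

// Range scale: map [min, max] of the column linearly onto [lo, hi].
std::vector<double> rangeScale(const std::vector<double>& v, double lo, double hi) {
    assert(!v.empty() && lo < hi);
    const auto mm = std::minmax_element(v.begin(), v.end());
    const double mn = *mm.first;
    const double mx = *mm.second;
    assert(mx > mn);  // a constant descriptor has no range
    std::vector<double> out;
    out.reserve(v.size());
    for (double x : v) out.push_back(lo + (x - mn) * (hi - lo) / (mx - mn));
    return out;
}
```

Calling rangeScale(column, 0.0, 1.0) or rangeScale(column, -1.0, 1.0) corresponds to the two ranges named in the list.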
Descriptor Selection Algorithms
Filter methods
- CORCHOP
- RELIEFF
- Discrimination score
Wrapper methods
- Forward selection
- Backward elimination
- Stepwise regression
- Sequential floating forward selection
- Generalized simulated annealing
- Genetic algorithm
- Reverse elimination tabu search
- Recursive feature elimination
Validation Methods
- Training set
- Testing set
- Leave-one-out
- k-fold cross-validation
- Bootstrap
- Y-randomization
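To make the relationship between k-fold cross-validation and leave-one-out concrete, here is a minimal sketch of how sample indices might be partitioned. It assigns sample i to fold i mod k without shuffling, which is a simplification chosen for this example; the Fold type and kFoldSplit function are hypothetical and do not describe how YMLL splits data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Indices of the samples used for training and testing in one fold.
struct Fold {
    std::vector<std::size_t> train;
    std::vector<std::size_t> test;
};

// Deterministic k-fold partition: sample i is tested in fold (i % k)
// and used for training in every other fold.
std::vector<Fold> kFoldSplit(std::size_t n, std::size_t k) {
    assert(k >= 2 && k <= n);
    std::vector<Fold> folds(k);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t f = 0; f < k; ++f) {
            if (i % k == f) {
                folds[f].test.push_back(i);
            } else {
                folds[f].train.push_back(i);
            }
        }
    }
    return folds;
}
```

Setting k equal to the number of samples gives leave-one-out, where each fold tests exactly one sample. In practice the indices would usually be shuffled first.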
Model Performance Measurement Methods
Classification
- Sensitivity
- Specificity
- Concordance
- Matthews correlation coefficient
- Cohen Kappa coefficient
- Error rate
- Absolute error rate
- Relative error rate
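Several of the classification measures above follow directly from a two-class confusion matrix. The sketch below uses the standard textbook definitions of sensitivity, specificity and the Matthews correlation coefficient. Treating concordance as overall accuracy is an assumption, since this page does not define it, and the Confusion struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Counts from a two-class confusion matrix.
struct Confusion {
    double tp;  // true positives
    double fn;  // false negatives
    double tn;  // true negatives
    double fp;  // false positives
};

// Fraction of actual positives predicted as positive.
double sensitivity(const Confusion& c) {
    assert(c.tp + c.fn > 0.0);
    return c.tp / (c.tp + c.fn);
}

// Fraction of actual negatives predicted as negative.
double specificity(const Confusion& c) {
    assert(c.tn + c.fp > 0.0);
    return c.tn / (c.tn + c.fp);
}

// Overall accuracy; assumed here to be what the list calls concordance.
double concordance(const Confusion& c) {
    const double total = c.tp + c.fn + c.tn + c.fp;
    assert(total > 0.0);
    return (c.tp + c.tn) / total;
}

// Matthews correlation coefficient, in [-1, 1].
double matthews(const Confusion& c) {
    const double denom = std::sqrt((c.tp + c.fp) * (c.tp + c.fn) *
                                   (c.tn + c.fp) * (c.tn + c.fn));
    if (denom == 0.0) return 0.0;  // common convention when a row or column is empty
    return (c.tp * c.tn - c.fp * c.fn) / denom;
}
```

Unlike accuracy, the Matthews coefficient stays informative when the two classes are very unequal in size, which is one reason to report it alongside sensitivity and specificity.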
Regression
- Correlation coefficient (r)
- Coefficient of determination (r²)
- Adjusted coefficient of determination (r²adj)
- Mean absolute error (MAE)
- Mean square error (MSE)
- Root mean square error (RMSE)
- Pearson correlation coefficient
- Pearson r²
- Spearman rho
- Average fold error
- Standard deviation
- F ratio
- F statistics
- Model sum of squares
- Residual sum of squares
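A few of the regression measures above can be computed in one pass over observed and predicted values. This sketch assumes the usual definitions, with the coefficient of determination taken as 1 minus the residual sum of squares divided by the total sum of squares. Whether YMLL uses exactly these formulas is not stated here, and the RegressionStats type and regressionStats function are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct RegressionStats {
    double mae;   // mean absolute error
    double mse;   // mean square error
    double rmse;  // root mean square error
    double r2;    // 1 - RSS / TSS (assumed definition)
};

RegressionStats regressionStats(const std::vector<double>& observed,
                                const std::vector<double>& predicted) {
    assert(!observed.empty() && observed.size() == predicted.size());
    const double n = static_cast<double>(observed.size());
    double mean = 0.0;
    for (double y : observed) mean += y;
    mean /= n;

    double absSum = 0.0;  // sum of absolute residuals
    double rss = 0.0;     // residual sum of squares
    double tss = 0.0;     // total sum of squares about the observed mean
    for (std::size_t i = 0; i < observed.size(); ++i) {
        const double e = observed[i] - predicted[i];
        absSum += std::fabs(e);
        rss += e * e;
        tss += (observed[i] - mean) * (observed[i] - mean);
    }
    assert(tss > 0.0);  // r2 is undefined when all observed values are equal

    RegressionStats s;
    s.mae = absSum / n;
    s.mse = rss / n;
    s.rmse = std::sqrt(s.mse);
    s.r2 = 1.0 - rss / tss;
    return s;
}
```

With this definition the coefficient of determination can be negative for a poor model, whereas a squared Pearson correlation cannot, which may be why the list names the two separately.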
Miscellaneous
Activation functions for feedforward backpropagation neural network
- Linear
- Logistic
- Hyperbolic tangent
- Gaussian
- Sigmoid (0 to 1, -1 to 1)
- Logarithm
Data distance/similarity measurement
- Euclidean distance
- Manhattan distance
- Soergel distance
- Gaussian distance
- Quadratic distance
- Tophat distance
- Triangular distance
- Tanimoto coefficient
- Dice coefficient
- Cosine coefficient
- Pearson correlation coefficient
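Four of the distance and similarity measures listed above are sketched below for real-valued descriptor vectors, using their standard definitions. The Tanimoto coefficient is given in its continuous form, a.b / (a.a + b.b - a.b). The function names are hypothetical and are not taken from YMLL's DistanceMeasurer module.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Dot product of two equal-length vectors.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Euclidean distance: square root of the summed squared differences.
double euclidean(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

// Manhattan distance: sum of absolute differences.
double manhattan(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += std::fabs(a[i] - b[i]);
    return s;
}

// Tanimoto coefficient, continuous form.
double tanimoto(const std::vector<double>& a, const std::vector<double>& b) {
    const double ab = dot(a, b);
    const double denom = dot(a, a) + dot(b, b) - ab;
    assert(denom > 0.0);  // undefined when both vectors are all zeros
    return ab / denom;
}

// Cosine coefficient: dot product divided by the product of the vector norms.
double cosine(const std::vector<double>& a, const std::vector<double>& b) {
    const double denom = std::sqrt(dot(a, a)) * std::sqrt(dot(b, b));
    assert(denom > 0.0);  // undefined when either vector is all zeros
    return dot(a, b) / denom;
}
```

Note that the first two functions are distances (smaller means more alike) while the last two are similarity coefficients (larger means more alike), so they cannot be swapped without adjusting the comparison.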
PHAKISO Features
Standard Features
- Measurement of dataset diversity
- Determination of compound clusters in dataset
- Determination of outliers in dataset
- Statistical molecular design
- Y-randomization of dataset
- Scaling of descriptors
- Objective descriptor selection (filter methods)
- Subjective descriptor selection (wrapper methods)
- Construction of a QSPkR model
- Optimization of parameters for machine learning methods
- Assessment of the prediction capability of QSPkR models on other datasets
- Validation of QSPkR model
Additional Features (Not available in YMLL)
- Display of information on descriptors (mean, standard deviation, minimum and maximum values, etc.)
- Automatic filling in of values for descriptors with missing values
- Principal component analysis
What's New
- 2006 October 12 - Bug fix for YMLL
- 2006 October 12 - Bug fix for PHAKISO
- 2006 August 25 - Bug fix for YMLL
- 2006 August 25 - Bug fix for PHAKISO
- 2006 April 24 - YMLL version 1.0 released
- 2006 April 24 - PHAKISO version 0.5 released