Izmir Katip Çelebi University, Graduate School of Natural Sciences

Spring 2021

Resources

WEKA
 

Weka is a framework containing several machine learning algorithms for data mining tasks. The algorithms can be used directly from a GUI application or from code using a Java API. The Weka homepage is here.

Here is a guide on how to use the Weka API from Java code.

---------------------------------------------------------------------------------------------

TensorFlow

TensorFlow is a library developed by Google for neural networks and, mainly, deep learning.

Installation instructions can be found here.

 

Premade estimators in TensorFlow

To make things a bit easier, TensorFlow has some premade estimators (classifiers) that you can use directly without having to define every layer in your model. A guide can be found here.

 

MNIST in TensorFlow

A guide for the MNIST dataset using a linear Softmax classifier can be found here, and a deep ConvNet here.

Note that the deep ConvNet takes a substantial amount of time to train and evaluate: around 10 minutes for 5,000 training steps on a fairly fast laptop.

-----------------------------------------------------------------------------------------------------------

R Tool

 

R is a tool for statistical computing, but it can also be used for machine learning tasks. The benefits of R are that it contains very powerful features for visualizing data and classifiers, and that it makes it easy to compare different algorithms for a classification task.

To use R, you need to install R itself and RStudio.

There are many good resources for learning how to use machine learning in R:

 

-----------------------------------------------------------------------------------------------------------

Scikit

 

Scikit-learn (often shortened to Scikit) is a very popular machine learning library for Python. It also contains many features for visualizing data and classifiers.


Installation instructions can be found here.

Here is a good getting started guide to Scikit-learn.
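As a rough illustration of what a first Scikit-learn program can look like, here is a minimal sketch that trains a classifier and measures its accuracy on held-out data. The choice of the built-in Iris dataset, the decision tree classifier, the 70/30 split, and the helper name train_and_score are all assumptions made for this example; they are not taken from the linked guide.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_and_score(random_state=0):
    """Train a decision tree on Iris and return its test-set accuracy.

    Hypothetical helper for illustration only.
    """
    # Load the features (X) and class labels (y) of the built-in Iris dataset.
    X, y = load_iris(return_X_y=True)

    # Hold out 30% of the examples for testing, keeping class proportions.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=random_state, stratify=y
    )

    # Fit the classifier on the training part only.
    clf = DecisionTreeClassifier(random_state=random_state)
    clf.fit(X_train, y_train)

    # Compare predictions on the unseen test part with the true labels.
    return accuracy_score(y_test, clf.predict(X_test))


if __name__ == "__main__":
    print(f"Test accuracy: {train_and_score():.2f}")
```

Swapping DecisionTreeClassifier for another Scikit-learn classifier usually only requires changing the import and the constructor call, since they share the same fit and predict interface.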

-----------------------------------------------------------------------------------------------------------

KERAS

 

Keras is a high-level API running on top of TensorFlow (and other libraries). Keras is often preferred over pure TensorFlow since it is much easier to quickly get something up and running. Keras also contains pre-trained ConvNet models, for example VGG16 and VGG19. Training models on the full ImageNet dataset is extremely time-consuming, so getting access to a pre-trained model can be very useful.

 

Pre-trained models in Keras

PyImageSearch has a good guide on how to use pre-trained models for classifying images. Downloading the source code and examples requires that you submit your email address, but you can download the source from here instead.

The pre-trained models are stored in the ~/.keras/models/ folder. VGG16 and VGG19 are around 500 MB each.

To classify one of the example images using VGG16, run the code with the following parameters:
python classify_image.py --image images/soccer_ball.jpg --model vgg16

 

Diabetes Dataset in Keras

MachineLearningMastery has a good guide on how to build a deep neural network for classifying the Diabetes dataset.

 

MNIST in Keras

EliteDataScience has a good guide on how to build a ConvNet model for the MNIST dataset. Compare it to doing the same thing in pure TensorFlow and you’ll see how easy Keras is to use!

Keras can use Theano or TensorFlow as its backend. The guide is written for the Theano backend, but the default in the latest versions of Keras is TensorFlow. The only thing you need to change in the code is the input shape: Theano puts the channel dimension first, (1,28,28), while TensorFlow puts it last, (28,28,1).
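The difference between the two input shapes can be sketched with plain NumPy, without installing either backend. The helper name to_backend_shape and the all-zero placeholder batch are assumptions made for this illustration; in the guide's code the same reshape would be applied to the real MNIST training and test arrays.

```python
import numpy as np


def to_backend_shape(images, channels_first):
    """Add the single grayscale channel axis where the backend expects it.

    Hypothetical helper: images is an array of shape (n, height, width).
    """
    n, height, width = images.shape
    if channels_first:
        # Theano-style layout: (samples, channels, height, width)
        return images.reshape(n, 1, height, width)
    # TensorFlow-style layout: (samples, height, width, channels)
    return images.reshape(n, height, width, 1)


# Placeholder batch of five blank 28x28 images standing in for MNIST digits.
batch = np.zeros((5, 28, 28), dtype="float32")

theano_batch = to_backend_shape(batch, channels_first=True)
tf_batch = to_backend_shape(batch, channels_first=False)

print(theano_batch.shape[1:])  # per-image input shape: (1, 28, 28)
print(tf_batch.shape[1:])      # per-image input shape: (28, 28, 1)
```

The per-image shape printed for each layout is the value that would go into the input shape argument of the first layer of the model.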

Note that training and evaluating a ConvNet takes a substantial amount of time: around 15 minutes for 10 epochs on a fairly fast laptop. You can try inserting a MaxPool layer after the first convolutional layer, which reduced the training time by around 75% while giving the same accuracy.
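A back-of-the-envelope calculation suggests why a MaxPool layer after the first convolutional layer saves so much time. The sketch below counts the multiply-accumulate operations of the second convolutional layer with and without a 2x2 pooling step in front of it. The architecture is an assumption made for this estimate (28x28 grayscale input, two 3x3 convolutional layers with 32 filters each, no padding, stride 1), and the function names are hypothetical. It is a rough operation count for one layer, not a timing measurement.

```python
def conv_output_size(size, kernel):
    """Side length of the feature map after a 'valid' convolution, stride 1."""
    return size - kernel + 1


def conv_macs(in_size, in_channels, out_channels, kernel):
    """Multiply-accumulate operations for one convolutional layer."""
    out_size = conv_output_size(in_size, kernel)
    return out_size * out_size * out_channels * in_channels * kernel * kernel


# Assumed architecture: 28x28x1 input, two 3x3 conv layers with 32 filters.
after_conv1 = conv_output_size(28, 3)  # 26x26 feature maps

# Second conv layer applied directly to the 26x26 maps.
macs_without_pool = conv_macs(after_conv1, 32, 32, 3)

# A 2x2 max pooling step halves each side before the second conv layer.
after_pool = after_conv1 // 2  # 13x13 feature maps
macs_with_pool = conv_macs(after_pool, 32, 32, 3)

reduction = 1 - macs_with_pool / macs_without_pool
print(f"Without pooling: {macs_without_pool:,} MACs")
print(f"With pooling:    {macs_with_pool:,} MACs")
print(f"Reduction in the second conv layer: {reduction:.0%}")
```

Under these assumptions the second convolutional layer does roughly a fifth of the work once pooling is in place, which is in the same range as the measured saving in training time. The first convolutional layer costs the same in both cases, so the overall saving is somewhat smaller than the per-layer figure.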

-----------------------------------------------------------------------------------------------------------