Machine Learning in Hindsight

A 20/20 on commands found in the Machine Learning Cookbook

NOTE: the Hindsight series is based on older HowTos and is being ported to more recent versions. Some of this information may presently be out of date until that porting effort is finished.

https://www.tensorflow.org/get_started/mnist/beginners

Software Packages

You’ll find that data scientists tend to use Python as their language of choice, presumably because it lets them focus on modeling the domain without getting as deep into programming constructs as a language like Java or C++ would require. Additionally, there are many C/C++ libraries optimized for numerical analysis and designed to be invoked from Python, such as numpy, allowing for efficient manipulation of large data sets.
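As a rough illustration of why numpy matters here, the sketch below applies an element-wise operation to a large array without an explicit Python loop; the array contents are made up for the example.

```python
import numpy as np

# Vectorized arithmetic on a large array: the work happens in numpy's
# optimized C code rather than in the Python interpreter.
data = np.random.rand(1000000)      # one million random floats
scaled = data * 2.0 + 1.0           # element-wise, no explicit Python loop
print(scaled.mean(), scaled.std())
```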

There are two different versions of Python currently in use: the older Python 2.x releases and the newer Python 3.x. I’ve been using Python 3 in this series, as this was recommended by a number of data scientists I spoke with, and it is also what Stanford’s online CS231n class on convolutional networks uses.

Python has a package manager, pip (pip3 for Python 3), that downloads requested packages and installs them locally. To manage versions of the Python packages, there is the virtualenv framework, which lets you create separate environments, each with its own versions of the Python libraries.
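As a quick sanity check (a sketch, not part of the cookbook itself), once a virtualenv has been activated you can confirm from within Python which interpreter and environment are in use:

```python
import sys

# Inside a virtualenv, sys.prefix points at the environment's directory
# rather than the system-wide Python installation.
print("interpreter:", sys.executable)
print("environment:", sys.prefix)
print("version:", sys.version_info[:3])
```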

Another tool used to develop Python programs in general, and especially to experiment with different nets, is the Jupyter notebook application. Jupyter runs a simple web server in your virtual environment, letting you execute Python code from within your browser. You can use packages such as matplotlib to graph your results.
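Here is a minimal sketch of the kind of plot you might run in a notebook cell; the curve being plotted is just an arbitrary example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Plot a simple curve; in a Jupyter notebook the figure renders inline
# (older setups may need the %matplotlib inline magic first).
x = np.linspace(0.0, 2.0 * np.pi, 200)
plt.plot(x, np.sin(x), label="sin(x)")
plt.legend()
plt.title("A quick matplotlib check")
plt.show()
```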

If you have a supported Nvidia GPU, Nvidia provides a set of libraries, the NVIDIA CUDA® Deep Neural Network library (cuDNN), for running numerical analysis and machine learning algorithms on its GPUs. You’ll need to install these libraries to take advantage of your GPU. Unfortunately, more recent Macs have not shipped with Nvidia GPUs, so the latest TensorFlow releases have dropped GPU support on the Mac.
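If you want to confirm whether your install can actually see the GPU, a quick check along these lines should work; the exact call depends on your TensorFlow version, and this sketch uses the older 1.x API.

```python
import tensorflow as tf

# Reports whether a CUDA-capable GPU is visible to TensorFlow (1.x API).
# Newer 2.x releases use tf.config.list_physical_devices("GPU") instead.
print("GPU available:", tf.test.is_gpu_available())
```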

Finally, you’ll want to install numpy, scipy, and pillow in addition to the TensorFlow packages, as these libraries are used quite frequently in machine learning tutorials. Another useful package is matplotlib, a Python package for creating graphs and charts.
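As a small illustration of how these fit together, the sketch below loads an image with pillow and converts it to a numpy array, a very common preprocessing step in tutorials; the file name is just a placeholder.

```python
import numpy as np
from PIL import Image

# Load an image with pillow, convert to grayscale, and turn it into a
# numpy array of floats in [0, 1]. "example.png" is a placeholder name.
img = Image.open("example.png").convert("L")
pixels = np.asarray(img, dtype=np.float32) / 255.0
print(pixels.shape, pixels.min(), pixels.max())
```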

After getting the basic Python libraries installed, you can proceed with the TensorFlow packages. Remember to stick to the Python 3 versions.
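Once installed, a quick way to verify the package from a Python 3 interpreter:

```python
import tensorflow as tf

# If the import succeeds and a version string prints, the install worked.
print(tf.__version__)
```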

Important Tensorflow Concepts

First, many of the TensorFlow APIs are analogous to the numpy APIs.
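For example, here is the same matrix product written with numpy and with TensorFlow’s graph-building API; this is a sketch assuming the 1.x style used by the older tutorials this series follows.

```python
import numpy as np
import tensorflow as tf

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0], [6.0]])

print(np.matmul(a, b))                               # computed immediately

product = tf.matmul(tf.constant(a), tf.constant(b))  # only describes the op
with tf.Session() as sess:
    print(sess.run(product))                         # the session computes it
```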

Tensorflow concepts

computational graph: In TensorFlow, you first specify the operations you will be performing without actually doing the calculations; you are only describing what will need to happen. You can specify that the output of one operation will be used as the input of another operation, and in this way you form a graph of the computations that will occur. Once you have a computational graph, you can push data forward through it or back-propagate through it. Inputs to operational nodes are constants, placeholders, variables, or the outputs of other operational nodes.

session: A session provides the environment and resources to actually compute results given your computational graph and some inputs.

tensor: This is the basic data structure used by TensorFlow. You can think of it as an n-dimensional array, of various sizes, holding floats or complex numbers. 0-tensors are scalars, 1-tensors (AKA tensors of rank 1) are vectors or one-dimensional arrays, 2-tensors (AKA rank 2 tensors) are matrices, and so on. A tensor corresponds to numpy’s ndarray.

training using optimizers: When building up our computational graph, we can add in nodes that are “variables”. Training is the process of determining the best, or approximately the best, values for these variables. The mechanism that actually determines these values is the optimizer, and a very important type of optimizer is the gradient descent optimizer. Gradient descent works by defining a loss (or optimization) function for the computational graph. The idea is that this function quantifies exactly how far off the output of the computational graph is from its desired output. During training, the optimizer modifies the values of the variable nodes, taking the gradient of the computational graph’s loss function and then using that gradient to determine how to adjust each variable so that the loss function shrinks the most. (See the sketch after this list.)

estimator: The estimator will use the most common algorithms to train your network for you. Unless you are a researcher testing out new algorithms, you should probably stick to using the estimator for your applications.

model: This is the data structures, computational graphs, and loss functions used by your application. TensorFlow provides some standard models here.
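Pulling several of these concepts together, below is a minimal sketch, assuming the TensorFlow 1.x graph/session API: it builds a computational graph with placeholders and variables, defines a loss function, and uses the gradient descent optimizer to fit a simple line inside a session. The data and target values are made up for the example.

```python
import numpy as np
import tensorflow as tf

# Placeholders are fed with data at run time; Variables are what training adjusts.
x = tf.placeholder(tf.float32, shape=[None])
y = tf.placeholder(tf.float32, shape=[None])
w = tf.Variable(0.0)
b = tf.Variable(0.0)

# Building the computational graph: nothing is computed yet.
prediction = w * x + b
loss = tf.reduce_mean(tf.square(prediction - y))   # the loss function
train_step = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)

# Synthetic training data for the target line y = 3x - 1.
x_data = np.linspace(-1.0, 1.0, 100).astype(np.float32)
y_data = 3.0 * x_data - 1.0

# The session provides the resources to actually run the graph.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(200):
        sess.run(train_step, feed_dict={x: x_data, y: y_data})
    print("learned w, b:", sess.run([w, b]))   # should approach 3.0 and -1.0
```

Running this should print a weight and bias close to 3.0 and -1.0, which is the optimizer driving the loss function down by following its gradient.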

References