Review of "Scaling Distributed Machine Learning with the Parameter Server"

01 Nov 2015

Review of "Scaling Distributed Machine Learning with the Parameter Server"

As large-scale optimization and inference problems become more and more common, it becomes necessary to use distributed frameworks to solve these machine learning problems. This paper proposes the Parameter Server framework for that purpose.

The framework provides several notable features:

  1. All communication is asynchronous unless requested otherwise, and is optimized for machine learning tasks to reduce network traffic and overhead.

  2. It provides a flexible consistency model (sequential, eventual, or bounded-delay) that allows some relaxation to better balance convergence rate and system efficiency; the bounded-delay case is sketched after the architecture overview below.

  3. It provides elastic scalability in which new nodes can be added without restarting the running framework.

  4. It provides fault tolerance and durability.

  5. Globally shared parameters are represented as vectors and matrices to facilitate machine learning applications. By treating the parameters as sparse linear algebra objects, the parameter server provides the same functionality as a (key, value) abstraction, while also admitting important optimized operations such as vector addition, multiplication, 2-norm, and other more sophisticated operations.
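To make the last point concrete, here is a minimal sketch in Python (purely illustrative; the actual system exposes a C++ interface, and `SparseParamVector` and its methods are my own names, not the paper's API) of how an ordered (key, value) store can double as a sparse vector supporting addition, inner products, and norms over key ranges:

```python
import numpy as np

class SparseParamVector:
    """Toy stand-in for server-side parameters: a (key, value) store whose
    keys are ordered feature ids, so key ranges behave like sparse vectors."""

    def __init__(self):
        self.data = {}                       # key -> float, i.e. the (key, value) store

    def get(self, key):                      # plain key-value read
        return self.data.get(key, 0.0)

    def add(self, keys, values):             # vector addition over sparse keys
        for k, v in zip(keys, values):
            self.data[k] = self.data.get(k, 0.0) + v

    def dot(self, keys, values):             # inner product with a sparse vector
        return sum(self.data.get(k, 0.0) * v for k, v in zip(keys, values))

    def norm2(self, lo, hi):                 # 2-norm over a contiguous key range
        return np.sqrt(sum(v * v for k, v in self.data.items() if lo <= k < hi))

w = SparseParamVector()
w.add(keys=[3, 17, 42], values=[0.1, -0.2, 0.05])   # e.g. applying a pushed gradient
print(w.dot(keys=[3, 42], values=[1.0, 2.0]))        # 0.2
print(w.norm2(0, 100))                               # ~0.23
```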

A parameter server instance can run more than one algorithm simultaneously. Parameter server nodes are grouped into a server group and several worker groups. Server nodes communicate with each other to replicate and/or to migrate parameters for reliability and scaling. Each worker group runs an application. A worker typically stores locally a portion of the training data to compute local statistics such as gradients. Worker nodes only communicate with server nodes, pushing and pulling parameters. A scheduler node is used for each worker group. It assigns tasks to workers and monitors their progress.
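The following self-contained sketch imitates this worker/server interaction: a worker pulls the keys it needs, computes a local gradient, and pushes the update asynchronously, with a bounded-delay rule (`tau`) capping how far iterations may run ahead, in the spirit of the relaxed consistency model in feature 2. The names (`ToyServer`, `worker_loop`, `push_async`) and the in-process thread pool are my own illustrative assumptions, not the paper's interface:

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

class ToyServer:
    """Illustrative server group: holds the weights and applies pushed gradients."""

    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr
        self.pool = ThreadPoolExecutor(max_workers=1)   # serialize updates for simplicity

    def pull(self, keys):                     # workers read only the keys they need
        return self.w[keys].copy()

    def push_async(self, keys, grad):         # non-blocking push; returns a handle
        return self.pool.submit(self._apply, keys, grad)

    def _apply(self, keys, grad):
        self.w[keys] -= self.lr * grad        # simple SGD step on the server side

def worker_loop(server, X, y, iters=200, tau=2):
    pending = []                              # handles of in-flight pushes
    for t in range(iters):
        keys = np.arange(X.shape[1])          # toy dense problem: all keys active
        w = server.pull(keys)
        grad = X.T @ (X @ w - y) / len(y)     # least-squares gradient on local data
        pending.append(server.push_async(keys, grad))
        if t >= tau:                          # bounded delay: iteration t proceeds only
            pending.pop(0).result()           # after iteration t - tau has been applied

X = np.random.randn(64, 8)
y = X @ np.ones(8)                            # true weights are all ones
server = ToyServer(dim=8)
worker_loop(server, X, y)
print(np.round(server.w, 2))                  # approaches the true weights
```

Setting `tau=0` would recover fully synchronous execution, while letting `tau` grow approaches eventual consistency; the point of the design is that this trade-off is exposed to the algorithm rather than fixed by the system.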

Will this paper be influential in 10 years? I am not sure. On one hand, it provides a way to scale machine learning computations; on the other hand, it does not provide a flexible enough programming model for building different machine learning applications. It would also be great if there were APIs for Java and Python.