Matrix Operations in Machine Learning

One of the interesting things to play with in machine learning is optimizing the computational cost of training a model through matrix operations.

Take a three-layer neural network as an example: it has one input layer, one hidden layer, and one output layer, and it is a typical model for recognizing handwritten digits.



Without using matrix operations, one can loop through the K training samples to compute the outputs of layers 2 and 3, and finally sum up the total cost J. This code is concise and easy to understand.

J = 0; % J is the cost of the network
for i = 1:K % K is the number of training samples
  % layer 1
  a1 = X(i,:); % X is the input features 

  % layer 2
  z2 = Theta1 * a1'; % Theta1 is the weight matrix from the input layer to the hidden layer
  a2 = sigmoid(z2); % sigmoid is logistic function

  % layer 3
  z3 = Theta2 * [1; a2]; % Theta2 is the weight matrix from the hidden layer to the output layer; [1; a2] prepends the bias unit
  a3 = sigmoid(z3);

  % yy holds the expected outputs (labels) for each sample
  J += -yy(i,:)*log(a3)-(1-yy(i,:))*log(1-a3);
end
J = J / K;
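
The sigmoid helper used in the code is the logistic function. Its definition is not shown in the snippets here, but a minimal element-wise version (e.g. in a sigmoid.m file) would be:

function g = sigmoid(z)
  % Logistic function applied element-wise, so it works for the scalars,
  % vectors, and matrices passed in by both implementations.
  g = 1 ./ (1 + exp(-z));
end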

If we implement it with matrix operations instead of looping through the K training samples, we can leverage the optimized matrix routines and eliminate the for loop in the forward pass. I got roughly a 2x speedup on my machine (without seriously benchmarking the performance).

all_z2 = Theta1 * X';          % layer 2 pre-activations for all K samples at once
all_a2 = sigmoid(all_z2);
all_a2 = [ones(1, K); all_a2]; % prepend the bias unit to every column

all_z3 = Theta2 * all_a2;      % layer 3 pre-activations
all_a3 = sigmoid(all_z3);      % network outputs, one column per sample

J = 0;
for i = 1:K % only the cost accumulation still loops over the samples
  J += -yy(i,:)*log(all_a3(:,i))-(1-yy(i,:))*log(1-all_a3(:,i));
end
J = J / K;
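
To check the speedup claim on your own machine, a rough comparison with tic and toc might look like the sketch below; the dimensions and the two wrapper functions cost_with_loop and cost_vectorized are hypothetical stand-ins for the two snippets above.

K = 5000;                       % hypothetical number of training samples
X = rand(K, 401);               % inputs, assumed to already include the bias column
I = eye(10);
yy = I(randi(10, K, 1), :);     % one-hot labels for 10 digit classes
Theta1 = rand(25, 401) - 0.5;   % hypothetical layer 1 -> layer 2 weights
Theta2 = rand(10, 26) - 0.5;    % hypothetical layer 2 -> layer 3 weights

tic; J1 = cost_with_loop(X, yy, Theta1, Theta2, K);  t1 = toc;
tic; J2 = cost_vectorized(X, yy, Theta1, Theta2, K); t2 = toc;
printf('loop: %.3fs  vectorized: %.3fs  speedup: %.1fx\n', t1, t2, t1 / t2);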

However, if we blindly use matrix operations as shown below, the two matrix products each compute a full K-by-K matrix even though only the K diagonal entries are needed. This introduces roughly 2*K*K - K redundant operations and makes this version run even slower than the first implementation above.

all_a31 = -yy * log(all_a3) .* eye(K);        % K-by-K product; only the diagonal is needed
all_a32 = (1 - yy) * log(1 - all_a3) .* eye(K);
all_cost = all_a31 - all_a32;                 % per-sample costs sit on the diagonal
J = sum(sum(all_cost)) / K;
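
One way to stay fully vectorized without the K-by-K products is to use element-wise products, which compute only the K*num_labels terms that are actually needed. A minimal sketch, reusing yy and all_a3 from above:

cost_terms = -yy' .* log(all_a3) - (1 - yy') .* log(1 - all_a3); % num_labels-by-K matrix
J = sum(cost_terms(:)) / K;                                      % same J as the loop versions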
