Minimizing the negative-log-likelihood loss function


Section 19.6.5 noted that the output of the logistic function could be interpreted as a probability p assigned by the model to the proposition that f(x) = 1; the probability that f(x) = 0 is therefore 1 - p. Write down the probability p as a function of x and calculate the derivative of log p with respect to each weight w_i. Repeat the process for log(1 - p). These calculations give a learning rule for minimizing the negative-log-likelihood loss function for a probabilistic hypothesis. Comment on any resemblance to other learning rules in the chapter.
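
One way to check a hand derivation is to compare it against finite differences. The sketch below assumes the usual logistic form p = 1 / (1 + exp(-w·x)), with x containing a dummy input for the bias weight. Under that assumption the derivatives should come out as (1 - p) x_i for log p and -p x_i for log(1 - p), which suggests the update w_i <- w_i + alpha (y - p) x_i for a label y in {0, 1}. All function names, the learning rate alpha, and the sample numbers are illustrative choices, not taken from the text.

```python
import math

def logistic(z):
    # Standard logistic function (assumed form of the model output).
    return 1.0 / (1.0 + math.exp(-z))

def prob(w, x):
    # p = Logistic(w . x): model probability that f(x) = 1.
    return logistic(sum(wi * xi for wi, xi in zip(w, x)))

def grad_log_p(w, x):
    # Candidate analytic result: d(log p)/dw_i = (1 - p) * x_i.
    p = prob(w, x)
    return [(1.0 - p) * xi for xi in x]

def grad_log_one_minus_p(w, x):
    # Candidate analytic result: d(log(1 - p))/dw_i = -p * x_i.
    p = prob(w, x)
    return [-p * xi for xi in x]

def numeric_grad(f, w, eps=1e-6):
    # Central finite differences, used only to check the formulas above.
    grads = []
    for i in range(len(w)):
        hi = list(w)
        lo = list(w)
        hi[i] += eps
        lo[i] -= eps
        grads.append((f(hi) - f(lo)) / (2.0 * eps))
    return grads

def nll_update(w, x, y, alpha=0.1):
    # One gradient step on the negative log likelihood for a single example:
    # w_i <- w_i + alpha * (y - p) * x_i, with y in {0, 1}.
    p = prob(w, x)
    return [wi + alpha * (y - p) * xi for wi, xi in zip(w, x)]

# Hypothetical weights and input; x[0] = 1.0 plays the role of the bias input.
w = [0.5, -1.0, 2.0]
x = [1.0, 0.3, -0.7]

num_log_p = numeric_grad(lambda v: math.log(prob(v, x)), w)
num_log_q = numeric_grad(lambda v: math.log(1.0 - prob(v, x)), w)
```

Comparing `grad_log_p(w, x)` with `num_log_p`, and `grad_log_one_minus_p(w, x)` with `num_log_q`, should show agreement to within numerical error if the derivation is right. The resulting update has the same "error times input" shape as the perceptron rule; whether and how it differs from the squared-loss logistic rule is the comparison the exercise asks for.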