Mercer’s Theorem
Mercer’s Theorem determines which functions can be used as kernel functions. In mathematics, specifically functional analysis, Mercer’s theorem states that a symmetric, positive-definite kernel function can be represented as a convergent sum of products of basis functions. It follows from Mercer’s theorem that a matrix is a Gram matrix, i.e. an inner-product matrix in some feature space, if and only if it is positive semi-definite [CST00]. For a function to be a valid kernel, the inner-product matrix it produces on any dataset must therefore be positive semi-definite. In plain terms, for some function $k$ that takes two input vectors as arguments, if we apply $k$ to every possible pair of points in our dataset and write out the corresponding symmetric matrix $K$ whose $(i,j)$-th element is $k(x_i, x_j)$, then $k$ is a valid kernel if and only if that matrix is positive semi-definite.
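As an illustrative sketch (not part of the original text), the positive semi-definiteness condition can be checked empirically on a finite dataset: build the Gram matrix of a candidate function and inspect its eigenvalues. The function name `is_psd_gram` and the small tolerance are assumptions for this example.

```python
import numpy as np

def is_psd_gram(kernel, X, tol=1e-10):
    """Build the Gram matrix K[i, j] = kernel(X[i], X[j]) for dataset X
    and check that all its eigenvalues are (numerically) non-negative."""
    n = len(X)
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    eigvals = np.linalg.eigvalsh(K)  # eigenvalues of the symmetric matrix K
    return bool(np.all(eigvals >= -tol))

# A plain inner product (the linear kernel) always yields a PSD Gram
# matrix, while a negated inner product does not.
linear = lambda x, y: np.dot(x, y)
not_a_kernel = lambda x, y: -np.dot(x, y)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
print(is_psd_gram(linear, X))        # True
print(is_psd_gram(not_a_kernel, X))  # False on this sample
```

Note that passing this check on one dataset is only evidence, not proof: Mercer's condition requires positive semi-definiteness for every possible dataset.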
It should be noted that Mercer’s theorem only tells us when a candidate similarity function is admissible for use in support vector machines; it says nothing about how good such a function is. Fortunately, there is a set of well-studied kernel functions that have been shown to work extremely well in practice with SVMs.
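As one example of such a well-studied kernel (chosen for illustration; the source does not name specific kernels here), the Gaussian (RBF) kernel $k(x, y) = \exp(-\gamma \lVert x - y \rVert^2)$ is known to satisfy Mercer's condition, which can be observed on a sample dataset:

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    diff = np.asarray(x) - np.asarray(y)
    return np.exp(-gamma * np.dot(diff, diff))

# The RBF kernel's Gram matrix is positive semi-definite on any dataset,
# so by Mercer's theorem it is a valid kernel for SVMs.
rng = np.random.default_rng(1)
X = rng.normal(size=(15, 2))
K = np.array([[rbf_kernel(a, b) for b in X] for a in X])
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # True: PSD up to tolerance
```

The value of `gamma` here is an arbitrary choice; in practice it is a hyperparameter tuned alongside the SVM.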