The goal of this paper is to dispel the magic behind the black box that principal component analysis (PCA) is often treated as. This manuscript focuses on building a solid intuition for how and why PCA works, and crystallizes this knowledge by deriving the mathematics behind PCA from simple intuitions.
This tutorial does not shy away from explaining the ideas informally, nor does it shy away from the mathematics. The hope is that by addressing both aspects, readers of all levels will be able to gain a better understanding of PCA as well as the when, the how and the why of applying this technique.
With minimal effort PCA provides a roadmap for how to reduce a complex data set to a lower dimension to reveal the sometimes hidden, simplified structure that often underlies it. The goal of this tutorial is to provide both an intuitive feel for PCA and a thorough discussion of this topic. We will begin with a simple example and provide an intuitive explanation of the goal of PCA.
We will continue by adding mathematical rigor, placing PCA within the framework of linear algebra to provide an explicit solution. This understanding will lead us to a prescription for how to apply PCA in the real world and an appreciation for the underlying assumptions. My hope is that a thorough understanding of PCA provides a foundation for approaching the [...]. The discussion and explanations in this paper are informal in the spirit of a tutorial.
The goal of this paper is to educate. Occasionally, rigorous mathematical proofs are necessary although relegated to the Appendix. Although not as vital to the tutorial, the proofs are presented for the adventurous reader who desires a more complete understanding of the math.
My only assumption is that the reader has a working knowledge of linear algebra. My goal is to provide a thorough discussion by largely building on ideas from linear algebra and avoiding challenging topics in statistics and optimization theory (but see Discussion).
Please feel free to contact me with any suggestions, corrections or comments.
Here is the perspective: we are an experimenter. We are trying to understand some phenomenon by measuring various quantities in our system. Unfortunately, we can not figure out what is happening, because the data appear unclear and even redundant. This is not a trivial problem, but rather a fundamental obstacle in empirical science. Examples abound from complex systems such as neuroscience, web indexing, meteorology and oceanography - the number of variables to measure can be unwieldy and at times even deceptive, because the underlying relationships can often be quite simple.
Take for example a simple toy problem from physics diagrammed in Figure 1. This system consists of a ball of mass m attached to a massless, frictionless spring. The ball is released a small distance away from equilibrium (i.e. the spring is stretched). Because the spring is ideal, it oscillates indefinitely along the x-axis. This is a standard problem in physics in which the motion along the x direction is solved by an explicit function of time.
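For reference, the textbook solution for an ideal spring can be written out as a sketch. The spring constant k and the initial displacement A are symbols assumed here for illustration (they do not appear in the text), and the ball is assumed to be released from rest at t = 0:

```latex
x(t) = A \cos(\omega t), \qquad \omega = \sqrt{k/m}
```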
In other words, the underlying dynamics can be expressed as a function of a single variable x. However, being ignorant experimenters we do not know any of this. We do not know which, let alone how many, axes and dimensions are important to measure.
Thus, we decide to measure the ball's position with three movie cameras placed around the system. At [...] Hz each movie camera records an image indicating a two-dimensional position of the ball (a projection). Unfortunately, because of our ignorance, we do not even know what the real x, y and z axes are, so we choose three camera positions a, b and c at some arbitrary angles with respect to the system. The angles between our measurements might not even be 90°! Now, we record with the cameras. We treat every time sample (or experimental trial) as an individual sample in our data set.
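At each time sample we record a set of data consisting of multiple measurements. In our data set, at one point in time, camera A records a corresponding ball position (xA, yA). One sample or trial can then be expressed as a 6-dimensional column vector

$$\vec{X} = [x_A, y_A, x_B, y_B, x_C, y_C]^T$$

where each camera contributes a 2-dimensional projection of the ball's position to the entire vector.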
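As a rough sketch of how such samples might be assembled, the following plain-Python snippet simulates the toy experiment. Everything numeric in it is an assumption made for illustration: the camera angles, the amplitude and frequency, the sampling times, and the simplification that each camera sees the motion along a single direction in its image plane. The function names are hypothetical as well.

```python
import math

# Hypothetical in-image directions (degrees) along which each camera sees the motion.
CAMERA_ANGLES = {"A": 30.0, "B": 75.0, "C": 140.0}

def ball_position(t, amplitude=1.0, frequency=1.0):
    """Ideal spring: displacement along the true x-axis at time t (assumed form)."""
    return amplitude * math.cos(2.0 * math.pi * frequency * t)

def sample_vector(t):
    """One trial: the 6-dimensional vector [xA, yA, xB, yB, xC, yC]."""
    x = ball_position(t)
    sample = []
    for name in ("A", "B", "C"):
        theta = math.radians(CAMERA_ANGLES[name])
        # Each camera contributes a 2-D projection of the 1-D motion.
        sample.extend([x * math.cos(theta), x * math.sin(theta)])
    return sample

def data_matrix(times):
    """Stack the samples as columns, giving a matrix with 6 rows."""
    samples = [sample_vector(t) for t in times]
    return [[s[row] for s in samples] for row in range(6)]

times = [i / 100.0 for i in range(50)]  # assumed sampling times
X = data_matrix(times)
```

Here `X` has one row per measurement type and one column per time sample, so six recorded numbers per trial describe what is really motion along a single axis.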
With this concrete example, let us recast this problem in abstract terms. Each sample X is an m-dimensional vector, where m is the number of measurement types. Equivalently, every sample is a vector that lies in an m-dimensional vector space spanned by some orthonormal basis. From linear algebra we know that all measurement vectors form a linear combination of this set of unit length basis vectors.
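What is this orthonormal basis? The answer is usually a tacit assumption that is often overlooked. Pretend we gathered our toy example data above, but only looked at camera A. What is an orthonormal basis for (xA, yA)? A naive choice would be {(1, 0), (0, 1)}, but why select this basis over any arbitrary rotation of it? The reason is that the naive basis reflects the method by which we gathered the data. Pretend we record the position (2, 2). We did not record 2√2 in the (√2/2, √2/2) direction and 0 in the perpendicular direction. Rather, we recorded the position (2, 2) on our camera, meaning 2 units up and 2 units to the left in our camera window.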
Thus our original basis reflects the method by which we measured our data. How do we express this naive basis in linear algebra? In the two-dimensional case, {(1, 0), (0, 1)} can be recast as row vectors, and the matrix built from these rows is the 2 × 2 identity matrix I.
Figure 1: The position of a ball attached to an oscillating spring is recorded using three cameras A, B and C. The position of the ball tracked by each camera is depicted in each panel below.
The big question remains: how do we get from this data set to a simple equation of x? We know a priori that if we were smart experimenters, we would have just measured the position along the x-axis with one camera. But this is not what happens in the real world.
We often do not know which measurements best reflect the dynamics of our system. Furthermore, we sometimes record more dimensions than we actually need. Also, we have to deal with that pesky, real-world problem of noise. In the toy example this means that we need to deal with air, imperfect cameras or even friction in a less-than-ideal spring.
Noise contaminates our data set, only serving to obfuscate the dynamics further. This toy example is the challenge experimenters face every day. Keep this example in mind as we delve further into abstract concepts.
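Hopefully, by the end of this paper we will have a good understanding of how to systematically extract x using principal component analysis. PCA seeks a new basis in which to re-express the data. The hope is that this new basis will filter out the noise and reveal hidden structure. We can consider our naive basis as the effective starting point. Writing P for the matrix of the linear transformation that re-expresses the data, the jth coefficient of a transformed sample is the projection of the original sample onto the jth row of P. Therefore, the rows of P are a new set of basis vectors for representing the columns of the data matrix X.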
Change of Basis
With this rigor we may now state more precisely what PCA asks: Is there another basis, which is a linear combination of the original basis, that best re-expresses our data set? A close reader might have noticed the conspicuous addition of the word linear. Indeed, PCA makes one stringent but powerful assumption: linearity. Linearity vastly simplifies the problem by restricting the set of potential bases.
With this assumption PCA is now limited to re-expressing the data as a linear combination of its basis vectors. Let X be the original data set, where each column is a single sample (or moment in time) of our data set. Let Y be another matrix of the same size, related to X by a linear transformation P:

$$PX = Y \tag{1}$$

X is the original recorded data set and Y is a new representation of that data set.
Equation 1 represents a change of basis and thus can have many interpretations. P is a matrix that transforms X into Y.
Questions Remaining
By assuming linearity the problem reduces to finding the appropriate change of basis. Several questions now arise. What is the best way to re-express X? What is a good choice of basis P? These questions must be answered by next asking ourselves what features we would like Y to exhibit.
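The change of basis itself, in which a matrix P transforms X into Y, can be sketched in plain Python. The helper `matmul`, the sample values and the particular 45-degree basis are assumptions chosen for illustration; nothing here says this P is a good choice, only how any P acts on the data.

```python
import math

def matmul(P, X):
    """Return Y = PX: entry (j, i) is the dot product of row j of P with column i of X."""
    columns = list(zip(*X))
    return [[sum(p * x for p, x in zip(row, col)) for col in columns] for row in P]

# Two samples stored as the columns of X (made-up values).
# The first column is the recorded position (2, 2).
X = [[2.0, 1.0],
     [2.0, -1.0]]

# Rows of P: an orthonormal basis rotated 45 degrees from the naive basis.
s = math.sqrt(2.0) / 2.0
P = [[s, s],
     [-s, s]]

Y = matmul(P, X)
# Column i of Y holds the coordinates of sample i in the basis given by the rows of P.
```

In this basis the sample (2, 2) has a single nonzero coordinate, 2√2 along the first row of P, which is the sense in which each coefficient of Y is a projection onto a row of P.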
Evidently, additional assumptions beyond linearity are required to arrive at a reasonable result. The selection of these assumptions is the subject of the next section.
Geometrically, P is a rotation and a stretch which again transforms X into Y. A further interpretation, that the rows of P are a new set of basis vectors for expressing the columns of X, is not obvious but can be seen by writing out the explicit dot products of PX.
Now comes the most important question: what does "best express" the data mean? This section will build up an intuitive answer to this question and along the way tack on additional assumptions.
Footnote 1: In this section xi and yi are column vectors, but be forewarned: in all other sections xi and yi are row vectors.
Noise and Rotation
Figure 2 (caption, partial): Note that the largest direction of variance does not lie along the basis of the recording (xA, yA) but rather along the best-fit line.
Figure caption: The two measurements on the left are uncorrelated because one can not predict one from the other. Conversely, the two measurements on the right are highly correlated, indicating highly redundant measurements.
There exists no absolute scale for noise but rather all noise is quantified relative to the signal strength. Remembering that the spring travels in a straight line, every individual camera should record motion in a straight line as well.
Therefore, any spread deviating from straight-line motion is noise. The variances due to the signal and to the noise are each indicated by a line in the diagram. By positing reasonably good measurements, we quantitatively assume that the directions with the largest variances in our measurement space contain the dynamics of interest.
In Figure 2 the direction with the largest variance is not along either recording axis, xA or yA, but along the best-fit line.
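The assumption that the direction of largest variance carries the dynamics can be probed with a brute-force scan over directions. The sketch below uses synthetic points (made up for illustration, not data from the paper) and a 1-degree angular grid. It also reports the ratio of the variance along the best direction to the variance perpendicular to it, a quantity commonly called the signal-to-noise ratio; that measure is brought in here as an assumption. This scan is only an illustration of the idea, not the procedure PCA itself uses.

```python
import math

def variance(values):
    """Sample variance (n - 1 in the denominator)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def projected_variance(points, angle_deg):
    """Variance of 2-D points projected onto the unit vector at angle_deg."""
    theta = math.radians(angle_deg)
    ux, uy = math.cos(theta), math.sin(theta)
    return variance([x * ux + y * uy for x, y in points])

# Synthetic camera-A recording (made up): signal along the 45-degree line, plus a
# small alternating perturbation perpendicular to it, standing in for noise.
c = math.sqrt(2.0) / 2.0
points = []
for k in range(-10, 11):
    signal = k / 10.0
    noise = 0.05 if k % 2 == 0 else -0.05
    points.append((signal * c - noise * c, signal * c + noise * c))

# Brute-force scan over directions in 1-degree steps.
best_angle = max(range(180), key=lambda a: projected_variance(points, a))

# Variance along the best direction divided by variance perpendicular to it.
snr = projected_variance(points, best_angle) / projected_variance(
    points, (best_angle + 90) % 180
)
```

With these synthetic points the scan selects 45 degrees rather than either recording axis (0 or 90 degrees), and the ratio is much larger than 1, consistent with a clean measurement.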
"A Tutorial on Principal Component Analysis."