What is the difference between correlation and linear regression? - FAQ - GraphPad
Regression analysis is concerned with the dependence of one variable (the dependent variable) on one or more other variables (the explanatory variables). It deals with statistical rather than deterministic relationships. A deterministic relationship involves an exact relationship between two variables. If there are errors of measurement, say, in the k of Newton's law of gravity, the otherwise deterministic relationship becomes a statistical one.
In statistical relationships among variables we essentially deal with random or stochastic variables.
These variables have probability distributions. A statistical relationship per se cannot logically imply causation. The correlation coefficient measures the strength of linear association between two variables. In regression analysis we try to estimate the average value of one variable on the basis of the fixed values of other variables.
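To make the contrast concrete, here is a minimal sketch of a deterministic rule versus a statistical relationship built on the same rule. The function names, the rule y = 2x + 1, and the noise level are hypothetical choices for illustration only. At each fixed x the statistical y varies from draw to draw, but its average settles near the systematic part, which is the quantity regression tries to estimate.

```python
import random
from statistics import mean


def systematic_part(x):
    # Hypothetical exact (deterministic) rule: each x gives exactly one y.
    return 2.0 * x + 1.0


def draw_y(x, rng, noise_sd=0.5):
    # Statistical relationship: the same rule plus a random disturbance,
    # so y at a fixed x has a probability distribution.
    return systematic_part(x) + rng.gauss(0.0, noise_sd)


rng = random.Random(0)  # fixed seed so the run is reproducible
fixed_xs = [1.0, 2.0, 3.0]

# Three draws at the same x differ from one another.
three_draws = [draw_y(1.0, rng) for _ in range(3)]

# Average y at each fixed x over many repeated draws.
conditional_means = {
    x: mean(draw_y(x, rng) for _ in range(5000)) for x in fixed_xs
}

print("three draws at x = 1:", [round(v, 2) for v in three_draws])
for x in fixed_xs:
    print(x, systematic_part(x), round(conditional_means[x], 2))
```

The deterministic function returns the same value every time it is called with the same x, while the simulated draws only agree with it on average.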
Types of Relationships
In correlation analysis there is no distinction between dependent and explanatory variables; both variables are considered random. Most of the regression theory, by contrast, is based on the assumption that the dependent variable is stochastic but the explanatory variables are fixed or nonstochastic. In a multiple regression analysis we study the dependence of one variable on more than one explanatory variable, such as that of money demand on interest rates, income, and inflation.
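The money-demand example can be sketched with ordinary least squares on synthetic data. Everything numeric here is an assumption made up for illustration: the "true" coefficients, the value ranges, and the noise level are not estimates from real data. The explanatory variables are generated once and then treated as fixed, while money demand receives a random disturbance. The helper names `ols` and `solve_linear_system` are hypothetical.

```python
import random


def solve_linear_system(a, b):
    # Gauss-Jordan elimination with partial pivoting on a small square system.
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    # Ordinary least squares via the normal equations (X'X) b = X'y.
    x = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


rng = random.Random(1)  # fixed seed so the run is reproducible

# Assumed coefficients: intercept, interest rate, income, inflation.
true_b = [5.0, -0.8, 0.6, -0.3]

# Synthetic explanatory variables (interest rate, income, inflation).
rows = [
    (rng.uniform(1, 10), rng.uniform(20, 60), rng.uniform(0, 8))
    for _ in range(200)
]

# Dependent variable: systematic part plus a random disturbance.
money_demand = [
    true_b[0] + sum(b * v for b, v in zip(true_b[1:], r)) + rng.gauss(0, 0.5)
    for r in rows
]

estimates = ols(rows, money_demand)
print([round(b, 2) for b in estimates])
```

Solving the normal equations directly is fine for a small, well-conditioned example like this one; numerical libraries typically use more stable decompositions.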
A random (stochastic) variable is a variable that can take on any set of values, positive or negative, with a given probability. Three types of data may be available: time series data, cross-sectional data, and pooled data. A time series is a set of observations on the values that a variable takes at different times. Sources of data include international agencies (such as the World Bank) and surveys. In the social sciences the data that one generally obtains are nonexperimental in nature, that is, not subject to the control of the researcher.
Nonexperimental data carry the possibility of observational errors.
What kind of data?
Correlation is almost always used when you measure both variables. It is rarely appropriate when one variable is something you experimentally manipulate.
Linear regression is usually used when X is a variable you manipulate (time, concentration, etc.).
Does it matter which variable is X and which is Y?
With correlation, you don't have to think about cause and effect.
It doesn't matter which of the two variables you call "X" and which you call "Y". You'll get the same correlation coefficient if you swap the two. The decision of which variable you call "X" and which you call "Y" matters in regression, as you'll get a different best-fit line if you swap the two. The line that best predicts Y from X is not the same as the line that predicts X from Y; however, both lines have the same value for R².
Assumptions
The correlation coefficient itself is simply a way to describe how two variables vary together, so it can be computed and interpreted for any two variables.
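Because the coefficient can be computed for any two paired variables, the effect of swapping X and Y described above is easy to check numerically. The sketch below uses a small made-up data set and hypothetical helper names (`pearson_r`, `fit_line`). It shows that r is unchanged by the swap, that the two least-squares lines differ, and that the product of the two slopes equals r squared, which is why both fits share the same R².

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    # Pearson correlation coefficient; symmetric in x and y.
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    # Least-squares line predicting y from x: returns (slope, intercept).
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Made-up illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.5, 6.4]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)

slope_y_on_x, intercept_y_on_x = fit_line(x, y)  # predicts Y from X
slope_x_on_y, intercept_x_on_y = fit_line(y, x)  # predicts X from Y

# Express the X-from-Y line on the same axes (Y as a function of X).
swapped_slope_on_xy_axes = 1.0 / slope_x_on_y

print("r (X, Y):", r_xy, " r (Y, X):", r_yx)
print("slope of Y-on-X line:", slope_y_on_x)
print("slope of X-on-Y line, drawn on the same axes:", swapped_slope_on_xy_axes)
print("product of the two slopes:", slope_y_on_x * slope_x_on_y)
print("r squared:", r_xy ** 2)
```

The two lines coincide only when the points fall exactly on a straight line, that is, when r is exactly 1 or -1.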
Further inferences, however, require an additional assumption -- that both X and Y are measured, and both are sampled from Gaussian distributions.
This is called a bivariate Gaussian distribution.
If those assumptions are true, then you can interpret the confidence interval of r and the P value testing the null hypothesis that there really is no correlation between the two variables and any correlation you observed is a consequence of random sampling. With linear regression, the X values can be measured or can be a variable controlled by the experimenter.
The X values are not assumed to be sampled from a Gaussian distribution.
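As an illustration of the confidence interval of r mentioned above, here is a sketch using the Fisher z-transformation. This is one common approximation that relies on the bivariate Gaussian assumption; it is not necessarily the calculation any particular program performs. The function name and the example values r = 0.6 and n = 30 are hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def pearson_r_confidence_interval(r, n, confidence=0.95):
    # Approximate interval via the Fisher z-transformation.
    # Assumes the pairs were sampled from a bivariate Gaussian distribution.
    if n <= 3:
        raise ValueError("need more than 3 pairs")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = atanh(r)                # transform r to the z scale
    se = 1.0 / sqrt(n - 3)      # approximate standard error on the z scale
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    # Build the interval on the z scale, then transform back to the r scale.
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


low, high = pearson_r_confidence_interval(r=0.6, n=30)
print(round(low, 3), round(high, 3))
```

If the interval excludes zero, the data are inconsistent, at the chosen confidence level, with the null hypothesis of no correlation. No comparable distributional assumption on X is needed to fit a regression line.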