# Regression

There are various techniques for analysing the relationship between a dependent variable and one or more independent variables.
These techniques help you understand how the dependent variable changes when one of the independent variables changes (while the other independent variables remain the same).

Regression analysis estimates the conditional expectation (i.e. the average value) of the dependent variable, given the independent variables.
The estimation target is a function of the independent variables called the "regression function".

It is also common to characterise the variation of the dependent variable around the regression function, which can be described by a probability distribution.
Regression analysis is widely used for prediction and forecasting.

In some circumstances, regression analysis can be used to infer causal relationships between the dependent and independent variables.

### Parametric Regression

The regression function is defined in terms of a finite number of unknown parameters that are estimated from the data

Examples include linear regression and ordinary least squares regression.

### Nonparametric Regression

The regression function is allowed to lie in a specified set of functions, which may be infinite-dimensional.

### Extrapolation

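SLOPE - The slope of the linear regression line. This is a type of bivariate descriptive function.
RSQ - The square of the Pearson product moment correlation coefficient (i.e. the coefficient of determination)
LINEST - used to project a straight line
INTERCEPT - The point at which the linear regression line crosses the y-axis
STEYX - The standard error of the predicted y-value for each x in the regression
LOGEST - used to project an exponential curve
TREND - Returns values along a linear trend
FORECAST - Returns a predicted value along a linear trend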

### What is Linear Regression ?

The technique of linear regression attempts to define the relationship between the dependent and independent variables by means of a linear equation.
This is the simplest form of equation between two variables.
It produces the line (its slope and intercept) that best fits a single set of data.

### How good is your line ?

You could find the difference between each point and the line.
These differences are often referred to as errors.
The errors above the line are positive and the errors below the line are negative.
If you add these errors together for any line that passes through the mean point of the data, you will find that they total zero.

This doesn't prove that the line is a good one though.
The absolute values of the errors could be added up but the line of best fit is obtained when the sum of the squares of the errors is as small as possible.
Squaring the errors not only removes the sign but also gives more emphasis to the larger errors.
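The difference between summing the errors and summing their squares can be checked numerically. The following is a minimal Python sketch; the five data points and the helper names `errors` and `sum_of_squares` are made up for illustration and do not come from any real dataset. Both candidate lines pass through the mean point, so their signed errors cancel, but only the sum of squares separates the better line from the worse one.

```python
# Made-up illustrative data; the mean point is (3, 4).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def errors(a, b, xs, ys):
    """Signed vertical distances from each point to the line y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_of_squares(errs):
    """Sum of the squared errors."""
    return sum(e * e for e in errs)


# Two candidate lines, both passing through the mean point (3, 4).
flat = errors(4.0, 0.0, xs, ys)  # horizontal line at the mean of y
best = errors(2.2, 0.6, xs, ys)  # the least squares line for this data

# The signed errors cancel out for both lines...
print(sum(flat), sum(best))

# ...but the sums of squared errors differ, and the smaller one marks the better fit.
print(sum_of_squares(flat), sum_of_squares(best))
```

The horizontal line has the larger sum of squares here because its biggest single error is large, and squaring gives that error extra weight.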

### Method of Least Squares

Linear regression involves finding the line that minimises the sum of the squares of the errors.
You must always make sure that the y-variable is the dependent variable in the equation:
y = a + bx

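The values of a and b that minimise the sum of the squared errors are given by the following equations, where n is the number of data points and each sum runs over all of them:

$$ b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2} $$

This is the slope.

$$ a = \frac{\sum y}{n} - b \, \frac{\sum x}{n} $$

This is the intercept.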
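The slope and intercept formulas can be sketched directly in Python. This is a minimal illustration, not a reference implementation: the function name `least_squares_line` and the sample data are assumptions made for the example. The denominator of the slope uses the sum of the squared x-values minus the square of the summed x-values.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimises the squared errors."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    sx = sum(xs)                               # sum of x
    sy = sum(ys)                               # sum of y
    sxy = sum(x * y for x, y in zip(xs, ys))   # sum of x*y
    sxx = sum(x * x for x in xs)               # sum of x squared

    denominator = n * sxx - sx * sx
    if denominator == 0:
        raise ValueError("all x values are identical, so the slope is undefined")

    b = (n * sxy - sx * sy) / denominator      # slope
    a = sy / n - b * (sx / n)                  # intercept
    return a, b


# Made-up illustrative data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = least_squares_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")
```

The intercept is computed from the slope, so the fitted line always passes through the point made up of the mean of x and the mean of y.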

### Coefficient of Determination

Before a regression equation can be used effectively you need to see how well it fits the data.
One statistic that can be used as an indication is the coefficient of determination.
This measures the proportion of the variation in the dependent variable that is explained by the independent variable.
This is given by r², which is the square of Pearson's correlation coefficient.
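As a rough check of this statistic, the sketch below computes Pearson's correlation coefficient in plain Python and squares it. The function names `pearson_r` and `r_squared` and the sample data are assumptions made for the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """Coefficient of determination for a simple linear regression of y on x."""
    return pearson_r(xs, ys) ** 2


# Made-up illustrative data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

print(r_squared(xs, ys))
```

A value close to 1 suggests the line accounts for most of the variation in y, while a value close to 0 suggests it accounts for very little.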

### Exponential Regression

Produces an exponential curve that best fits a set of data that does not change linearly with time.
For example, a series showing population growth is often better represented by an exponential curve than by a straight line.
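One common way to fit such a curve is to take logarithms of the y-values and fit a straight line to the result. The sketch below assumes a curve of the form y = c * m^x and that every y-value is positive; the function name `fit_exponential` and the sample data are made up for illustration. Note that this approach minimises the squared errors of log(y) rather than of y itself, so it is an approximation of the best-fitting curve on the original scale.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = c * m**x by least squares on log(y). Returns (c, m)."""
    if any(y <= 0 for y in ys):
        raise ValueError("all y values must be positive to take logarithms")

    logs = [math.log(y) for y in ys]
    n = len(xs)
    sx = sum(xs)
    sl = sum(logs)
    sxl = sum(x * l for x, l in zip(xs, logs))
    sxx = sum(x * x for x in xs)

    # Straight-line fit: log(y) = log(c) + x * log(m)
    slope = (n * sxl - sx * sl) / (n * sxx - sx * sx)
    intercept = sl / n - slope * (sx / n)

    return math.exp(intercept), math.exp(slope)


# Made-up data that doubles at every step, starting from 3.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * 2.0 ** x for x in xs]

c, m = fit_exponential(xs, ys)
print(f"y = {c:.3f} * {m:.3f}^x")
```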

### Multiple Regression

This is regression analysis with more than one independent variable (i.e. more than one set of x-values).
You can perform both linear and exponential multiple regression analysis.
For example, if you want to project house prices in your area based on square footage, number of bathrooms and age, this could be done using a multiple regression formula.
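The house price example can be sketched as a linear multiple regression in plain Python by solving the normal equations. Everything in the block is an assumption made for illustration: the helper names `solve` and `fit_multiple`, the six houses, and the prices, which are generated from a known formula (in thousands) so that the recovered coefficients can be checked. For real data a numerical library would normally be a better choice.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("the independent variables are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - known) / aug[i][i]
    return solution


def fit_multiple(rows, ys):
    """Least squares fit of y = b0 + b1*x1 + ... + bk*xk. Returns [b0, b1, ..., bk]."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical houses: (size in thousands of square feet, bathrooms, age in years).
houses = [
    (1.0, 1, 30), (1.5, 2, 20), (2.0, 2, 10),
    (1.2, 1, 5), (1.8, 3, 15), (2.5, 3, 2),
]
# Synthetic prices in thousands, generated from a known formula for checking.
prices = [50 + 100 * size + 10 * baths - 0.5 * age for size, baths, age in houses]

coefficients = fit_multiple(houses, prices)
print([round(value, 4) for value in coefficients])
```

Each coefficient describes how the projected price changes when that one variable increases by one unit while the others stay the same.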