Safe Haskell | None |
---|---|
Language | Haskell98 |
Statistics.LinearRegression
Synopsis
- linearRegression :: Vector v Double => v Double -> v Double -> (Double, Double)
- linearRegressionRSqr :: Vector v Double => v Double -> v Double -> (Double, Double, Double)
- linearRegressionTLS :: Vector v Double => v Double -> v Double -> (Double, Double)
- correl :: Vector v Double => v Double -> v Double -> Double
- covar :: Vector v Double => v Double -> v Double -> Double
- linearRegressionMSE :: (Vector v Double, Vector v (Double, Double)) => (Double, Double) -> v Double -> v Double -> Double
- linearRegressionDistributions :: (Vector v Double, Vector v (Double, Double)) => (Double, Double) -> v Double -> v Double -> (LinearTransform StudentT, LinearTransform StudentT)
- robustFit :: (MonadRandom m, Vector v Double) => EstimationParameters -> v Double -> v Double -> m EstimatedRelation
- nonRandomRobustFit :: Vector v Double => EstimationParameters -> v Double -> v Double -> EstimatedRelation
- robustFitRSqr :: (MonadRandom m, Vector v Double, Vector v (Double, Double)) => EstimationParameters -> v Double -> v Double -> m (EstimatedRelation, Double)
- data EstimationParameters = EstimationParameters {
    - outlierFraction :: !Double
    - shortIterationSteps :: !Int
    - maxSubsetsNum :: !Int
    - groupSubsets :: !Int
    - mediumSetSize :: !Int
    - largeSetSize :: !Int
    - estimator :: Estimator
    - errorFunction :: ErrorFunction
  }
- type ErrorFunction = EstimatedRelation -> (Double, Double) -> Double
- type Estimator = Sample -> Sample -> EstimatedRelation
- type EstimatedRelation = (Double, Double)
- defaultEstimationParameters :: EstimationParameters
- linearRegressionError :: ErrorFunction
- linearRegressionTLSError :: ErrorFunction
- converge :: Vector v Double => EstimationParameters -> v Double -> v Double -> EstimatedRelation -> EstimatedRelation
Simple linear regression functions
linearRegression :: Vector v Double => v Double -> v Double -> (Double, Double)
Simple linear regression between 2 samples. Takes two vectors Y={yi} and X={xi} and returns (alpha, beta) such that Y = alpha + beta*X
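The closed-form least-squares solution behind a fit of the form Y = alpha + beta*X can be sketched on plain lists using only `base`. This is an illustration of the textbook formulas, not the library's implementation: `olsFit` is a hypothetical name, and the argument order (x values first, then y values) is an assumption.

```haskell
-- Ordinary least squares on plain lists (base only).
-- NOTE: olsFit is a hypothetical helper, not part of Statistics.LinearRegression.

mean :: [Double] -> Double
mean vs = sum vs / fromIntegral (length vs)

-- | Returns (alpha, beta) minimising the sum of squared vertical residuals.
-- Assumes equal-length inputs containing at least two distinct x values.
olsFit :: [Double] -> [Double] -> (Double, Double)
olsFit xs ys = (alpha, beta)
  where
    mx    = mean xs
    my    = mean ys
    -- centred cross-product and centred sum of squares
    sxy   = sum (zipWith (\x y -> (x - mx) * (y - my)) xs ys)
    sxx   = sum [(x - mx) * (x - mx) | x <- xs]
    beta  = sxy / sxx
    alpha = my - beta * mx

main :: IO ()
main = print (olsFit [0, 1, 2, 3] [1, 3, 5, 7])  -- data lying on y = 1 + 2x
```

For data that lie exactly on a line, the sketch recovers that line's intercept and slope.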
linearRegressionRSqr :: Vector v Double => v Double -> v Double -> (Double, Double, Double)
Simple linear regression between 2 samples. Takes two vectors Y={yi} and X={xi} and returns (alpha, beta, r*r) such that Y = alpha + beta*X and where r is the Pearson product-moment correlation coefficient
linearRegressionTLS :: Vector v Double => v Double -> v Double -> (Double, Double)
Total Least Squares (TLS) linear regression.
Assumes that the x-axis values (and not just the y-axis values) are random variables and that both variables have similar distributions.
The interface is the same as that of linearRegression.
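One common closed form for a total least squares (orthogonal) line is sketched here on plain lists, assuming equal error variance in both coordinates. The library may compute its TLS estimate differently (see reference [1]); `tlsFit` is a hypothetical name and the argument order (x values first) is an assumption.

```haskell
-- Orthogonal (total least squares) line fit on plain lists (base only).
-- NOTE: tlsFit is a hypothetical helper, not the library's linearRegressionTLS.

-- | Returns (alpha, beta) minimising the sum of squared perpendicular distances.
-- Assumes equal-length inputs whose centred cross-product sxy is non-zero.
tlsFit :: [Double] -> [Double] -> (Double, Double)
tlsFit xs ys = (alpha, beta)
  where
    n     = fromIntegral (length xs)
    mx    = sum xs / n
    my    = sum ys / n
    sxx   = sum [(x - mx) * (x - mx) | x <- xs]
    syy   = sum [(y - my) * (y - my) | y <- ys]
    sxy   = sum (zipWith (\x y -> (x - mx) * (y - my)) xs ys)
    d     = syy - sxx
    -- the root of sxy*b^2 - d*b - sxy = 0 that has the same sign as sxy
    beta  = (d + sqrt (d * d + 4 * sxy * sxy)) / (2 * sxy)
    alpha = my - beta * mx

sampleXs, sampleYs :: [Double]
sampleXs = [0, 1, 2, 3, 4]
sampleYs = [0.1, 0.9, 2.2, 2.8, 4.1]

main :: IO ()
main = do
  print (tlsFit sampleXs sampleYs)  -- y regressed on x
  print (tlsFit sampleYs sampleXs)  -- x regressed on y
```

Unlike ordinary least squares, this fit treats both axes symmetrically: swapping the two samples yields the reciprocal slope, so both calls describe the same geometric line.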
Related functions
correl :: Vector v Double => v Double -> v Double -> Double
Pearson's product-moment correlation coefficient
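For orientation, the Pearson coefficient can be computed directly from centred sums. The following is a minimal sketch on plain lists with a hypothetical name (`pearson`); it is not the library's `correl`.

```haskell
-- Pearson's product-moment correlation coefficient on plain lists (base only).
-- NOTE: pearson is a hypothetical helper, not the library's correl.

-- | Assumes equal-length inputs, neither of which is constant.
pearson :: [Double] -> [Double] -> Double
pearson xs ys = sxy / sqrt (sxx * syy)
  where
    n   = fromIntegral (length xs)
    mx  = sum xs / n
    my  = sum ys / n
    sxx = sum [(x - mx) * (x - mx) | x <- xs]
    syy = sum [(y - my) * (y - my) | y <- ys]
    sxy = sum (zipWith (\x y -> (x - mx) * (y - my)) xs ys)

main :: IO ()
main = do
  print (pearson [0, 1, 2, 3] [1, 3, 5, 7])  -- perfectly increasing linear data
  print (pearson [0, 1, 2, 3] [7, 5, 3, 1])  -- perfectly decreasing linear data
```

Squaring this value gives the r*r quantity that linearRegressionRSqr is documented to return.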
Estimated errors and distribution parameters
linearRegressionMSE :: (Vector v Double, Vector v (Double, Double)) => (Double, Double) -> v Double -> v Double -> Double
The error (or residual) mean square of a sample w.r.t. an estimated regression line. This serves as an estimate for the variance of the sampled data. Accepts the regression parameters (alpha,beta) and the sample vectors X and Y.
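A minimal sketch of the residual mean square, assuming the n-2 denominator used for simple linear regression in reference [3]. `residualMeanSquare` is a hypothetical name; the parameter order mirrors the description above (parameters, then X, then Y).

```haskell
-- Residual (error) mean square of a sample around a given line (base only).
-- NOTE: residualMeanSquare is a hypothetical helper, and the (n - 2)
-- denominator is an assumption taken from the textbook definition.

-- | Sum of squared vertical residuals divided by (n - 2).
-- Assumes equal-length inputs with more than two points.
residualMeanSquare :: (Double, Double) -> [Double] -> [Double] -> Double
residualMeanSquare (alpha, beta) xs ys = sse / (n - 2)
  where
    n   = fromIntegral (length xs)
    sse = sum (zipWith (\x y -> let r = y - alpha - beta * x in r * r) xs ys)

main :: IO ()
main = print (residualMeanSquare (1, 2) [0, 1, 2, 3] [1, 3, 5, 8])
```

In the sample call only the last point is off the line y = 1 + 2x, by one unit, so the sum of squared residuals is 1 and is divided by n - 2 = 2.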
linearRegressionDistributions :: (Vector v Double, Vector v (Double, Double)) => (Double, Double) -> v Double -> v Double -> (LinearTransform StudentT, LinearTransform StudentT)
The estimated distributions of the regression parameters (alpha and beta), assuming that the sampled data Y are normally and identically distributed. These can be used to obtain confidence intervals for the regression parameters. Accepts the regression parameters (alpha,beta) and the sample vectors X and Y. The distributions are StudentT distributions centered at the estimated alpha and beta respectively, with n-2 degrees of freedom (where n is the sample size) and with standard deviations derived from the sampled data via its MSE. See chapter 2 of reference [3] for details.
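The standard deviations mentioned above follow the usual textbook formulas for simple linear regression. The sketch below computes them on plain lists; `standardErrors` is a hypothetical name, and the formulas are the standard ones rather than code taken from the library.

```haskell
-- Standard errors of the intercept and slope estimates (base only).
-- NOTE: standardErrors is a hypothetical helper illustrating textbook formulas.

-- | Given (alpha, beta) and the samples, returns (seAlpha, seBeta), where
--   seBeta^2  = MSE / Sxx
--   seAlpha^2 = MSE * (1/n + mean(x)^2 / Sxx)
-- Assumes equal-length inputs with more than two points and non-constant x.
standardErrors :: (Double, Double) -> [Double] -> [Double] -> (Double, Double)
standardErrors (alpha, beta) xs ys = (seAlpha, seBeta)
  where
    n       = fromIntegral (length xs)
    mx      = sum xs / n
    sxx     = sum [(x - mx) * (x - mx) | x <- xs]
    sse     = sum (zipWith (\x y -> let r = y - alpha - beta * x in r * r) xs ys)
    mse     = sse / (n - 2)
    seBeta  = sqrt (mse / sxx)
    seAlpha = sqrt (mse * (1 / n + mx * mx / sxx))

main :: IO ()
main =
  -- (0.8, 2.3) is the least-squares fit of these four points
  print (standardErrors (0.8, 2.3) [0, 1, 2, 3] [1, 3, 5, 8])
```

Under the normality assumption, (estimate - true value) / standard error follows a Student's t distribution with n-2 degrees of freedom, which is why a confidence interval needs a t quantile in addition to these standard errors.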
Robust linear regression
robustFit :: (MonadRandom m, Vector v Double) => EstimationParameters -> v Double -> v Double -> m EstimatedRelation
Finds a robust linear fit between two samples. The procedure is randomized and is based on the one described in reference [2].
nonRandomRobustFit :: Vector v Double => EstimationParameters -> v Double -> v Double -> EstimatedRelation
A wrapper that executes robustFit using a default random generator (meaning it is only pseudo-random).
robustFitRSqr :: (MonadRandom m, Vector v Double, Vector v (Double, Double)) => EstimationParameters -> v Double -> v Double -> m (EstimatedRelation, Double)
Robust fit that also yields the R-square value of the "clean" dataset.
Related types
data EstimationParameters
The robust fit algorithm has various parameters that can be specified using the EstimationParameters record.
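Constructors
EstimationParameters
Fields
- outlierFraction :: !Double
- shortIterationSteps :: !Int
- maxSubsetsNum :: !Int
- groupSubsets :: !Int
- mediumSetSize :: !Int
- largeSetSize :: !Int
- estimator :: Estimator
- errorFunction :: ErrorFunction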
type ErrorFunction = EstimatedRelation -> (Double, Double) -> Double
An ErrorFunction is a function that computes the error of a given point from an estimate. This module provides two error functions corresponding to the two Estimator functions it defines:
- Vertical distance squared via linearRegressionError, which should be used with linearRegression
- Total distance squared via linearRegressionTLSError, which should be used with linearRegressionTLS
type Estimator = Sample -> Sample -> EstimatedRelation
An Estimator is a function that generates an estimated linear regression based on 2 samples. This module provides two estimator functions: linearRegression and linearRegressionTLS.
type EstimatedRelation = (Double, Double)
An estimated linear relation between 2 samples is (alpha,beta) such that Y = alpha + beta*X.
Provided values
defaultEstimationParameters :: EstimationParameters
Default set of parameters to use (see reference for details).
linearRegressionError :: ErrorFunction
The error function for linearRegression: the square of the vertical distance of a point from the line.
linearRegressionTLSError :: ErrorFunction
The error function for linearRegressionTLS: the square of the total distance of a point from the line.
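The two distance notions can be sketched as follows, reading "total distance" as the perpendicular (Euclidean) distance to the line. The names `verticalSqError` and `perpendicularSqError` are hypothetical, and treating the point tuple as (x, y) is an assumption.

```haskell
-- Squared vertical and squared perpendicular distance of a point from the
-- line y = alpha + beta * x (base only).
-- NOTE: both function names are hypothetical, and the point is assumed to be
-- given as (x, y); these are not the library's definitions.

type Relation = (Double, Double)  -- (alpha, beta)
type Point    = (Double, Double)  -- (x, y), an assumed ordering

-- | Square of the vertical residual.
verticalSqError :: Relation -> Point -> Double
verticalSqError (alpha, beta) (x, y) = r * r
  where r = y - alpha - beta * x

-- | Square of the perpendicular distance: the vertical residual squared,
-- scaled by 1 / (1 + beta^2).
perpendicularSqError :: Relation -> Point -> Double
perpendicularSqError rel@(_, beta) pt =
  verticalSqError rel pt / (1 + beta * beta)

main :: IO ()
main = do
  print (verticalSqError      (0, 1) (0, 2))  -- point (0,2) against y = x
  print (perpendicularSqError (0, 1) (0, 2))
```

For a horizontal line (beta = 0) the two measures coincide; the steeper the line, the smaller the perpendicular distance becomes relative to the vertical one.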
Helper functions
converge :: Vector v Double => EstimationParameters -> v Double -> v Double -> EstimatedRelation -> EstimatedRelation
Calculate the optimal (local minimum) estimate based on an initial estimate. The local minimum may not be the global (a.k.a. best) estimate but starting from enough different initial estimates should yield the global optimum eventually.
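The documentation does not spell out how the local minimum is reached. One plausible reading, in the spirit of the concentration steps of reference [2], is sketched here: keep the h points that best agree with the current estimate, refit on them, and repeat until the trimmed error stops decreasing. All names (`convergeSketch`, `cStep`, `trimmedError`, `samplePoints`) are hypothetical, and the subset size h stands in for whatever the library derives from its EstimationParameters.

```haskell
import Data.List (sort, sortOn)

-- Iterative refinement of a line estimate by repeated trimming and refitting.
-- NOTE: an illustrative sketch under stated assumptions, not the library's converge.

type Relation = (Double, Double)  -- (alpha, beta)
type Point    = (Double, Double)  -- (x, y)

-- | Ordinary least squares on a list of points (needs two distinct x values).
olsFit :: [Point] -> Relation
olsFit pts = (alpha, beta)
  where
    n     = fromIntegral (length pts)
    mx    = sum (map fst pts) / n
    my    = sum (map snd pts) / n
    sxy   = sum [(x - mx) * (y - my) | (x, y) <- pts]
    sxx   = sum [(x - mx) * (x - mx) | (x, _) <- pts]
    beta  = sxy / sxx
    alpha = my - beta * mx

-- | Squared vertical residual of a point.
sqError :: Relation -> Point -> Double
sqError (alpha, beta) (x, y) = r * r
  where r = y - alpha - beta * x

-- | Sum of the h smallest squared residuals under an estimate.
trimmedError :: Int -> [Point] -> Relation -> Double
trimmedError h pts est = sum (take h (sort (map (sqError est) pts)))

-- | One step: refit on the h points closest to the current estimate.
cStep :: Int -> [Point] -> Relation -> Relation
cStep h pts est = olsFit (take h (sortOn (sqError est) pts))

-- | Repeat steps while the trimmed error strictly decreases.
convergeSketch :: Int -> [Point] -> Relation -> Relation
convergeSketch h pts est
  | trimmedError h pts next < trimmedError h pts est = convergeSketch h pts next
  | otherwise                                        = est
  where next = cStep h pts est

-- Eight points on y = 1 + 2x plus two outliers.
samplePoints :: [Point]
samplePoints = [(x, 1 + 2 * x) | x <- [0 .. 7]] ++ [(8, -3), (9, -5)]

main :: IO ()
main = print (convergeSketch 8 samplePoints (0, 1))
```

Because each accepted step strictly lowers the trimmed error and there are finitely many subsets, the loop terminates, but only at a local minimum that depends on the starting estimate. That dependence is why trying many different initial estimates matters.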
References
1. Two Dimensional Euclidean Regression (Stein). http://www.dspcsp.com/pubs/euclreg.pdf
2. Computing LTS Regression for Large Data Sets (Rousseeuw and Van Driessen). http://agoras.ua.ac.be/abstract/Comlts99.htm
3. Applied Linear Statistical Models (Kutner et al.)