From ControlTheoryPro.com

Contents
1 Introduction to Online Recursive Least Squares
The following online recursive least squares derivation comes from class notes provided for Dr. Shieh's ECE 7334 Advanced Digital Control Systems at the University of Houston. Similar derivations are presented in [^{[1]}, ^{[2]} and ^{[3]}].
If you wish to skip directly to the update equations click here.
2 Derivation of Online Recursive Least Squares^{[4]}
The derivation for the online recursive least squares (RLS) method is as follows
'Actual System Outputs' 
'System Inputs' 
'System Model Coefficients' 
and
'Noise or Error' 
The data set
leads to the optimal parameter equation
Note this is just where, by definition of optimality, and where
 is the weighting matrix, and
 is replaced by the pseudoinverse.
This is a recusive algorithm which means that data points must be added sequentially. In order to add a new data point use
where
Not quite right 
is the new data set and is the old set of model coefficients. The updated weighting matrix is
Use
Eqn. 1 
to find the optimal model coefficients.
To find the relationship between the old coefficients and the new use
Where is defined as
Eqn. 2 
is a symmetric matrix > 0 and the updated value is
Eqn. 3 
Eqn. 3 leads to
and Eqn. 2 to
and finally to
Combine Eqns. 1 and 3 to obtain the equation
which leads to the new model coefficients in terms of P (all the previous input and output data vectors) and the current data point. The equation for the new model coefficients can be rearranged into this form
where
 is the previous model coefficients and
 is the error between actual system output and predicted system output.
where
is the discrete Kalman gain matrix.
The online recursive least squares algorithm derived to this point is slow. Therefore inappropriate for an online algorithm. Specifically, matrix inversions should be avoided. The matrix inverse can be avoided through the use of the inverse lemma
Inverse Lemma 
where and . The inverse lemma leads to the following equations
where
and finally
The variable i represents the iteration number and the innovation sequence is
While the resulting algorithm avoids any matrix inverses it is not suitable for rapid timevarying systems. This is because the algorithm, after lots of samples, will stop updating at steady state. The algorithm cannot update quickly because large amountss of data are already incorporated. In other words, after 10,000 data points a new data point is effectively scaled by 1/10,001 vs. the combined effect of the prior points (10,000/10,001).
To improve the algorithm's performance a forgetting factor is added to
which leads to
where
 is the forgetting factor and (1) represents a relatively large forgetting factor.
If all are replaced with and with then the final form of the alorithm is achieved. The equations are
where
with initial conditions
and
where
 is a good starting value and
 is the identity matrix of size (p x p).
3 Final Online Recursive Least Squares Equation Set
These are the final online recursive least squares update equations in order.
with initial conditions
and
where
 is a good starting value and
 is the identity matrix of size (p x p).
4 Notes of caution about Least Squares based techniques
Least squares techniques calculate coefficients for a polynomial approximation. All polynomials have tails  that is the ends of a polynomial trend toward + or  infinity. Therefore a polynomial fit can be good for interpolation but not for extrapolation. When dealing with measured data it is often difficult to determine the full breadth of the input data without extensive analysis. Often there will be lots of data towards the upper or lower bounds but there may be very little data in between. System behavior can lead to transitions that provide only handful of data points. When comparing these few points to the rest of the data the transition points can become lost in the rest of the data. The weighting of the coefficients will tend to mirror the weighting of the number of data points in particular regions. With multidimensional input data visualizing the weighting behavior is difficult.
4.1 Rich Excitation
Least squares methods are polynomial fits of input parameters to output parameters. This requires that the input to output relationship is linear. It also means that for output estimation to be accurate that the test inputs must fall within the range of inputs used to create the polynomial coefficients. In other words, a polynomial of arbitrary order is constructed fitting input data to output data. The input data used has some range (use [1 1] for example). If this polynomial is used to estimate an output due to an input outside of that range (for example, 2) then the estimate will likely be inaccurate.
In 2 dimensions the reason for this is simple. All polynomials have tails that go to infinity. The polynomial created with least squares is usually stable (and reasonably accurate) within the range of input values used in its creation. Least squares polynomials are good for interpolation but become unstable when used for extrapolation.
The method described if for use in Control systems. The transfer function derived with least squares is then used to design the system controller. If the transfer function is inaccurate for some portion of available inputs the system can go unstable.
In short, the concept of Rich Excitation is to 'ping' the black box system with all possible inputs. The output of these inputs provides a range of all possible outputs. Thus the least squares fit can work with the entire range of applicable values and all output estimates are interpolations rather than extrapolations.
The entire range of applicable input values must be considered carefully. It must include inputs across the entire magnitude range as well as inputs of all possible frequencies. Careful consideration of the input signal is necessary in order to obtain the proper input/output data set.
In practice the standard solution is to use white noise as the input. White noise contains all frequencies at equal magnitude. Thus, given enough inputs samples, the system is tested across all physically possible inputs. Rich Excitation in the case of many real systems is not possible. This limits the ability of least squares methods to accurately predict the output.
5 See Also
6 References
 Broussard, K. J. and Trahan, R. E. Jr., "Automatic Control System Failure Detection via Parameter Identification Techniques," IEEE Proceedings of Southeastcon, April 1991, pp. 176180.
 Hsia, T. C., "System Identification (LeastSquares Methods)," 1977 D. C. Heath and Company, Lexington, MA. ISBN 0669996300
 Ljung. L. and Soderstrom, T., "Theory and Practice of Recursive Identification," 2^{nd} Ed. 1985 MIT Press, Cambridge, MA. ISBN 0262620588
 Spradlin, Gabriel T. '"AN EXPLORATION OF PARAMETER IDENTIFICATION TECHNIQUES: CMG TEMPERATURE PREDICTION THEORY AND RESULTS." Master's Thesis, University of Houston, Houston, TX December 2005.