From ControlTheoryPro.com

1 Introduction to Offline Least Squares
The following offline least squares derivation comes from class notes provided for Dr. Shieh's ECE 7334 Advanced Digital Control Systems at the University of Houston. Similar derivations are presented in ^{[1]} and ^{[2]}. The technique is presented first for a Single-Input Single-Output (SISO) system and determines a transfer function of (nearly) arbitrary order. (The number of available data points ultimately limits the maximum order of the transfer function.)
After the SISO offline least squares method is presented, it is extended to Multiple-Input Multiple-Output (MIMO) systems.
2 Derivation of Offline Least Squares^{[3]}
The minimum number of data points required is
$$p_{\min} = n + m$$
where
 $n$ is the order of the denominator and
 $m$ is the order of the numerator.
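The 2^{nd} order transfer function is
$$G(z) = \frac{Y(z)}{U(z)} = \frac{b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}}$$
Eqn. 1
where only $u(k)$ and $y(k)$ are accessible for measurement. Eqn. 1 can be rearranged to
$$Y(z)\left(1 + a_1 z^{-1} + a_2 z^{-2}\right) = U(z)\left(b_1 z^{-1} + b_2 z^{-2}\right)$$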
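Note that multiplication by $z^{-1}$ in the z-domain corresponds to a delay of one sample in the time domain. If
$$\mathcal{Z}\{y(k)\} = Y(z)$$
and the initial conditions are zero, then
$$\mathcal{Z}\{y(k-1)\} = z^{-1} Y(z)$$
Therefore
$$z^{-1} y(k) = y(k-1)$$
where $z^{-1}$ is a back shift operator. The resulting difference equation is
$$y(k) = -a_1 y(k-1) - a_2 y(k-2) + b_1 u(k-1) + b_2 u(k-2)$$
The resulting model error is
$$e(k) = y(k) + a_1 y(k-1) + a_2 y(k-2) - b_1 u(k-1) - b_2 u(k-2)$$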
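The minimum amount of required data results in
$$\begin{bmatrix} e(k) \\ e(k+1) \\ e(k+2) \\ e(k+3) \end{bmatrix} = \begin{bmatrix} y(k) \\ y(k+1) \\ y(k+2) \\ y(k+3) \end{bmatrix} - \begin{bmatrix} -y(k-1) & -y(k-2) & u(k-1) & u(k-2) \\ -y(k) & -y(k-1) & u(k) & u(k-1) \\ -y(k+1) & -y(k) & u(k+1) & u(k) \\ -y(k+2) & -y(k+1) & u(k+2) & u(k+1) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ b_1 \\ b_2 \end{bmatrix}$$
Eqn. 2
Define $E$, $Y$, $\Phi$, and $\theta$ as the error vector, the output vector, the data matrix, and the coefficient vector of Eqn. 2, respectively. The error vector is
$$E = Y - \Phi\theta$$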
where
 $y(k)$ is shorthand for $y(kT)$, where $T$ is the sample period,
 $U$ is (p x 1), the input data,
 $E$ is (p x 1), the error vector,
 $Y$ is (p x 1), the output data to mimic,
 $\Phi$ is (p x (n + m)), the input and previous output data, and
 $\theta$ is ((n + m) x 1), the transfer function coefficients.
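As a minimal sketch of how the error vector might be assembled for the 2^{nd} order case, the following assumes NumPy and the sign convention y(k) = -a1 y(k-1) - a2 y(k-2) + b1 u(k-1) + b2 u(k-2). The function names `simulate` and `build_regressors` and the coefficient values are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def simulate(theta, u):
    """Run y(k) = -a1*y(k-1) - a2*y(k-2) + b1*u(k-1) + b2*u(k-2).

    y[0] and y[1] are treated as zero initial conditions.
    """
    a1, a2, b1, b2 = theta
    y = np.zeros(len(u))
    for k in range(2, len(u)):
        y[k] = -a1 * y[k-1] - a2 * y[k-2] + b1 * u[k-1] + b2 * u[k-2]
    return y

def build_regressors(u, y):
    """Return (Phi, Y) with one row per sample k >= 2."""
    k = np.arange(2, len(y))
    Phi = np.column_stack([-y[k-1], -y[k-2], u[k-1], u[k-2]])
    return Phi, y[k]

theta_true = np.array([-1.5, 0.7, 1.0, 0.5])  # a1, a2, b1, b2 (made-up, stable)
rng = np.random.default_rng(0)
u = rng.standard_normal(50)                   # 50 input samples
y = simulate(theta_true, u)

Phi, Y = build_regressors(u, y)
E = Y - Phi @ theta_true                      # error vector for the true coefficients

print(Phi.shape, Y.shape)   # (48, 4) (48,)
print(np.max(np.abs(E)))    # zero up to floating-point rounding
```

With noise-free data and the true coefficients the error vector vanishes, which is a quick check that the rows of the data matrix line up with the difference equation.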
The minimum number of necessary data points is usually small. In the case of our 2^{nd} order system we have 4 unknowns, so we need at least 4 data points. Unless the system is very simple, the number of data points required for reasonable accuracy is much higher. This results in a much larger p (the number of rows in $\Phi$, etc.). The much larger p should provide a more accurate fit. The equation used to find the optimal parameters (the fit with the least squared error) is
$$\hat{\theta} = \left(\Phi^T \Phi\right)^{-1} \Phi^T Y = \Phi^{\dagger} Y$$
Eqn. 3
where
$$\Phi^{\dagger} = \left(\Phi^T \Phi\right)^{-1} \Phi^T$$
is the pseudoinverse of $\Phi$.
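The least squares solve can be sketched numerically as follows, assuming NumPy, a made-up stable 2^{nd} order model, and a small amount of added measurement noise. The coefficient values, noise level, and variable names are assumptions for illustration. The sketch computes the estimate two ways: literally as (Phi^T Phi)^{-1} Phi^T Y, and with `numpy.linalg.lstsq`, which avoids forming Phi^T Phi explicitly.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = np.array([-1.5, 0.7, 1.0, 0.5])  # a1, a2, b1, b2 (illustrative)
a1, a2, b1, b2 = theta_true

# Generate input/output data from the assumed model, starting from rest.
N = 500
u = rng.standard_normal(N)
y = np.zeros(N)
for k in range(2, N):
    y[k] = -a1 * y[k-1] - a2 * y[k-2] + b1 * u[k-1] + b2 * u[k-2]
y_meas = y + 0.01 * rng.standard_normal(N)    # small measurement noise

# Data matrix and output vector, one row per sample k >= 2.
k = np.arange(2, N)
Phi = np.column_stack([-y_meas[k-1], -y_meas[k-2], u[k-1], u[k-2]])
Y = y_meas[k]

# Normal-equation form: (Phi^T Phi)^{-1} Phi^T Y
theta_normal = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)

# SVD-based least squares solver applied directly to Phi
theta_lstsq, *_ = np.linalg.lstsq(Phi, Y, rcond=None)

print(theta_normal)
print(theta_lstsq)
```

Both estimates should agree closely with each other and land near the coefficients used to generate the data. For poorly conditioned data the `lstsq` route is usually the safer choice.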
The parameters can be weighted or tuned like a filter using a weighting matrix $W$. Then $\hat{\theta}$ holds the coefficients of the best predictor when
$$\hat{\theta} = \left(\Phi^T W \Phi\right)^{-1} \Phi^T W Y$$
Eqn. 4
When the number of data points used for the fit is large, the use of a weighting matrix is unrealistic. Take notice of the $\Phi^T W \Phi$ portion of Eqn. 4. If $\Phi$ is (p x (n + m)) then matrix multiplication requires that the weighting matrix $W$ be (p x p). If p is large then $W$ is too large to use.
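If the weighting matrix is assumed to be diagonal (an assumption made here; the notes do not restrict its form), the (p x p) matrix never has to be stored, because only its p diagonal entries are needed. The sketch below, using NumPy with a random stand-in data matrix and hypothetical weights, compares the vector form against the explicit matrix form for a modest p.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n_params = 500, 4
Phi = rng.standard_normal((p, n_params))       # stand-in data matrix
theta_true = np.array([-1.5, 0.7, 1.0, 0.5])   # illustrative coefficients
Y = Phi @ theta_true + 0.1 * rng.standard_normal(p)

# Hypothetical weights: later samples are trusted more than earlier ones.
w = np.linspace(0.1, 1.0, p)

# Diagonal W kept as a length-p vector:
# Phi^T W equals (Phi * w[:, None]).T, so no (p x p) matrix is formed.
PhiT_W = (Phi * w[:, None]).T
theta_w = np.linalg.solve(PhiT_W @ Phi, PhiT_W @ Y)

# Cross-check against the explicit (p x p) matrix, feasible only for modest p.
W = np.diag(w)
theta_full = np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W @ Y)

print(theta_w)
print(theta_full)
```

The vector form needs storage proportional to p rather than p squared. A full (non-diagonal) weighting matrix does not simplify this way.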
2.1 Extension to MIMO
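The mathematics described in the previous section is for SISO systems. Any desired prediction can be calculated by
$$\hat{y}(k) = \phi^T(k)\,\hat{\theta}$$
Eqn. 5
where $\phi^T(k) = \begin{bmatrix} -y(k-1) & -y(k-2) & u(k-1) & u(k-2) \end{bmatrix}$ is the row of $\Phi$ for sample $k$ and
$$y(k) = -a_1 y(k-1) - a_2 y(k-2) + b_1 u(k-1) + b_2 u(k-2)$$
is the difference equation from which Eqn. 5 is generated. This equation can be modified for use in MIMO systems by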
where
The final MIMO equation for 2 inputs and 2 outputs is
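One common way to pose a 2-input, 2-output fit is sketched below. It is an assumption and may differ from the exact form intended in these notes. It replaces the scalar coefficients with 2 x 2 matrices, y(k) = -A1 y(k-1) - A2 y(k-2) + B1 u(k-1) + B2 u(k-2), stacks both outputs as columns of Y, and solves for all coefficients in one least squares call. The matrix values are made up and chosen to give a stable system.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical 2x2 coefficient matrices (poles at 0.1, 0.2, 0.2, 0.3).
A1 = np.array([[-0.5, 0.1], [0.0, -0.3]])
A2 = np.array([[0.06, 0.0], [0.0, 0.02]])
B1 = np.array([[1.0, 0.2], [0.0, 0.5]])
B2 = np.array([[0.3, 0.0], [0.1, 0.4]])

# Simulate from rest with two white-noise inputs.
N = 400
u = rng.standard_normal((N, 2))
y = np.zeros((N, 2))
for k in range(2, N):
    y[k] = -A1 @ y[k-1] - A2 @ y[k-2] + B1 @ u[k-1] + B2 @ u[k-2]

# Each row of Phi holds [-y(k-1), -y(k-2), u(k-1), u(k-2)] with 2 entries each.
k = np.arange(2, N)
Phi = np.hstack([-y[k-1], -y[k-2], u[k-1], u[k-2]])   # (p x 8)
Y = y[k]                                              # (p x 2)

# lstsq solves one least squares problem per output column.
Theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)       # (8 x 2)

# Row k of Y equals Phi[k] @ Theta, so Theta.T = [A1 A2 B1 B2].
A1_hat, A2_hat, B1_hat, B2_hat = np.hsplit(Theta.T, 4)
print(np.round(A1_hat, 6))
print(np.round(B1_hat, 6))
```

Because each output column is fitted independently against the same data matrix, this formulation is equivalent to running the SISO procedure once per output with a regressor that includes all inputs and all past outputs.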
3 Notes of caution about Least Squares based techniques
Least squares techniques calculate coefficients for a polynomial approximation. All polynomials have tails; that is, the ends of a polynomial trend toward + or - infinity. Therefore a polynomial fit can be good for interpolation but not for extrapolation. When dealing with measured data it is often difficult to determine the full breadth of the input data without extensive analysis. Often there will be lots of data near the upper or lower bounds but very little data in between. System behavior can lead to transitions that provide only a handful of data points. When these few points are compared to the rest of the data, the transition points can become lost. The weighting of the coefficients will tend to mirror the distribution of data points across regions. With multidimensional input data, visualizing the weighting behavior is difficult.
3.1 Rich Excitation
Least squares methods are polynomial fits of input parameters to output parameters. This requires that the input to output relationship is linear. It also means that, for output estimation to be accurate, the test inputs must fall within the range of inputs used to create the polynomial coefficients. In other words, a polynomial of arbitrary order is constructed fitting input data to output data. The input data used has some range (use [-1 1] for example). If this polynomial is used to estimate an output due to an input outside of that range (for example, 2) then the estimate will likely be inaccurate.
In 2 dimensions the reason for this is simple. All polynomials have tails that go to infinity. The polynomial created with least squares is usually stable (and reasonably accurate) within the range of input values used in its creation. Least squares polynomials are good for interpolation but become unstable when used for extrapolation.
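The interpolation versus extrapolation point can be illustrated with a small sketch, assuming NumPy and a made-up smooth function standing in for the unknown input/output map. A 7^{th} order polynomial is fitted by least squares on the range [-1 1] and then evaluated at 0.5 (inside the range) and at 2 (outside it). The choice of `tanh` and of the polynomial order is an assumption for illustration only.

```python
import numpy as np

def f(x):
    """Stand-in for an unknown input/output relationship."""
    return np.tanh(3.0 * x)

# Least squares polynomial fit using inputs in [-1, 1] only.
x_train = np.linspace(-1.0, 1.0, 200)
coeffs = np.polyfit(x_train, f(x_train), deg=7)

# Compare the fit against the true function inside and outside the range.
err_inside = abs(np.polyval(coeffs, 0.5) - f(0.5))
err_outside = abs(np.polyval(coeffs, 2.0) - f(2.0))

print(err_inside)    # small: interpolation within the fitted range
print(err_outside)   # large: the polynomial tail dominates at x = 2
```

The true function stays bounded near 1 at x = 2, while the fitted polynomial is driven by its highest order term there, so the outside error is orders of magnitude larger than the inside error.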
The method described is for use in control systems. The transfer function derived with least squares is then used to design the system controller. If the transfer function is inaccurate for some portion of the available inputs, the system can go unstable.
In short, the concept of Rich Excitation is to 'ping' the black box system with all possible inputs. The output of these inputs provides a range of all possible outputs. Thus the least squares fit can work with the entire range of applicable values and all output estimates are interpolations rather than extrapolations.
The entire range of applicable input values must be considered carefully. It must include inputs across the entire magnitude range as well as inputs of all possible frequencies. Careful consideration of the input signal is necessary in order to obtain the proper input/output data set.
In practice the standard solution is to use white noise as the input. White noise contains all frequencies at equal magnitude. Thus, given enough input samples, the system is tested across all physically possible inputs. For many real systems, however, Rich Excitation is not possible. This limits the ability of least squares methods to accurately predict the output.
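A rough numerical illustration of why the input matters, assuming NumPy and a made-up 2^{nd} order model y(k) = -a1 y(k-1) - a2 y(k-2) + b1 u(k-1) + b2 u(k-2): with a white noise input the data matrix has full column rank and the coefficients are recovered, while with a constant (step) input the two input columns are identical and the matrix loses rank. The helper name `regressors` and the coefficient values are assumptions for illustration.

```python
import numpy as np

def regressors(u, theta):
    """Simulate the assumed 2nd-order model from rest and return (Phi, Y)."""
    a1, a2, b1, b2 = theta
    y = np.zeros(len(u))
    for k in range(2, len(u)):
        y[k] = -a1 * y[k-1] - a2 * y[k-2] + b1 * u[k-1] + b2 * u[k-2]
    k = np.arange(2, len(u))
    Phi = np.column_stack([-y[k-1], -y[k-2], u[k-1], u[k-2]])
    return Phi, y[k]

theta_true = np.array([-1.5, 0.7, 1.0, 0.5])   # a1, a2, b1, b2 (illustrative)
rng = np.random.default_rng(4)

# White noise input versus a constant input of the same length.
Phi_noise, Y_noise = regressors(rng.standard_normal(300), theta_true)
Phi_step, Y_step = regressors(np.ones(300), theta_true)

rank_noise = np.linalg.matrix_rank(Phi_noise)
rank_step = np.linalg.matrix_rank(Phi_step)

theta_noise, *_ = np.linalg.lstsq(Phi_noise, Y_noise, rcond=None)

print(rank_noise, rank_step)
print(np.round(theta_noise, 6))
```

With the constant input, b1 and b2 cannot be separated because u(k-1) and u(k-2) are always equal, so only their sum is identifiable from that data set.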
4 References
 [1] Broussard, K. J. and Trahan, R. E. Jr., "Automatic Control System Failure Detection via Parameter Identification Techniques," IEEE Proceedings of Southeastcon, April 1991, pp. 176-180.
 [2] Franklin, G. F., Powell, J. D., and Workman, M., Digital Control of Dynamic Systems, 3^{rd} ed., Addison-Wesley Longman Publishing Co., Inc., 1998. ISBN 0201331535.
 [3] Spradlin, Gabriel T., "An Exploration of Parameter Identification Techniques: CMG Temperature Prediction Theory and Results," Master's Thesis, University of Houston, Houston, TX, December 2005.