Offline Least Squares

## 1 Introduction to Offline Least Squares

The following offline least squares derivation comes from class notes provided for Dr. Shieh's ECE 7334 Advanced Digital Control Systems at the University of Houston. Similar derivations are presented in [1] and [2]. Initially the offline least squares technique is presented for a Single-Input Single-Output (SISO) system. The technique determines a transfer function of (nearly) arbitrary order; the number of available data points ultimately limits the maximum order of the transfer function.

After the SISO offline least squares method is presented, an extension to MIMO offline least squares is presented.

## 2 Derivation of Offline Least Squares[3]

The minimum number of data points required equals the number of unknown coefficients,

 $LaTeX: P=n+m+1$

where

$LaTeX: n$ is the order of the denominator and
$LaTeX: m$ is the order of the numerator.

The 2nd order transfer function is

 $LaTeX: \hat{y}\left(z\right)=\frac{b_{0}+b_{1}z^{-1}}{1+a_{1}z^{-1}+a_{2}z^{-2}}u\left(z\right)+n\left(z\right)$ Eqn. 1

where only $LaTeX: u\left(z\right)$ and $LaTeX: \hat{y}\left(z\right)$ are accessible for measurement. Eqn. 1 can be rearranged to

 $LaTeX: \left(1+a_{1}z^{-1}+a_{2}z^{-2}\right)\hat{y}\left(z\right)=\left(b_{0}+b_{1}z^{-1}\right)u\left(z\right)+\left(1+a_{1}z^{-1}+a_{2}z^{-2}\right)n\left(z\right)$

Note that

 $LaTeX: Z\left[y\left(kT-mT\right)\right]=z^{-m}Z\left[y\left(kT\right)\right]=z^{-m}Y\left(z\right)$

Assuming zero initial conditions, if

 $LaTeX: \left(k-m\right)T<0$

then

 $LaTeX: y\left(kT-mT\right)=0$

so that, for example, $LaTeX: y\left(-T\right)=0$ and $LaTeX: y\left(-2T\right)=0$.

Therefore

 $LaTeX: z^{-1}Y\left(z\right)\Rightarrow y\left(kT-T\right)$

where $LaTeX: z^{-1}$ is a back shift operator. The resulting difference equation is

 $LaTeX: \hat{y}\left(kT\right)=b_{0}u\left(kT\right)+b_{1}u\left(kT-T\right)-a_{1}\hat{y}\left(kT-T\right)-a_{2}\hat{y}\left(kT-2T\right)+\nu\left(kT\right)$

The resulting model error is

 $LaTeX: \nu\left(kT\right)=n\left(kT\right)+a_{1}n\left(kT-T\right)+a_{2}n\left(kT-2T\right)$

The minimum amount of required data results in

 $LaTeX: \begin{bmatrix}u\left(k\right) & u\left(k-1\right) & -\hat{y}\left(k-1\right) & -\hat{y}\left(k-2\right) \\ u\left(k+1\right) & u\left(k\right) & -\hat{y}\left(k\right) & -\hat{y}\left(k-1\right) \\ u\left(k+2\right) & u\left(k+1\right) & -\hat{y}\left(k+1\right) & -\hat{y}\left(k\right) \\ u\left(k+3\right) & u\left(k+2\right) & -\hat{y}\left(k+2\right) & -\hat{y}\left(k+1\right)\end{bmatrix}\begin{bmatrix}b_{0} \\ b_{1} \\ a_{1} \\ a_{2}\end{bmatrix}+\begin{bmatrix}\nu\left(k\right) \\ \nu\left(k+1\right) \\ \nu\left(k+2\right) \\ \nu\left(k+3\right)\end{bmatrix}=\begin{bmatrix}\hat{y}\left(k\right) \\ \hat{y}\left(k+1\right) \\ \hat{y}\left(k+2\right) \\ \hat{y}\left(k+3\right)\end{bmatrix}$ Eqn. 2

Define

 $LaTeX: A_{p}=\begin{bmatrix}u\left(k\right) & u\left(k-1\right) & -\hat{y}\left(k-1\right) & -\hat{y}\left(k-2\right) \\ u\left(k+1\right) & u\left(k\right) & -\hat{y}\left(k\right) & -\hat{y}\left(k-1\right) \\ u\left(k+2\right) & u\left(k+1\right) & -\hat{y}\left(k+1\right) & -\hat{y}\left(k\right) \\ u\left(k+3\right) & u\left(k+2\right) & -\hat{y}\left(k+2\right) & -\hat{y}\left(k+1\right)\end{bmatrix}$

 $LaTeX: \theta_{p}=\begin{bmatrix}b_{0} \\ b_{1} \\ a_{1} \\ a_{2}\end{bmatrix}$

 $LaTeX: \nu_{p}=\begin{bmatrix}\nu\left(k\right) \\ \nu\left(k+1\right) \\ \nu\left(k+2\right) \\ \nu\left(k+3\right)\end{bmatrix}$

 $LaTeX: \hat{y}_{p}=\begin{bmatrix}\hat{y}\left(k\right) \\ \hat{y}\left(k+1\right) \\ \hat{y}\left(k+2\right) \\ \hat{y}\left(k+3\right)\end{bmatrix}$

The error vector is

 $LaTeX: \nu_{p}=\hat{y}_{p}-A_{p}\theta_{p}$

where

$LaTeX: u\left(k+1\right)$ is shorthand for $LaTeX: u\left(kT+T\right)$,
$LaTeX: u$ is (p x 1) - the input data,
$LaTeX: \nu_{p}$ is (p x 1) - the error vector,
$LaTeX: \hat{y}_{p}$ is (p x 1) - the output data to mimic,
$LaTeX: A_{p}$ is (p x (n + m + 1)) - input and previous output data, and
$LaTeX: \theta_{p}$ is ((n + m + 1) x 1) - transfer function coefficients.

The minimum number of necessary data points is usually small. In the case of our 2nd order system we have 4 unknowns, so we need at least 4 data points. Unless the system is very simple, however, reasonable accuracy requires many more data points, resulting in a much larger p (the number of rows in $LaTeX: A_{p}$, etc.) and generally a more accurate fit. The equation used to find the optimal parameters (the fit with the least squared error) is

 $LaTeX: \theta_{p}^{*}=\left(A_{p}^{T}A_{p}\right)^{-1}A_{p}^{T}\hat{y}_{p}=A_{p}^{+}\hat{y}_{p}$ Eqn. 3

where

 $LaTeX: A_{p}^{+}=\left(A_{p}^{T}A_{p}\right)^{-1}A_{p}^{T}$ is the pseudo-inverse of $LaTeX: A_{p}$.
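As a concrete illustration, the sketch below simulates the 2nd-order difference equation with assumed coefficient values (b0 = 0.5, b1 = 0.3, a1 = -1.2, a2 = 0.5 are illustrative, not from the text), builds the data matrix, and recovers the parameters per Eqn. 3 using NumPy's least squares solver.

```python
# Minimal sketch of the SISO offline least squares fit (Eqn. 3).
# The coefficient values are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

b0, b1 = 0.5, 0.3           # numerator coefficients
a1, a2 = -1.2, 0.5          # denominator coefficients (stable poles)

N = 200
u = rng.standard_normal(N)  # white-noise input (rich excitation)
y = np.zeros(N)             # zero initial conditions
for k in range(2, N):
    # y(k) = b0 u(k) + b1 u(k-1) - a1 y(k-1) - a2 y(k-2)
    y[k] = b0 * u[k] + b1 * u[k - 1] - a1 * y[k - 1] - a2 * y[k - 2]

# Rows of A_p: [u(k), u(k-1), -y(k-1), -y(k-2)] for k = 2 .. N-1
A_p = np.column_stack([u[2:], u[1:-1], -y[1:-1], -y[:-2]])
y_p = y[2:]

# theta* = (A_p^T A_p)^-1 A_p^T y_p, computed stably via lstsq
theta = np.linalg.lstsq(A_p, y_p, rcond=None)[0]
print(theta)  # recovers [b0, b1, a1, a2] in the noise-free case
```

With noise-free data and a persistently exciting input, the fit recovers the coefficients exactly; with measurement noise, the estimate converges toward the true values as p grows.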

The parameters can be weighted or tuned like a filter using a weighting matrix $LaTeX: \bar{W}_{p}$. Then $LaTeX: \theta_{p}^{*}$ gives the coefficients of the best predictor when

 $LaTeX: \theta_{p}^{*}=\left(A_{p}^{T}\bar{W}_{p}A_{p}\right)^{-1}A_{p}^{T}\bar{W}_{p}\hat{y}_{p}$ Eqn. 4

When the number of data points used for the fit is large, the use of a weighting matrix becomes impractical. Note the

 $LaTeX: \left(A_{p}^{T}\bar{W}_{p}A_p\right)^{-1}$

portion of Eqn. 4. If $LaTeX: A_{p}$ is (p x (n + m + 1)) then matrix multiplication requires that the weighting matrix be (p x p). If $LaTeX: p\gg1$ then $LaTeX: \bar{W}_{p}$ is too large to store and use directly.
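A sketch of the weighted fit in Eqn. 4, using a hypothetical diagonal weighting (exponential forgetting that favors recent samples) and synthetic stand-in data. For a diagonal weighting one would normally scale the rows by the square roots of the weights rather than form the full (p x p) matrix, which sidesteps the storage problem noted above.

```python
# Sketch of weighted least squares (Eqn. 4) with a diagonal weighting.
# A_p and y_p are synthetic stand-ins for the regressor and output data.
import numpy as np

rng = np.random.default_rng(1)
p = 50
theta_true = np.array([0.5, 0.3, -1.2, 0.5])
A_p = rng.standard_normal((p, 4))
y_p = A_p @ theta_true                # noise-free synthetic outputs

# Exponential forgetting: later (newer) rows weighted more heavily.
w = 0.95 ** np.arange(p)[::-1]
W_p = np.diag(w)                      # explicit (p x p) -- impractical for large p

# theta* = (A_p^T W_p A_p)^-1 A_p^T W_p y_p
theta = np.linalg.solve(A_p.T @ W_p @ A_p, A_p.T @ W_p @ y_p)

# Equivalent without forming W_p: scale each row by sqrt(w), then lstsq
sw = np.sqrt(w)
theta_scaled = np.linalg.lstsq(A_p * sw[:, None], y_p * sw, rcond=None)[0]
```

Both forms give the same estimate; the row-scaling variant needs only a length-p vector of weights.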

### 2.1 Extension to MIMO

The mathematics described in the previous section is for SISO systems. For a SISO system the prediction at any time step can be calculated by

 $LaTeX: y\left(kT\right)=\begin{bmatrix}u\left(kT\right) & u\left(kT-T\right) & -y\left(kT-T\right) & -y\left(kT-2T\right)\end{bmatrix}\begin{bmatrix}b_{0} \\ b_{1} \\ a_{1} \\ a_{2}\end{bmatrix}=\phi^{T}\left(kT\right)\theta$

where

 $LaTeX: y\left(kT\right)+a_{1}y\left(kT-T\right)+a_{2}y\left(kT-2T\right)=b_{0}u\left(kT\right)+b_{1}u\left(kT-T\right)$ Eqn. 5

is the difference equation (Eqn. 5) from which the predictor above is generated. The approach can be extended to MIMO systems by writing, for each output,

 $LaTeX: y_{1}\left(kT\right)=\phi_{1}^{T}\left(kT\right)\theta_{1}$

where

$LaTeX: U=\begin{bmatrix}u_{1}\left(kT\right) & u_{2}\left(kT\right) & u_{1}\left(kT-T\right) & u_{2}\left(kT-T\right)\end{bmatrix}$
$LaTeX: Y=\begin{bmatrix}y_{1}\left(kT-T\right) & y_{2}\left(kT-T\right) & y_{1}\left(kT-2T\right) & y_{2}\left(kT-2T\right)\end{bmatrix}$
$LaTeX: \phi_{1}^{T}=\begin{bmatrix}U & -Y\end{bmatrix}$
$LaTeX: \theta_{1}^{T}=\begin{bmatrix}b_{011} & b_{012} & b_{111} & b_{112} & a_{111} & a_{112} & a_{211} & a_{212}\end{bmatrix}$

Likewise, for the second output,

 $LaTeX: y_{2}\left(kT\right)=\phi_{2}^{T}\left(kT\right)\theta_{2}$

The final MIMO equation for 2 inputs and 2 outputs is

 $LaTeX: \begin{bmatrix}y_{1}\left(kT\right) & y_{2}\left(kT\right)\end{bmatrix}=\begin{bmatrix}\phi_{1}^{T}\left(kT\right)\theta_{1} & \phi_{2}^{T}\left(kT\right)\theta_{2}\end{bmatrix}$
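The 2-input/2-output case can be sketched as below: each output is fit independently against a shared regressor. The parameter values are illustrative assumptions, and past outputs enter negated, matching the SISO regressor convention.

```python
# Sketch of 2-input/2-output offline least squares identification.
# Columns of theta_true are hypothetical theta_1 and theta_2, ordered
# [b0i1 b0i2 b1i1 b1i2 a1i1 a1i2 a2i1 a2i2] for output i.
import numpy as np

rng = np.random.default_rng(2)
N = 300
u = rng.standard_normal((N, 2))     # columns: u1, u2 (white noise)
y = np.zeros((N, 2))                # columns: y1, y2 (zero initial conditions)

theta_true = np.array([
    [ 0.5,  0.3],
    [ 0.2,  0.6],
    [ 0.1, -0.2],
    [-0.3,  0.1],
    [-0.4,  0.1],
    [ 0.1, -0.3],
    [ 0.2, -0.1],
    [-0.1,  0.2],
])

def phi(k):
    # phi(kT) = [u1(k) u2(k) u1(k-1) u2(k-1) -y1(k-1) -y2(k-1) -y1(k-2) -y2(k-2)]
    return np.concatenate([u[k], u[k - 1], -y[k - 1], -y[k - 2]])

for k in range(2, N):
    y[k] = phi(k) @ theta_true      # both outputs at once

# Stack the regressor rows and solve for both parameter vectors together
Phi = np.array([phi(k) for k in range(2, N)])
theta_hat = np.linalg.lstsq(Phi, y[2:], rcond=None)[0]   # (8 x 2) matrix
```

Because both outputs share the same regressor, a single `lstsq` call with a two-column right-hand side solves both fits at once; column i of `theta_hat` is the estimate of theta_i.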

## 3 Notes of caution about Least Squares based techniques

Least squares techniques calculate coefficients for a polynomial approximation. All polynomials have tails - that is, the ends of a polynomial trend toward + or - infinity. Therefore a polynomial fit can be good for interpolation but not for extrapolation. When dealing with measured data it is often difficult to determine the full breadth of the input data without extensive analysis. Often there will be lots of data toward the upper or lower bounds but very little data in between. System behavior can lead to transitions that provide only a handful of data points, and when compared against the bulk of the data these transition points can become lost. The weighting of the coefficients will tend to mirror the distribution of data points across particular regions. With multi-dimensional input data, visualizing this weighting behavior is difficult.

### 3.1 Rich Excitation

Least squares methods are polynomial fits of input parameters to output parameters. This requires that the relationship be linear in the parameters. It also means that for output estimation to be accurate, the test inputs must fall within the range of inputs used to create the polynomial coefficients. In other words, a polynomial of arbitrary order is constructed fitting input data to output data. The input data used has some range ([-1, 1] for example). If this polynomial is used to estimate an output due to an input outside of that range (for example, 2), then the estimate will likely be inaccurate.

In 2 dimensions the reason for this is simple. All polynomials have tails that go to infinity. The polynomial created with least squares is usually stable (and reasonably accurate) within the range of input values used in its creation. Least squares polynomials are good for interpolation but become unstable when used for extrapolation.

The method described is for use in control systems. The transfer function derived with least squares is then used to design the system controller. If the transfer function is inaccurate for some portion of the available inputs, the system can go unstable.

In short, the concept of Rich Excitation is to 'ping' the black box system with all possible inputs. The output of these inputs provides a range of all possible outputs. Thus the least squares fit can work with the entire range of applicable values and all output estimates are interpolations rather than extrapolations.

The entire range of applicable input values must be considered carefully. It must include inputs across the entire magnitude range as well as inputs of all possible frequencies. Careful consideration of the input signal is necessary in order to obtain the proper input/output data set.

In practice the standard solution is to use white noise as the input. White noise contains all frequencies at equal magnitude, so, given enough input samples, the system is tested across all physically possible inputs. For many real systems, however, rich excitation is not possible, which limits the ability of least squares methods to accurately predict the output.
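To see why excitation matters, the sketch below (reusing the same hypothetical 2nd-order coefficients, for illustration only) compares the regressor built from a constant input with one built from white noise: the constant input produces duplicate columns, so $LaTeX: A_{p}^{T}A_{p}$ is singular and the parameters are not identifiable.

```python
# Sketch: poor vs. rich excitation. With a constant input the first two
# regressor columns are identical, so A_p is rank deficient and
# A_p^T A_p cannot be inverted. Coefficients are illustrative only.
import numpy as np

rng = np.random.default_rng(3)
N = 100

def simulate(u, b0=0.5, b1=0.3, a1=-1.2, a2=0.5):
    y = np.zeros(len(u))
    for k in range(2, len(u)):
        y[k] = b0 * u[k] + b1 * u[k - 1] - a1 * y[k - 1] - a2 * y[k - 2]
    return y

def regressor(u, y):
    # Rows: [u(k), u(k-1), -y(k-1), -y(k-2)]
    return np.column_stack([u[2:], u[1:-1], -y[1:-1], -y[:-2]])

u_flat = np.ones(N)                  # poor excitation: one constant level
u_rich = rng.standard_normal(N)      # rich excitation: white noise

A_flat = regressor(u_flat, simulate(u_flat))
A_rich = regressor(u_rich, simulate(u_rich))

print(np.linalg.matrix_rank(A_flat))   # < 4: parameters not identifiable
print(np.linalg.matrix_rank(A_rich))   # 4: full column rank
```

The rank test is a quick practical check on whether a given input record excites the system richly enough for the fit in Eqn. 3 to be well posed.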

## 4 References

• Broussard, K. J. and Trahan, R. E. Jr., "Automatic Control System Failure Detection via Parameter Identification Techniques," IEEE Proceedings of Southeastcon, April 1991, pp. 176-180.
• Franklin, G. F., Powell, J. D., and Workman, M., Digital Control of Dynamic Systems, 3rd ed., Addison-Wesley Longman Publishing Co., Inc., 1998. ISBN 0201331535.
• Spradlin, Gabriel T., "An Exploration of Parameter Identification Techniques: CMG Temperature Prediction Theory and Results," Master's Thesis, University of Houston, Houston, TX, December 2005.

### 4.1 Notes

1. Broussard et al.
2. Franklin et al., pp. 503-506