# THE CORDIC ALGORITHM IMPLEMENTATION FOR TRIGONOMETRIC FUNCTION EVALUATION IN HP21MX Ву PEIHSUNG THOMAS HU Bachelor of Science National Chiao Tung University Hsinchu, Taiwan 1972 Submitted to the Faculty of the Graduate College of the Oklahoma State University in partial fulfillment of the requirements for the Degree of MASTER OF SCIENCE May, 1978 Theoria 1978 H874c cop.2 # THE CORDIC ALGORITHM IMPLEMENTATION FOR TRIGONOMETRIC FUNCTION EVALUATION IN HP21MX Thesis Approved: Thesis Adviser Thesis Adviser E. d. Shrene Morman Mauhan Dean of the Graduate College #### PREFACE This paper describes the Cordic algorithm and its implementation for the evaluation of the sine function in a HP21MX computer. A polynomial method is also described and implemented in the HP21MX computer for the purpose of comparing the result with the the Cordic algorithm. The HP21MX microprogramming is also applied in this experiment to increase the programming efficiency. I would like to express my gratitude to my major advisers, Dr. Edward Shreve and Dr. G.E. Hedrick for their advice: and guidance during this project. Also, appreciation is expressed to my other committee member, Dr. T.E. Bailey for his invaluable assistance in the preparation of the final manuscript. Thanks are also extended to Mrs. Pam Haught for her typing this paper and her invaluable help in preparing the final copy of this paper. # TABLE OF CONTENTS | Chapter | | | | | | | Page | |---------|---------------------------------------------------|-----|-----|---|---|---|------| | I. | INTRODUCTION | • | | • | | | 1 | | II. | STANDARD TECHNIQUE FOR THE EVALUATION OF TRIGONOR | ME] | [R] | C | | | | | | FUNCTIONS | • | • | | • | • | 3 | | III. | THE CORDIC ALGORITHM | • | | • | • | | 7 | | | Introduction | | | | | | 7 | | | Functional Description | | | | | | 7 | | | Representation of Angles in Cordic | | | | | | 17 | | | Sine and Cosine Algorithm | • | • | • | • | • | 20 | | | | | | | | | | | IV. | COMPUTER IMPLEMENTATION AND PROGRAMMING RESULTS | • | • | • | • | • | 22 | | | System Features | | | | | | 22 | | | Hardware Registers | • | • | • | • | • | 23 | | | Display Register | • | • | • | | • | 24 | | | Interrupt System | • | • | • | • | • | 25 | | | APL Description of HP21MX | | | | | | 25 | | | The Processor | | | | | | 27 | | | Instruction Fetch | | • | | | | 27 | | | Instruction Decoding | | | | | | 29 | | | Instruction Execution | | | | • | | 43 | | | Interrupt Service | | | | | ٠ | 43 | | | Input/output Interrupts | | • | • | • | | 47 | | | Memory Access Routine | • | • | • | • | • | 47 | | | Address Computation Routine | | | | | | 48 | | | Instruction Execution Routine | • | • | • | • | • | 49 | | | Microprogramming | | • | | | | 60 | | | Conventional Control Section | • | | | | • | 60 | | | Microprogrammed Control Section | | • | ٠ | | | 61 | | | The Micro-programmable Computer | | | | | • | 62 | | | Control Section | | | | | | 62 | | | The Control Processor | | | | | | 64 | | | Main Memory | | | | | | 65 | | | Input and Output | | | | | | 66 | | | Arithmetic and Logic Section | | | | | | 66 | | | Implementation of a Polynomial Algorithm in | th | ie | • | • | • | 6.0 | | Chapter | Page | |-----------------------------------------------|------| | Implementation of the Cordic Algorithm on the | | | on the HP21MX Computer | 69 | | Calculation of Execution Time | 75 | | V. OTHER USES OF CORDIC | 88 | | Arctangent Algorithm | 88 | | Functional Description | 88 | | Decimal to Binary Conversions in Cordic | 89 | | VI. SUMMARY AND CONCLUSIONS | 98 | | A SELECTED BIBLOIGRAPHY | 106 | | APPENDIXES | 108 | | APPENDIX A - FUNCTIONAL BLOCK DIAGRAM | 108 | | APPENDIX B - PROGRAM LISTINGS | 110 | # LIST OF TABLES | Table | | Page | |-------|------------------------------------------------------------------------------------------------|------| | I. | Typical Rotation Computing Sequence | 18 | | II. | Typical Vectoring Computing Sequence | 19 | | III. | Interrupt Assignments | 26 | | IV. | "PROC" Program Segments | 27 | | V. | Decoding Vectors | 30 | | VI. | Instruction Classes | 31 | | VII. | The Navigation Matrix | 32 | | VIII. | Polynomial Method Implementation Results (Assembly Language) of Evaluating the Sine Function | 70 | | IX. | Polynomial Method Implementation Results (Microprogram) of Evaluating the Sine Function | 72 | | х. | Cordic Algorithm Implementation Results (Assembly Language) of Evaluating the Sine Function | 82 | | XI. | Cordic Algorithm Implementation Results (Microprogram) of Evaluating the Sine Function | 85 | | XII. | The Conventional Decimal-To-Binary Conversion | 91 | | XIII. | Decimal-To-Binary Conversions in Cordic | 94 | | XIV. | Generation of $\pm$ Code for 45° | 96 | | XV. | The Comparison Between the Cordic Algorithm Implementation Result and the Standard Sine Value | 99 | | XVI. | The Comparison Between the Polynomial Method Implementation Result and the Standard Sine Value | 102 | # LIST OF FIGURES | Figu | Pag | e | |------|----------------------------------------------------------------------------------------|----| | 1. | Typical Computing Step | .0 | | 2. | Cordic Arithmetic Unit | 4 | | 3. | Representation of Angles in Cordic | 1 | | 4. | The Processor System Program | 8 | | 5. | Input/Output Interrupt Generator | 3 | | 6. | Instruction Decoding Matrices 4 | 4 | | 7. | Memory Access Operation | 8 | | 8. | Address Computation Operation | 9 | | 9. | EXEC Routine | 0 | | 10. | A Microprogram Implementation of One Macroprogram Instruction | 3 | | 11, | Cordic Algorithm | 6 | | 12. | The AHPL Description for the Cordic Algorithm in Implementation in HP21MX Microprogram | 9 | | 13. | Implementation of + Code to Binary Conversion | 7 | # LIST OF SYMBOLS | Symbol Symbol | Dimension | Function | |---------------|----------------------|-----------------------------------------| | ADC | | Address computation defined operation | | EXEC | | Instruction execution defined operation | | IOIG | | I/O interrupt generator system program | | MAC | | Memory access defined operation | | PROC | | Processor unit system program | | RUN | | Run indicator | | A | 16 | Accumulator (See Chapter IV) | | В | 16 | Accumulator extension | | С | 16 | Local vector | | D | | Decoding matrices | | E | 1 | Extend register | | F | 56 | I/O device flag | | M | 2 <sup>15</sup> , 16 | Main memory | | N | 167, 9 | Navigation matrix (See Figure 4) | | I · | 16 | Instruction register | | 0 | 1 | Overflow register | | P | 16 | Program counter | | Q | 12 | Mask vector | | U | 12 | OP code vector | | S | | Current interrupt priority level | | T | 16 | T-bus | | V | 56 | I/O device control bit | |------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------| | Х | 16 | X-register | | Y | 16 | Y-register | | Z | 56,8 | I/O device data buffer | | a,b,m,t,i,j | | Local vectors | | d | 2 | Local vectors | | e | 4 | Program exceptions | | e <sub>0</sub> | | Power fail | | e <sub>1</sub> | | Memory parity | | e <sub>2</sub> | | Dual-channel port controller 1 | | $e_3$ | in the second se | Dual-channel port controller 2 | | g | 16 | Local vectors | | h | 2 | Interrupt holder | | h <sub>0</sub> | | Exceptions | | h <sub>1</sub> | | I/O interrupt | | 1 | 16 | Local Vector | | n | 9 | Navigation vector | | <sup>n</sup> 0, <sup>n</sup> 1, <sup>n</sup> 3 | | Branch control in EXEC | | n <sub>2</sub> | | Entry line in EXEC | | n <sub>4</sub> | | Instruction class | | p | 4 | Memory access quene | | r | . 4 | Memory access request | | v | 9 | Temporary navigation vector | #### CHAPTER I #### INTRODUCTION In the past, the transcendental functions were computed by mathematicians using many different algorithms. Power series, polynominal expansions, continued fractions, and Chebyshev polynomials have all been used. Since the advent of large scale computing in the twentieth century, many mathematical functions including transcendental functions have been calculated by computers. As a general rule, multiplication and division are very time-consuming functions compared to addition and subtraction implemented in a computer. A review of the conventional methods which are used for solving transcendental functions, such as power series, polynomial expansions, continued fractions, and Chebyshev polynomials, shows that a number of multiplications and divisions are required that results in inefficiency of implementation. Therefore, much effort has been made to search for alternate ways which can best suit the requirements of speed and programming efficiency for real-time applications. Henry Briggs (17) first developed the concept of pseudo-division and pseudo-multiplication in 1924. He used this method to generate a table of logarithms. In 1959, J. E. Volder (9) described a Coordinate Rotation Digital Computer (Cordic) for the calculation of trigonometric functions, multiplication, division, and conversion between binary and mixed radix number systems. In the same year, Dagget (10) discussed the use of the Cordic computer for decimal-binary conversion. In 1962, Meggitt (11) developed a pseudo-division and pseudo-multiplication processor using the Cordic technique, while in 1971 J. S. Walther (12) developed a technique for calculating elementary functions using Cordic. David S. Cochran (14) in 1972 implemented the Cordic algorithm in HP 35 calculators, and Despain (13) in 1974 developed a technique for Fourier transformation using the Cordic algorithm. Generally speaking, the trigonometric functions are calculated by polynomial expansions, power series, or Chebyshev polynomials in most current general purpose computers. The major goal of this thesis is to implement the Cordic algorithm in a general purpose computer for evaluation of trigonometric functions. The speed and accuracy of the results are observed and compared with those of conventional algorithms. Microprogramming has been used in this research to increase the program efficiency. The anticipated result is to determine the best way of evaluating the trigonometric functions, which can reduce the computer execution time to a minimum and give reasonable accuracy of the results. Only the sine function is implemented as a part of this research. The tasks are divided into four parts: - 1. Implement the Cordic algorithm in an assembly coded program. - 2. Implement the Cordic algorithm in a microprogram. - Implement one of the conventional methods in an assembly coded program. - 4. Implement the same conventional method in a microprogram. #### CHAPTER II # STANDARD TECHNIQUE FOR THE EVALUATION OF TRIGONOMETRIC FUNCTIONS The evaluation of elementary functions for various values of their arguments is required to solve a number of mathematical problems. Because of this, the computation of values of elementary functions was an important factor in stimulating the development of mathematical analysis. Therefore, a great deal of effort has been made by many mathematicians in the past two centuries to find methods of evaluating these elementary functions. Power series have been and still are used for this purpose. Mercator used a power series for logarithms; Newton used it then for trigonometric and inverse trigonometric functions; and Euler used one for the exponential function. Iterative processes (e.g., Newton's method) were also applied for solving equations (3). Furthermore, in the eighteenth century, many mathematicians (Lambert, Euler, Lagrange, et al.) used continued fractions to represent elementary functions. In recent years the technique of expansions in orthogonal polynomials has been widely applied for computing elementary functions. The Chebyshev polynomials which give good convergence are widely used for this purpose too. All those methods mentioned above are well documented and are described in many mathematics books; thus it is not necessary to explain them here. Power series for evaluating trigonometric functions are used in this paper as a conventional method of evaluating trigonometric functions in order to compare them to evaluations using the Cordic algorithm. Therefore, for convenience, the power series method is described as follows: #### Power Series The elementary functions can be represented as power series in a number of ways. Consider the Taylor-Maclaurin Series for a given function f(x): $$f(x) = \sum_{k=0}^{a} \frac{f^{(k)}(0)}{k!} x^{k}$$ (2.1) Truncating this at the nth term produces an nth-degree polynomial $S_{n}(x)$ (a finite Taylor Series). $$S_n(x) = \sum_{k=0}^n \frac{f^{(k)}(0)}{k!} x^k$$ (2.2) The polynomial $S_n(x)$ has the following properties: $$f(x) = S_n(x) + O(x^n)$$ (2.3) where $\boldsymbol{S}_{n}\left(\boldsymbol{x}\right)$ is the unique nth-degree polynomial of best approximation $\boldsymbol{P}_{n}\left(\boldsymbol{x}\right)$ , for which $$f(x) - P_n(x) = O(x^n)$$ (2.4) If $f(x) = \sin(x)$ , then $\sin(x)$ can be represented in a power series as: $$\sin(x) = \sum_{k=0}^{\infty} (-1)^k \frac{x^2 k + 1}{(2k+1)!}$$ (2.5) Cos(x) can be represented in a power series as: $$\cos(x) = \sum_{k=0}^{\infty} (-1)^{k} \frac{x^{2k}}{(2k)!}$$ (2.6) In order to implement this algorithm in a computer for evaluation of trigonometric functions, the number of terms (i.e., constant k) required for specific accuracy is determined first. To determine the constant k, the maximum accuracy of evaluation in the computer must be known first. The computer used in this research is an HP21MX, the memory word of which contains 16 bits. Although multiple precision could be achieved by using multiple words in arithmetic operations, single precision (single word) is still used in the Cordic algorithm and power series here for the sake of simplicity of programming. Hastings (4) set up three equations by using power series to evaluate the sine function, which are as follows: $$\sin \frac{\pi}{2} x = c_1 x + c_3 x^3 + c_5 x^5$$ $$c_1 = 1.5706268$$ $$c_3 = -0.6432392$$ $$c_5 = 0.0727102$$ $$\sin \frac{\pi}{2} x = c_1 x + c_3 x^3 + c_5 x^5 + c_7 x^7$$ $$c_1 = 1.570794852$$ $$c_3 = -0.645920978$$ $$c_5 = 0.079487663$$ $$c_7 = -0.004362476$$ $$\sin \frac{\pi}{2} x = c_1 x + c_3 x^3 + c_5 x^5 + c_7 x^7 + c_9 x^9$$ $$c_1 = 1.57079631847$$ $$c_3 = -0.64596371100$$ (2.7) $c_5 = 0.07968967928$ $c_7 = -0.00467376557$ $c_0 = 0.00015148419$ where $-1 \le x \le 1$ To determine which equation will be used in this paper, the maximum value of the error of each equation is checked. The maximum value of the error is 0.0001 for equation (2.7), 0.000001 for equation (2.8), and 0.000000005 for equation (2.9). For a 16-bit computer word, the maximum accuracy that can be represented is 5 decimal digits. The accuracy of equations (2.8) and (2.9) is more than 5 decimal digits. If they are used to evaluate sine functions in a 16-bit word machine, they will consume a lot more execution time than equation (2.7) with just a slightly more accurate result. Therefore, in order to get the best execution time and accuracy, equation (2.7) is used in this research. #### CHAPTER III #### THE CORDIC ALGORITHM #### INTRODUCTION Cordic is a special purpose, binary computer which contains a unique arithmetic unit which differs from the arithmetic unit of conventional computers. Although Cordic is a single processor computer, its arithmetic unit is composed of three shift registers and three adder-subtractors which are operated in parallel instead of sequentially. Each programmed operation is accomplished in a fixed number of steps. Each step involves modifying three numbers which reside in three arithmetic unit registers by adding or subtracting a constant for each one. Setting of all three adder-subtractors is controlled by the sign of the quantity in one of the arithmetic unit registers. In this way, calculations related to the addition or subtraction of constants can be executed simultaneously. # Functional Description There are two computing modes in Cordic for the trigonometric operations: ROTATION and VECTORING. In the ROTATION mode the coordinate components of a vector and an angle of rotation are given and the coordinate components of the original vector, after rotation through the given angle, are computed. In the VECTORING mode, the coordinate components of a vector are given and the magnitude and angular argument of the original vector are computed. The basic computing technique used in both the ROTATION and VECTORING modes in Cordic is a step-by-step sequence of pseudo-rotations which result in an overall rotation through a given angle (ROTATION) or result in a final angular argument of zero (VECTORING). It is necessary that the angular increments of rotation be computed in decreasing order (9). In order to evaluate the sine and cosine functions for the angles from -180° to 180°, the magnitude actually chosen for the first increment should be 90°. The expression for a set of coordinate components, $X_1$ and $Y_1$ , rotated through plus or minus 90° is simply $$Y_2 = \pm X_1 = R_1 \sin(\theta_1 \pm 90^\circ)$$ (3.1) $$X_2 = + Y_1 = R_1 \cos(\theta + 90^\circ)$$ (3.2) Where $R_1$ and $\theta$ , are the magnitude and angle of the vector $(X_1, Y_1)$ and $X_2$ and $Y_2$ are the coordinates of vector $(X_1, Y_1)$ after rotating 90°. The first step is unique in that a perfect rotation step is performed. The remaining computing steps can be clarified by examining relationships involved in a typical rotation step which are shown in Figure 1. Consider two given coordinate components, $Y_i$ and $X_i$ , in the plane coordinate system shown. In this discussion, the quantity i is equal to the number of the particular step under consideration. The components $Y_i$ and $X_i$ are associated with the ith step and describe a vector of magnitude $R_i$ at an angle $\theta_i$ with respect to the origin according to the relationships. $$Y_{i} = R_{i} \sin \theta \tag{3.3}$$ $$X_{i} = R_{i} \cos \theta \tag{3.4}$$ In Figure 1 the angle $\alpha_{\bf i}$ is the magnitude of rotation associated with each computing step. The general expression for $\alpha_{\bf i}$ where i > 1 is x $$\alpha_{i} = \tan^{-1} 2^{-(i-2)}$$ (3.5) The reason for choosing this particular magnitude of $\alpha_i$ is that a rotation of coordinate components through $\pm \alpha_i$ may be accomplished by the simple process of shifting and adding. The two choices of positive or negative rotation are shown in Figure 1. The general expressions for the rotated components are $$Y_{i+1} = \sqrt{1+2^{-2(i-2)}} R_{i} \sin(\theta_{i} + \alpha_{i})$$ $$= Y_{i} + 2^{-(i-2)} X_{i}$$ (3.6) and $$X_{i+1} = \sqrt{1 + 2^{-2(i-2)}} R_{i}^{\cos(\theta_{i} + \alpha_{i})}$$ $$= X_{i} + 2^{-(i-2)Y}_{i}$$ (3.7) Note that the right-hand terms of (3.6) and (3.7) may be obtained by two simultaneous shift-and-add operations, if the angular rotation magnitude is restricated to (3.5). This is the fundamental relationship upon which the Cordic computing technique is based. The computing action of adding (or subtracting) a shifted value Figure 1. Typical computing step of $X_i$ to $Y_i$ to obtain $Y_{i+1}$ , while simultaneously subtracting (or adding) a shifted value of $Y_i$ to $X_i$ to obtain $X_{i+1}$ is termed "cross addition". The terms under the radical in (3.6) and (3.7) indicate the increase in magnitude when i > 2; either of the two choices of direction produces the same change in magnitude. If the rotation is always through either a positive or negative $\alpha_i$ at each step, then the increase in magnitude may be considered as a constant. This requirement does not allow the choice of zero rotation at any step. In order to identify the choice in a particular step, the $\pm$ notation may be represented by the binary operator $v_i$ , where $v_i$ can be either $\pm$ 1 or $\pm$ 1. This substition produces the general expressions $$Y_{i+1} = \sqrt{1 + 2^{-2(i-2)}} R_i \sin(\theta_i + v_i \alpha_i)$$ (3.8) and $$X_{i+1} = \sqrt{1 + 2^{-2(i-2)}} \quad R_i \cos(\theta_i + v_i \alpha_i)$$ (3.9) where $v_i = +1$ or -1 Similarly, after the completion of the rotation step in which the i+1 terms are obtained, the i+2 terms may be computed from these terms with the results $$Y_{i+2} = \sqrt{1 + 2^{-2(i-1)}} \sqrt{1 + 2^{-2(i-2)}} R_i \sin(\theta_i + v_i \alpha_i + v_{i+1} \alpha_{i+1})$$ (3.10) and $$X_{i+2} = \sqrt{1 + 2^{-2(i-1)}} \sqrt{1 + 2^{-2(i-2)}} R_{i} \cos(\theta_{i} + v_{i}\alpha_{i} + v_{i+1}\alpha_{i+1})$$ (3.11) $$Y_1 = R_1 \sin\theta \tag{3.12}$$ and $$X_1 = R_1 \cos \theta \tag{3.13}$$ Suppose the first rotation step is $\pm$ 90° and the number of steps is determined as n. The expressions for the final coordinate components will be $$Y_{n+1} = \sqrt{1 + 2^{-0}} \sqrt{1 + 2^{-2}} ... \sqrt{1 + 2^{-2(n-2)}}) R_{1} \sin(\theta_{1} + V_{1}^{\alpha}_{1} + V_{2}^{\alpha}_{2} + ... + V_{n}^{\alpha}_{n})$$ (3.14) and $$x_{n+1} = (\sqrt{1 + 2^{-0}} \quad \sqrt{1 + 2^{-2}} \quad \dots \sqrt{1 + 2^{-2(n-2)}}) \quad R_1 \cos(\theta_1 + v_1 \alpha_1 + v_2 \alpha_2 + \dots + v_n \alpha_n)$$ (3.15) The increase in magnitude of the components for a particular value n is a constant and is represented by k. The value selected for n is a function of the desired computing accuracy and can be a constant for a particular computer. For example, if $$n = 24$$ , $k = 1.646760255$ . The basic components required to perform the cross-addition are shown in Figure 2. It has not yet been shown how the prescribed sequence of rotation steps can be controlled to effect the desired over-all rotation. By examination of (3.14) and (3.15), the rotation of a set of coordinate components $Y_1$ and $X_1$ through a given angle can be expressed as $$Y_{n+1} = KR_1 \sin(\theta_1 + \lambda)$$ (3.16) and $$X_{n+1} = KR_1 \cos(\theta_1 + \lambda)$$ (3.17) where $$\lambda = v_1 \alpha_1 + v_2 \alpha_2 + \dots + v_n \alpha_n$$ (3.18) In the VECTORING mode, $$-\theta_1 = \lambda$$ , ie, $-\theta_1 = v_1 \alpha_1 + v_2 \alpha_2 + \dots + v_n \alpha_n$ (3.19) The sequence of (3.18) and (3.19) form a special radix representation equivalent to the desired angle, $\lambda$ or $\theta$ , where $$\alpha_1 = 90^{\circ}$$ (3.20) $$\alpha_2 = \tan^{-1} 2^{-0} = 45^{\circ}$$ (3.21) $$\alpha_3 = \tan^{-1} 2^{-1} = 26.5^{\circ}$$ (3.22) $$\alpha_{i} = \tan^{-1} 2^{-(i-2)}$$ (3.23) The $\alpha$ terms are referred to as ATR (Arctangent Radix) constants and are precomputed and stored in the computer. The $\vee$ terms are referred to as ATR digits and are determined during each operation. In the Cordic computer, the ATR digits are determined sequentially, most significant digit first, and are used to control the conditional Figure 2. Cordic Arithmetic Unit action of the adder-subtractors in the arithmetic unit. The following paragraphs contain a description of the manner in which the ATR code representation, $v_1$ , $v_2$ , $v_3$ , ..., $v_n$ can be determined for any given angle, $\lambda$ or $\theta$ . First, for any angle $\lambda$ or $\theta$ , there must be at least one set of values of Y for the operators that will satisfy (3.18) and (3.19). Second, a simple technique must be available for determing the ATR code digits that satisfy these equations. The following relationships are necessary and sufficient for any sequence of radix constants to meet the above requirements (3.9). $$|\lambda \text{ or } \theta| \leq \alpha_1 + \alpha_2 + \alpha_3 + \dots + \alpha_n + \alpha_n$$ (3.24) $$\alpha_{i} \leq \alpha_{i+1} + \alpha_{i+2} + \dots + \alpha_{n} + \alpha_{n}$$ (3.25) For the satisfaction of (3.20) through (3.23), it is required that or $\theta$ be constrained by $$-180^{\circ} \le \lambda \text{ or } \theta \le +180^{\circ}$$ (3.26) Equation (3.26) imposes no special consideration if the two's complement notation is used. By employing an additional register and addersubtractor (identified in Figure 2 as the angle register) the relationship of (3.16) (ROTATION-mode) can be instrumented by 1) sensing the sign of the angle of rotation (or remainder if i > 1) and 2) either subtracting or adding to the angle the ATR constant corresponding to the particular step. In each step, the relationship instrumented is $$|\lambda_{i+1}| = |\lambda_i| - \alpha_i$$ (3.27) Equation (3.24) is equivalent to $$-\alpha_1 \le |\lambda| - \alpha_1 \le \alpha_2 + \alpha_3 + \dots + \alpha_n + \alpha_n$$ (3.28) Application of the relationships of (3.25) results in $$\mid \lambda \mid \equiv \mid \mid \lambda_1 \mid -\alpha_1 \mid \leq \alpha_2 + \alpha_3 + \dots + \alpha_n + \alpha_n$$ (3.29) Continuation of this sequence through $\underset{n}{\alpha}$ results in $$|\lambda_{n+1}| \le \alpha_n \tag{3.30}$$ Equation (3.30) can be used to prove that the remainder in the angle register converges to zero in the ROTATION mode (9). The sequence of operation signs used to null $\lambda$ to zero is the negative of the equivalent ATR code for the original angle. More simply, the ATR code digit of each step is equal to the sign of the quantity in the angle register before each step. Therefore, simultaneously with each step in the angle register, the ATR code digit may be used to control the cross-addition step in the Y and X registers (shown in Figure 2) to effect a rotation of components through an equal angular increment. The proof of the convergence of the effective angular argument $\theta_{n+1}$ to zero, which is necessary in the VECTORING mode, may be obtained by replacing $\lambda$ by $\theta$ . The sign of the angle $\theta_i$ is obtained by sensing the sign of $Y_i$ . The sequence of signs of $Y_i$ is the negative of the ATR code for the effective rotation performed on the components $Y_1$ and $X_1$ . During each cross-addition operation in the Y and X register, the corresponding ATR constant can be conditionally added or subtracted, depending on $v_i$ , to an accumulating sum in the angle register so that, at the end of the computing sequence, when $\theta_{n+1}$ = 0, the quantity in the angle register will be equal to the original angular argument $\theta_1$ of the coordinate components $Y_1$ and $X_1$ . The step-by-step results of a typical rotation computing sequence are shown in Table I. The two's complement notation is used for all quantities, and shift quantities are truncated without round-off. The step-by-step results of a typical rotation computing sequence are shown in Table I. #### Representation of Angles in Cordic In Cordic, angles are represented as a binary fraction of a half revolution ( $\Pi$ ) with two's complements for negative angles, as shown in Figure 3. Since a one to the left of the binary point is used to represent a negative quantity in the two's complement system, angles from $+180^{\circ}$ to slightly less than $+360^{\circ}$ are interpreted internally as negative angles measured clockwise from $0^{\circ}$ . For example, $45^{\circ}$ in Cordic is $$\frac{\Pi/4}{\Pi} = \frac{1}{4} = (0.25)_{10} = (0.01)_2$$ For 90° the Cordic representation is $$\frac{\Pi/2}{\Pi} = \frac{1}{2} = (0.5)_{10} = (0.1)_2$$ For 270° the Cordic representation is $$270^{\circ} = \frac{3\Pi/2}{\Pi} = (1.5)_{10} = (1.1)_{2}$$ TABLE I TYPICAL ROTATION COMPUTING SEQUENCE | Y Register | X Register | Angle Register | |--------------------------|----------------------------|-------------------------------------------------------------------------------------| | $Y_1 = 0.0101110$ | 1.1000101 = X <sub>1</sub> | 0.1100101 = λ | | + 1.1000101 | - 0.0101110 | $-0.1000000$ $tan^{-1} \infty$ | | 1.1000101<br>+ 1.1010010 | 1.1010010<br>- 1.1000101 | 0.0100101<br>- 0.0100000 tan <sup>-1</sup> 1 | | 1.0010111<br>+ 0.0000110 | 0.0001101<br>- 1.1001011 | $\begin{array}{c} 0.0000101 \\ - 0.0010010 \\ \tan^{-1} 2^{-1} \end{array}$ | | 1.0011101<br>- 0.0010000 | 0.1000010<br>+ 1.1100111 | $\begin{array}{cccccccccccccccccccccccccccccccccccc$ | | 1.0001101<br>- 0.0000101 | 0.0101001<br>+ 1.1110001 | $\begin{array}{c} \hline & 1.1111100 \\ + 0.0000101 & \tan^{-1} 2^{-3} \end{array}$ | | 1.0001000<br>+ 0.0000001 | 0.0011010<br>- 1.1111000 | $\begin{array}{c} 0.0000001 \\ -0.0000010 \\ \end{array} \tan^{-1} 2^{-4}$ | | 1.0001001<br>- 0.0000001 | 0.0100010<br>+ 1.1111100 | 1.1111111<br>+ 0.0000001 tan <sup>-1</sup> 2 <sup>-5</sup> | | 1.00010000 | 0.0011110 | 0.0000000 | TABLE II TYPICAL VECTORING COMPUTING SEQUENCE | Y Register | X Register | Angle Register | |--------------------------|----------------------------|-----------------------------------------------------------------------------| | $Y_1 = 0.0101110$ | 1.1000101 = X <sub>1</sub> | 0.0000000 | | - 1.1000101 | + 0.0101110 | $+$ 0.1000000 $\tan^{-1} \infty$ | | 0.0111011<br>- 0.0101110 | 0.0101110<br>+ 0.0111011 | 0.1000000<br>+ 0.0100000 tan <sup>-1</sup> 1 | | 0.0001101<br>- 0.0110100 | 0.1101001<br>+ 0.0000110 | 0.1100000<br>+ 0.0010010 tan <sup>-1</sup> 2 <sup>-1</sup> | | 1.1011001<br>+ 0.0011011 | 0.1101111<br>- 1.1110110 | 0.1110010<br>- 0.0001001 tan <sup>-1</sup> 2 <sup>-2</sup> | | 1.1110100<br>+ 0.0001111 | 0.1111001<br>- 1.1111110 | 0.1101001<br>- 0.0000101 tan <sup>-1</sup> 2 <sup>-3</sup> | | 0.0000011<br>- 0.0000111 | 0.1111011<br>+ 0.0000111 | $\begin{array}{c} 0.1100100 \\ + 0.0000010 \\ \tan^{-1} 2^{-2} \end{array}$ | | 1.1111111 | $0.11111100 = K K_1$ | 0.1100101 = 0 | #### Sine and Cosine Algorithm As mentioned above, there are two computing modes for Cordic, ROTATION and VECTORING. Evaluating sine or cosine functions makes use of the ROTATION mode by setting the original vector on the X-axis and rotating the vector through an angular argument whose sine or cosine is computed. #### Functional Description In order to use the ROTATION computing sequence (Table I) of Cordic to evaluate sine and cosine functions, several initial conditions and values are set up: - 1) The Y-register is initialized with 0. - 2) The X-register is initialized with a unit vector. - 3) The A-register is initialized with the angle which is going to be computed. - 4) A sign digit of 0 in the A-register establishes a v<sub>i</sub> of +1, which causes the top adder subtractor to add, the middle adder-subtractor to subtract, and the bottom adder subtractor to subtract. A sign digit of 1 has the opposite effect. - 5) The number of steps (iterations) is initialized depending on the desired accuracy. The Cordic ROTATION computing sequence is started as shown in Table 1. The final result is in the Y-register if the function evaluated is sine and in the X-register if the function evaluated is cosine after the final computation step. Figure 3. Representation of Angles in Cordic. #### CHAPTER IV #### COMPUTER IMPLEMENTATION AND #### PROGRAMMING RESULTS The four tasks described in Chapter I are performed and the programming results are obtained in this chapter. The description of the HP21MX computer which is used to aid this research is given below. #### System Features The HP21MX computer is a powerful user-microprogrammable minicomputer with 178 micro-instructions and 4K words of control space. Each word is 24 bits long. It has 128 standard instructions, 80 of which emulate the HP 2100 series computer; 42 of which are new instructions for indexing, byte and bit manipulation, byte and word moves, and byte string scanning; and 6 of which are single-precision floating point instructions. There are four general purpose registers, two of which may be used as index registers. It is a fully microprogrammed processor, including all arithmetic functions, input/output, and operator panel control. Writable Control Store (WCS) is optional. The read-only memory (ROM) modules in which microprograms are stored are referred to collectively as control store. Standard control consists of 1,024 directly addressable locations configured into four modules of 256 location each. Each control store location accommodates one micro-instruction, which in turn consists of a 24-bit word encompassing six micro-orders. The control store address space of each processor is 4,096 words. Microprograms in standard control store for executing the various machine functions are divided into three groups: Base instruction set (modules 0 and 1) Floating point instructions (module 14) Extended instruction group (module 15) Unused modules of control store are available for user-supplied microprograms. Microinstructions in control store are 24 bits long; whereas, machine language instructions residing in main memory are 16 bits long. In addition, microinstructions have access to many internal registers and logic functions that machine language instructions cannot use. The Writable Control Store (WCS) option provides a read-write control store module which can be used for the development and execution of user-supplied microprograms. Microprograms in WCS are executed at the same speed as those in the read-only control store. #### Hardware Registers A 16-bit accumulator which holds the results of arithmetic and logical operations performed by programmed instructions. #### B-register Serves the same purpose as the A-register, but is independent from it. #### M-register A 16-bit register used to hold the memory address which is currently being accessed by the CPU. # T-register A 16-bit register used to hold the data which are stored into or retrieved from memory. # P-register Program counter, 16 bits long, pointing to next instruction to be fetched from memory. #### S-register A 16-bit utility register. In the halt or run mode, it can be loaded via the display register. # Extend register A one-bit register used to link the A- and B-registers by rotation instructions or to indicate a carry from the most significant bit (bit 15) of the A- or B-register by an add instruction or increment instruction. #### Overflow Register A one-bit register used to indicate that an add instruction, divide instruction, or an increment instruction has caused the A-register or B-register to exceed the maximum positive or negative number that can be contained in these registers. #### Display register A 16-bit register included in the front panel and used to display and modify the contents of the six 16-bit working registers when the computer is in the halt mode. #### X- and Y-registers Two 16-bit registers serving as indexing registers which are accessed through the use of 30 index register instructions and 2 jump instructions. $\mathbf{S}_1$ to $\mathbf{S}_{12}$ scratch pad registers Twelve registers (each 16 bits long) used to temporarily store data by a microprogram and cannot be accessed by a macroprogram.\* # Interrupt System The vectored priority interrupt system has up to 60 distinct interrupt levels, each of which has a unique priority assignment. Each interrupt level is associated with a numerically corresponding interrupt location in memory. Of the 60 interrupt levels, the first two are reserved for hardware faults (power failure and parity error); the next two are reserved for the Dual-Channel port controller completion interrupts; and the reamining levels are available for I/O device channels. Table III lists the interrupt levels in priority order for the HP 2108 processor of the 21 MX. # APL Description of HP21MX In the APL description of the HP21MX, the computer system is described as seen by a programmer, and the description is independent of any particular hardware implementation. All those instructions which are not connected with this research are not included in this description. Iverson (2) gives a complete definition of the notation used. The description is based on the HP21MX Computer Series Reference Manual (5) and consists of a set of programs and tables. Microprogram - programs stored in control store. <sup>\*</sup> Macroprogram - programs stored in main memory. TABLE III INTERRUPT ASSIGNMENTS | Channel | Interrupt Location | Assignments | |---------|--------------------|-------------------------------------| | (Octal) | | | | 04 | 00004 | Power Fail Interrupt | | 05 | 00005 | Memory Parity/Protect Interrupt | | 06 | 00006 | DCPC Channel 1 Completion Interrupt | | 07 | 00007 | DCPC Channel 2 Completion Interrupt | | 10 | 00010 | 1/0 Device (highest priority) | | 11-20 | 00011-00020 | 1/0 Device (Mainframe) | | 21-42 | 00021-00042 | 1/0 Device (Extender No. 1) | | 43-64 | 00043-00064 | 1/0 Device (Extender No. 2) | The programs are either system programs or defined operations. All programs operate concurrently and continuously, with one line active in each program. The defined operation program operates only when invoked by another program. In the description presented, PROC and IOIG are system programs, whereas ADC, EXEC, and MAC are defined operations. # The Processor The PROC program, Figure 4, describes the sequencing and execution of instructions and the servicing of interrupts. The program segments and their functions are summarized in Table IV. TABLE IV "PROC" PROGRAM SEGMENTS | Lines | Function | |-------|------------------------| | 1-4 | Instruction fetch | | 5-14 | Instruction decoding | | 15-26 | Instruction execution | | 27-30 | Trap interrupt service | | | | #### Instruction Fetch The first step in program execution is to fetch the instruction from memory. In order to prepare for instruction fetch, the exceptions vector is initialized to zero (line 1). The 16-bit instruction is fetched from memory at the address given by the program counter, and placed in the instruction register (line 2). The program counter is incremented by 2 (line 3), and in case of any exceptions during instruction fetch, control branches to line 27. Exceptions during fetch may be due to errors in parity check. Figure 4. The Processor System Program #### Instruction Decoding To determine the operation specified by the instruction, the instruction is decoded next. Because the operation code of an instruction in this machine may be varied from 4 bits to 16 bits and several microinstructions may be involved in a single instruction word for some type of instructions, the decoding task is very complicated and tedious. Many steps and two sets of decoding vectors named u and q are used in this APL description to aid the decoding task. These two sets of vectors ar listed in Table V. The instructions are divided into 13 classes. Table IV summarizes those 13 classes. The number involved in this table is used to identify the class of the instruction during the decoding. The class identifiers j and i are initialized in line 5 and 6. The decoding vectors $\mathbf{U}_{\mathbf{i}}$ and $\mathbf{E}_{\mathbf{i}}$ are used in lines 7, 8, and 9 to identify the class of the current instruction. Once the class of the current instruction is found, it is stated in j (line 10). The components of the selection vector k take on the values of the fields depending on j (lines 11 and 12). Lines 13 and 14 interpret the instruction by selecting a row N<sup>i</sup> from the navigation matrix N (Table VII), to specify the vector n used in subsequent control of the instruction execution. The row of N selected, is determined by an element of a particular decoding matrix D, Figure 6, specified by the instruction class j, and the selection vector k. TABLE V DECODING VECTORS | WMI | $v_1 = (1000101111111110)$ | Q <sub>1</sub> = (111111111111111) | |------|----------------------------------------------------|----------------------------------------------| | JMPI | $U_2 = (10001011111110010)$ | $Q_2 = (11111111111111111111111111111111111$ | | BIMI | $U_3 = (10001011111111000)$ | Q <sub>3</sub> = (11111111111111000) | | BYMI | $U_4 = (10001011111110000)$ | Q <sub>4</sub> = (11111111111111000) | | DMI | $U_5 = (1000001111000000)$ | Q <sub>5</sub> = (11110111111100000) | | IRI | $U_6 = (1000001111100000)$ | Q <sub>6</sub> = (11110111111100000) | | FRI | $U_7 = (1000101000000000)$ | $Q_7 = (1111010000000000)$ | | EAMR | $U_8 = (1000000000000000)$ | Q <sub>8</sub> = (1111010001110000) | | EAR | $U_9 = (1000000000000000)$ | $Q_9 = (1111010000000000)$ | | IOI | U <sub>10</sub> = (000001000000000) | Q <sub>10</sub> = (1111010000000000) | | A/S | $U_{11} = (0000010000000000)$ | Q <sub>11</sub> = (1111010000000000) | | S/R | $U_{12}^{-}$ (000000000000000000000000000000000000 | Q <sub>12</sub> = (1111010000000000) | | | | | # TABLE VI INSTURCTION CLASSES | | Class | j | |-------|----------------------------------------|-----| | MRI: | Memory reference instructions | 0 | | WMI: | Word manipulation instructions | 1 | | MJPI: | Jump instructions | 2 | | BIMI: | Bit manipulation instructions | 3 | | BYMI: | Byte manipulation instructions | 4 | | DMI: | Dynamic mapping system instructions | 5 | | IRI: | Index register instructions | 6 | | FPI: | Floating point instructions | 7 | | EAMR: | Extended arithmetic memory reference | | | | instructions | ` 8 | | EAR: | Extended arithmetic register reference | | | | instructions | 9 | | IOI: | Input/output instructions | 10 | | A/S: | Alter skip instructions | 11 | | S/R: | Shift/rotate instructions | 12 | TABLE VII THE NAVIGATION MATRIX | <sup>n</sup> 0 | n<br>1 | n <sub>2</sub> | <sup>n</sup> 3 | Class | Index | Mnemonic | Name Op Code | |----------------|--------|----------------|----------------|-------|-------|----------|----------------------------------------------| | 0 | 0 | a <sub>0</sub> | a <sub>3</sub> | MRI | 1 | ADA | Add to A -1000 | | 1 | 0 | a <sub>0</sub> | a <sub>3</sub> | MRI | 2 | ADB | Add to B -1001 | | 0 | - | <sup>b</sup> 0 | <sup>b</sup> 5 | IRI | 3 | ADX | Add memory to X<br>1000101111100110 | | 1 | - | <sup>b</sup> 0 | ъ <sub>5</sub> | IRI | 4 | ADY | Add memory to Y 1000101111101110 | | 0 | - | e <sub>0</sub> | e <sub>3</sub> | S/R | 5 | ALF | Rotate A left four 0000001111-1-111 | | 0 | - | e <sub>0</sub> | e <sub>4</sub> | S/R | 6 | ALR | A left shift. clear sign 0000001100-1-100 | | 0 | - | e <sub>0</sub> | e <sub>5</sub> | S/R | 7 | ALS | A left shift 0000001000-1-000 | | 0 | 0 | <sup>a</sup> 0 | a <sub>2</sub> | MRI | 8 | AND | "AND" to A -0010 | | 0 | - | e <sub>0</sub> | e <sub>6</sub> | S/R | 9 | ARS | A right shift 0000001001-1-001 | | - | - | <sup>c</sup> 0 | - | EAR | 10 | ASL | Arithmetic shift left<br>10000000001 | | - | - | <sup>c</sup> 1 | - | EAR | 11 | ASR | Arithmetic shift right 100000100001 | | 1 | - | e <sub>0</sub> | e <sub>3</sub> | S/R | 12 | BLF | Rotate B left four 0000101111-1-111 | | 1 | - | e <sub>0</sub> | e <sub>4</sub> | S/R | 13 | BLR | B left shift, clear sign<br>0000101100-1-100 | TABLE VII (Continued) | $\overline{n_0}$ | n <sub>1</sub> | n <sub>2</sub> | n <sub>3</sub> | Class : | Index | Mnemonic | Name Op Code | |------------------|----------------|----------------|----------------|---------|-------|----------|------------------------------------| | 1 | | e <sub>0</sub> | е <sub>5</sub> | s/R | 14 | BLS | B left shift<br>0000101000-1-000 | | 1 | <del>-</del> | e <sub>0</sub> | e <sub>1</sub> | S/R | 15 | BRS | Bright shift 0000101001-1-001 | | 0 | 2 | ъ <sub>4</sub> | b <sub>8</sub> | IRI | 16 | CAX | Copy A to X<br>1000001111100001 | | 0 | 3 | b <sub>4</sub> | b <sub>8</sub> | IRI | 17 | CAY | Copy A to Y 10000011111111100 | | - | | | <b>-</b> , | BIMI | 18 | CBS | Clear bits<br>10001011111111100 | | - | <b>-</b> 1 | - | - | BYMI | 19 | CBT | Compare bytes<br>1000101111110110 | | 1 | 2 | <sup>b</sup> 4 | <sup>b</sup> 8 | IRI | 20 | CBX | Copy B to X 1000101111101001 | | 1 | 3 | <sup>b</sup> 4 | b <sub>8</sub> | IRI | 21 | CBY | Copy B to Y 1000101111101001 | | 0 | - | f <sub>0</sub> | - | A/S | 22 | CCA | Clear and complement A 00000111 | | 1 | _ | f <sub>0</sub> | - | A/S | 23 | ССВ | Clear and complement B 00001111 | | | - | f <sub>7</sub> | | A/S | 24 | CCE | Clear and complement E 0000-111 | | 0 | | f <sub>2</sub> | - | A/S | 25 | CLA | Clear A 00000101 | | 1 | - | f <sub>2</sub> | • | A/S | 26 | CLB | Clear B | | - | 0 | d <sub>0</sub> | <sup>d</sup> 2 | IOI | 27 | CLC | Clear control<br>100011-111 | | - | - | f <sub>5</sub> | - | A/S | 28 | CLE | Clear E 0000-101 | | - | 0 | d <sub>0</sub> | d <sub>6</sub> | IOI | 29 | CLF | Clear flag<br>1000-11001 | | - | - | <b>-</b> . | | IOI | 30 | CLO | Clear overflow<br>1000011001000001 | TABLE VII (Continued) | $\overline{n_0}$ | n <sub>1</sub> | n <sub>2</sub> | <sup>n</sup> 3 | Class | Index | Mnemonic | Name | Op Code | |------------------|----------------|----------------|------------------|-------|-------|----------|-----------------------|---------------------------------| | 1 | _ | B <sub>4</sub> | В9 | IRI | 47 | DSY | Decrement Y an | d skip if zero<br>0101111111001 | | 0 | 0 | e <sub>0</sub> | e <sub>7</sub> | S/R | 48 | ELA | Rotate E left<br>000 | with A<br>0001110-1-110 | | 1 | 0 | е <sub>0</sub> | e <sub>7</sub> | S/R | 49 | ELB | Rotate E left<br>000 | with B<br>0101110-1-110 | | 0 | 1 | e <sub>0</sub> | e <sub>7</sub> | S/R | 50 | ERA | Rotate E right | with A<br>0001101-1-101 | | 1 | 0 | e <sub>0</sub> | e <sub>7</sub> , | S/R | 51 | ERB | Rotate E right | with B<br>0101101-1-1-1 | | - | <b>-</b> . | - | - | FPI | 52 | FAD | Floating point | add<br>0101000000000 | | - | | - | _ | FPI | 53 | FDV | Floating point | divide<br>0101000110000 | | - | - | - | - | FPI | 54 | FIX | Floating point | to integer<br>0101001000000 | | - | - | | | FPI | 55 | FLT | Integer to flo | ating point<br>0101001010000 | | <b>-</b> | - | - | - | FPI | 56 | FMP | Floating point | multiply<br>0101000100000 | | - | • | - | _ | FPI | 57 | FSB | Floating point | subtract<br>0101000010000 | | | 0 | $^{d}_{0}$ | <sup>d</sup> 11 | IOI | 58 | HLT | Halt 100 | 0-1-000 | | - | - | . — | - | A/S | 59 | INA | Increment A 000 | 0011 | | - | Servi | - | | A/S | 60 | INR | Increment B 000 | 0111 | | 1 | 0 | <sup>a</sup> 0 | a <sub>2</sub> | MRI | 61. | IOR | "Inclusive OR"<br>-01 | to A<br>1 | | 0 | 0 | b <sub>4</sub> | <sup>ъ</sup> 9 | IRI | 62 | ISX | Increment X an | d skip if zero<br>0101111110000 | | 0 | 1 | <sup>b</sup> 4 | <sup>b</sup> 9 | IRI | 63 | ISY | . Increment Y an | d skip if zero<br>0101111111000 | TABLE VII (Continued) | n <sub>0_</sub> | n1_ | n <sub>2</sub> | n <sub>3</sub> | Class | Index | Mnemonic | Name Op Code | |-----------------|-----|----------------|----------------|-------|-------|----------|-----------------------------------------------| | - | - | - | - | A/S | 31 | CMA | Complement A 00000110 | | - | - | - | _ | A/S | 32 | CMB | Complement B 00001110 | | | | f6 | | A/S | 33 | CME | Compare E 0000-110 | | | | - | - | WMI | 34 | CMW | Compare words<br>1000101111111110 | | 0 | - | <sup>a</sup> 0 | <sup>a</sup> 8 | MRI | 35 | CPA | Compare to A<br>-1010 | | 1 | | <sup>a</sup> 0 | a <sub>8</sub> | MRI | 36 | СРВ | Compare to B -1011 | | 2 | 0 | <sup>b</sup> 4 | <sup>b</sup> 8 | IRI | 37 | CXA | Copy X to A 1000001111100100 | | 2 | 1 | ъ <sub>4</sub> | Ъ8 | IRI | 38 | CXB | Copy X to B<br>1000101111100100 | | 3 | 0 | b <sub>4</sub> | ъ <sub>8</sub> | IRI | 39 | CYA | Copy Y to A<br>1000001111101100 | | 3 | 1 | <sup>b</sup> 4 | <sup>b</sup> 8 | IRI | 40 | СУВ | Copy Y to B<br>1000101111101100 | | - | - | - | - 1 | EAMR | 41 | DIV | Divide 100000010000 | | - | - | - | - | DMI | 42 | DJP | Disable mem and jump<br>1000101111011010 | | | - | num. | - | DMI | 43 | DJS | Disable mem and jump to sub-<br>routine | | | | | | | | | 1000101111011011 | | - | | - | <b>-</b> | EAMR | 44 | DLD | Double load 100010001000 | | - | - | · ••• | | EAMR | 45 | DST | Double store<br>100010010000 | | 1 | 0 | ъ <sub>4</sub> | <sup>b</sup> 9 | IRI | 46 | DSX | Decrement X and skip if zero 1000101111110001 | TABLE VII (Continued) | $\overline{n_0}$ | <sup>n</sup> 1 | n <sub>2</sub> | n <sub>3</sub> | Class | Index | Mnemonic | Name Op Code | |------------------|----------------|----------------|-----------------|-------|-------|----------|-------------------------------------------| | | _ | <sup>a</sup> 0 | a <sub>9</sub> | MPI | 64 | ISZ | Increment and skip if zero | | _ | - | g <sub>0</sub> | <del>-</del> | JMPI | 65 | JLY | Jump and load Y<br>1000101111110010 | | 1 | - | <sup>a</sup> 1 | <sup>a</sup> 13 | MPI | 66 | JMP | Jump -0101 | | | <b>-</b> | g <sub>4</sub> | - | JMPI | 67 | JPY | Jump indexed by Y 10001011111111010 | | _ | - | - | - | DMI | 68 | JRS | Jump and store status<br>1000101111001101 | | 0 | - | <sup>a</sup> 1 | <sup>a</sup> 12 | MPI | 69 | JSB | Jump to subroutine -0011 | | 0 | 0 | <sup>b</sup> 0 | <sup>b</sup> 11 | IRI | 70 | LAX | Load A indexed by X 1000001111100010 | | 0 | 1 | <sup>b</sup> 0 | ь<br>11 | IRI | 71 | LAY | Load A indexed by Y 1000001111101010 | | - | - | - | _ | ВҮМІ | 72 | LBT | Load byte 1000101111110011 | | 1 | 0 | <sup>b</sup> 0 | ь<br>11 | IRI | 73 | LBX | Load B indexed by X 1000101111000010 | | 1 | 1 | <sup>ь</sup> 0 | <sup>b</sup> 11 | IRI | 74 | LBY | Load B indexed by Y 1000101111101010 | | 0 | 0 | a<br>1 | a<br>7 | MRI | 75 | LDA | Load A -1100 | | 1 | 0 | $\mathbf{a}_1$ | a <sub>7</sub> | MRI | 76 | LDB | Load B -1101 | | 0 | - | <sup>b</sup> 0 | b<br>12 | IRI | 77 | LDX | Load X from memory 1000101111100101 | | 1 | ryana . | <sup>b</sup> 0 | <sup>b</sup> 12 | IRI | 78 | LDY | Load Y from memory 1000101111101101 | | - | _ | - | - | DMI | 79 | LFA | *Load fence from A 1000001111010111 | | <b>-</b> · | - | _ | - | DMI | 80 | LFB | *Load fence from B 1000101111010111 | TABLE VI (Continued) | $\overline{n_0}$ | $n_1$ | n <sub>2</sub> | n <sub>3</sub> | Class | Index | Mnemonic | Name | Op Code | |------------------|----------|----------------|-----------------|-------|-------|----------|-------------------|------------------------------------------------| | 0 | 0 | d <sub>0</sub> | <sup>d</sup> 12 | IOI | 81 | LIA | Load into | A<br>100001-101 | | 1 | 0 | d <sub>0</sub> | <sup>d</sup> 12 | IOI | 82 | LIB | Load into | B<br>100011-101 | | - | | c <sub>2</sub> | - | EAR | 83 | LSL | Logical sh | ift left<br>10000000001 | | _ | | c <sub>3</sub> | - | EAR | 84 | LSR | Logical sh | ift right<br>100000100010 | | | - | ~ | <del>-</del> . | DMI | 85 | MBF | Move bytes | from alternate map 10001011111000011 | | water | *** | - | - | DMI | 86 | MBI | Move bytes | into alternate<br>1000101111000010 | | - | - | - | - | BMI | 87 | MBT | Move bytes | 1000101111110101 | | - | - | - | - | DMI | 88 | MBW | Move bytes | within alternate<br>1000101111000100 | | 0 | 0 | d <sub>0</sub> | <sup>d</sup> 13 | IOI | 89 | MIA | Merge into | A<br>100001-100 | | 1 | - | d <sub>0</sub> | <sup>d</sup> 13 | IOI | 90 | MIB | Merge into | B<br>100011-100 | | - | <b>-</b> | - | <b></b> | EAMR | 91 | MPY | Multiply | 10000001000 | | - | <b>-</b> | - | <b>-</b> | WMI | 92 | MVW | Move words | 1000101111111111 | | - | - | - | - | DMI | 93 | MWF | Move words | from alternate map 1000101111.000110 | | - | - | - | <del>-</del> | DMI | 94 | MWI | Move words | into alternate map<br>1000101111000101 | | - | - | - | - | DMI | 95 | MWW | Move words<br>map | within alternate | | _ ′ | - | - | - | S/R | 96 | NOP | No Operatio | 1000101111000111<br>on<br>00000000000000000000 | TABLE VII (Continued) | $\frac{1}{n}$ | n <sub>1</sub> | $n_2$ | n <sub>3</sub> | Class | Index | Mnemonic | Name Op Code | |---------------|----------------|----------------|-----------------|-------|-------|----------|-------------------------------------------------| | | L | | | | | Om t | | | 0 | | <sup>d</sup> 0 | <sup>d</sup> 20 | IOI | 97 | OTA | Output A 100001-110 | | - | 0 | $^{d}0$ | <sup>d</sup> 20 | IOI | 98 | ОТВ | Output B 100011-110 | | _ | | | _ ` | DMI | 99 | PAA | Load/store port A map per A 1000001111001010 | | - | - | - | - , | DMI | 100 | PAB | Load/store port A map per B<br>1000101111001010 | | - | - | | - | DMI | 101 | PBA | Load/store port B map per A<br>1000001111001011 | | - | | | _ | DMI | 102 | РВВ | Load/store port B map per B 10001-1111001011 | | 0 | 0 | e <sub>0</sub> | е <sub>9</sub> | S/R | 103 | RAL | Rotate A left 0000001010-1-010 | | 1 | 0 | е <sub>0</sub> | е <sub>9</sub> | s/R | 104 | RAR | Rotate A right 0000001011-1-011 | | 0 | 1 | e <sub>0</sub> | e <sub>9</sub> | s/R | 105 | RBL | Rotate B left<br>0000101010010010 | | 1 | 1 | е <sub>0</sub> | e <sub>9</sub> | s/R | 106 | RBR | Rotate B right 0000101011-1-011 | | _ | | c <sub>4</sub> | | EAR | 107 | RRL | Rotate left 10000000100 | | - | - | c <sub>5</sub> | - | EAR | 108 | RRR | Rotate right 100000100100 | | | - | - | _ | DMZ | 109 | RSA | Read status register into A<br>1000001111011000 | | - | - | - | | DMI | 110 | RSB | Read status register into B<br>1000101111011000 | | _ | - | - | | A/S | 111 | RSS | Reverse skip sense 0000-11 | | | - | <b>-</b> . | - | DMI | 112 | RVA | Real violation register into A 1000001111011001 | TABLE VII (Continued) | $\frac{n}{0}$ | n <sub>1</sub> | n 2 | <sup>n</sup> 3 | Class | Index | Mnemonic | Name Op Code | |---------------|----------------|---------------------|-----------------|-------|-------|----------|----------------------------------------------------------------| | | _ | | | DMI | 113 | RVB | Read violation register into B | | 0 | 0 | b <sub>0</sub> | <sup>b</sup> 14 | IRI | 114 | SAX | Store A indexed by X 1000001111100000 | | 0 | 0 | b <sub>0</sub> | b<br>14 | IRI | 115 | SAY | Store A indexed by Y 1000001111101000 | | - | - | - | - | BIMI | 116 | SBS | Set bits 1000101111111011 | | _ | | - | - | BYMI | 117 | SBŢ | Store type<br>1000101111110100 | | 1 | 0 | <sup>b</sup> 0 | b<br>14 | IRI | 118 | SBX | Store B indexed by X 1000101111100000 | | 1 | 1 | b <sub>0</sub> | <sup>b</sup> 14 | IRI | 119 | SBY | Store B indexed by Y 1000101111101000 | | - | | - | · <b>-</b> | A/S | 120 | SEZ | Skip if E is zero<br>0000-11 | | - | - | - | - | BYMI | 121 | SFB | Skip if flag clear<br>1000-10010 | | - | 0 | $^{\mathbf{d}}_{0}$ | d<br>14 | 101 | 122 | SFC | Skip if flag clear<br>1000-10011 | | <u>-</u> | 0 | d<br>0 | <sup>d</sup> 16 | 101 | 123 | SFS | Skip if flag set<br>1000-10011 | | - | <del>-</del> | - | - | DMI | 124 | SJP | Enable system map and jump 1000101000100000 | | - | - | - | - | DMI | 125 | SJS | Enable system map and jump<br>to subroutine<br>100010111101110 | | _ | - | | - | S/R | 126 | SLA | Skip if LSB of A is zero 000001 | | - | - | <b>-</b> '. | - | S/R | 127 | SLB | Skip if LSB of B is zero 0000101 | | - | 0 | d <sub>0</sub> | d<br>14 | IOI | 128 | SOC | Skip if overflow clear 100001-010000000 | TABLE VII (Continued) | $\frac{1}{n}$ 0 | n <sub>1</sub> | n <sub>2</sub> | n <sub>3</sub> ( | Class | Index | Mnemonic | Name Op Code | |-----------------|----------------|----------------|------------------|-------|-------|----------|----------------------------------------------------| | | 0 | d <sub>0</sub> | <sup>d</sup> 16 | IOI | 129 | SOS | Skip if overflow set<br>100001-011000001 | | - | - | | - | A/S | 130 | SSA | Skip if sign of A is zero 0000011 | | - | - | - | - | A/S | 131 | SSB | Skip if sign of B is zero 0000111 | | - | - | - | -<br>- | DMI | 132 | SSM | Store status register into memory 1000101111001100 | | 0 | 1 | a <sub>1</sub> | <sup>a</sup> 7 | MRI | 133 | STA | Store A -1110 | | 1 | 1 | a <sub>1</sub> | <sup>a</sup> 7 | MRI | 134 | STB | Store -1111 | | - | 0 | $^{d}_{0}$ | d <sub>18</sub> | IOI | 135 | TSTC | Set control 100001-111 | | - | 0 | <sup>d</sup> 0 | d <sub>19</sub> | IOI | 136 | STF | Set flag<br>1000-10001 | | - | 0 | $^{d}_{0}$ | d<br>19 | IOI | 137 | STO | Set overflow<br>1000010001000001 | | 0 | - | <sup>b</sup> 0 | b <sub>13</sub> | IRI | 138 | STX | Store X to memory 1000101111100011 | | 1 | - | <sup>b</sup> 0 | b <sub>13</sub> | IRI | 139 | STY | Store Y to memory 1000101111101011 | | - | - | - | - | DMI | 140 | SYA | Load/store system map per A 1000001111001000 | | - | - | - | · <u> </u> | DMI | 141 | SYB | Load/store system map per B 1000101111001000 | | <b>-</b> | - | - | - | A/S | 142 | SZA | Skip if A is zero 0000011- | | - | - | - , | - | A/S | 143 | SZB | Skip if B is zero 0000111- | | <b>-</b> . | -<br>- | | - | BYMI | 144 | TBS | Test bits 1000101111111101 | TABLE VII (Continued) | n <sub>0</sub> | n, | n <sub>2</sub> | n <sub>3</sub> | Class | Index | Mnemonic | Name Op Code | |----------------|----|----------------|-----------------|-------|-------|----------|---------------------------------------------------------------| | 0_ | 1 | 2 | 3 | | | | Name op code | | - , | | <u>.</u> | - | DMI | 145 | UJP | Enable user map and jump<br>to subroutine<br>1000101111011110 | | - | - | - | - | DMI | 146 | UJS | Enable user map and jump<br>to subroutine<br>1000101111011111 | | - | - | _ | - | DMI | 147 | USA | Load/store user map per A<br>1000001111001001 | | - | - | _ | - | DMI | 148 | USB | Load/store user map per B<br>1000101111001001 | | 0 | 0 | <sup>b</sup> 4 | <sup>b</sup> 15 | DMI | 149 | XAX | Exchange A and X<br>1000001111100111 | | 0 | 1 | <sup>b</sup> 4 | <sup>b</sup> 15 | IRI | 150 | XAY | Exchange A and X<br>1000001111101111 | | 1 | - | b <sub>4</sub> | ь <sub>15</sub> | IRI | 151 | XBX | Exchange B and X 1000101111100111 | | 1 | 1 | <sup>b</sup> 4 | <sup>ь</sup> 15 | IRI | 152 | XBY | Exchange B and Y 1000101111101111 | | _ | - | - | - | DMI | 153 | XCA | Cross compare A<br>1000001111010110 | | - | - | - | - | DMI | 154 | XCB | Cross Compare B<br>1000101111010110 | | - | - | <del>-</del> . | <del>-</del> | DMI | 155 | XLA | Cross load A<br>1000001111010100 | | - | - | - | <del>-</del> | DMI | 156 | XLB | Cross load B<br>1000101111010100 | | - | _ | - | - | DMI | 157 | XMA | Transfer maps internally per A | | | | | | | | | 1000101111010000 | | <del>.</del> | - | _ | - | DMI | 158 | XMB | Transfer maps internally per B 1000101111010010 | | - | - | <b>-</b> . | - | DMI | 159 | XMM | Transfer maps or memory 1000101111010000 | TABLE VII (Continued) | | | | | | 1110 | LL VII (OOMEIN | | |----------------|----------------------------|----------------|----------------|-------|-------|----------------|---------------------------------------------| | n <sub>0</sub> | <sup>n</sup> <sub>1.</sub> | n <sub>2</sub> | n <sub>3</sub> | Class | Index | Mnemonic | Name Op Code | | <u></u> | - , | _ | _ | DMI | 160 | XMS | Transfer maps sequentially 1000101111010001 | | - | - | - | | MPI | 161 | XOR | "Exclusive OR" to A -0100 | | - | | - | - | DMI | 162 | XSA | Cross Store A 1000001111010101 | | <b>-</b> | - | - | - | DMI | 163 | XSB | Cross store B<br>1000101111010101 | | 0 | 2 | ъ <sub>4</sub> | ъ <sub>8</sub> | IRI | 164 | CAX | Copy A to X<br>1000001111100001 | | - | 3 | b <sub>4</sub> | <sup>b</sup> 8 | IRI | 165 | CAY | Copy A to Y 1000001111101001 | | 1 | 2 | <sup>b</sup> 4 | ъ <sub>8</sub> | IRI | 166 | CBX | Copy B to X 1000101111100001 | | 1 | 3 | ь <sub>4</sub> | ь <sub>8</sub> | IRI | 167 | СВУ | Copy B to Y 1000101111101001 | <sup>\*</sup> Base page fence register Figure 5. Input/Output Interrupt Generator #### Instruction Execution The instruction execution starts at line 15. The effective address computation of MRI is performed at lines 16, 17, 18 and 19. Line 20 sets up the immediate value for EAR. Line 21 sets up I/O flag clear/hold information for IOI. Line 22-24 subdecodes the packed micro-instructions in A/S and S/R instructions. #### Interrupt Service Service. In case of any exception the bit (0 for exception, 1 for I/O interrupt) in the interrupt holder h is set (line 27). The interrupt service sequence is initiated if at least one interrupt is pending (line 28). The sequence consists of fetching a new instruction from one of the five fixed locations in memory. The interrupt vector address of the peripheral device is obtained from the six least significant bits of the T-bus. The element of h which caused the interrupt is reset. (a) WMI Instruction | | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | |---|------------|------------|------------|------------|------------|------------|------------|------------| | 0 | | | - | | | | | | | 1 | 140<br>SYA | 147<br>USA | 99<br>PAA | 101<br>PBA | - | | | | | 2 | | | 157<br>XMA | 155 | 153<br>XLA | 79<br>XSA | . XCA | LFA | | 3 | 109<br>RSA | 112<br>RVA | | | | ` | - | | | 4 | | | 86<br>MBI | 85<br>MBF | 88<br>MBW | 94<br>MWI | 93<br>MWF | 95<br>MWW | | 5 | 141<br>SYB | 148<br>USB | 100<br>PAB | 102<br>PBB | 132<br>SSM | 68<br>JRS | | | | 6 | 159<br>XMM | 160<br>XMS | | 158<br>XMB | 156<br>XLB | 163<br>XSB | 154<br>XCB | 80<br>LFB | | 7 | 110<br>RSB | 113<br>RVB | 42<br>DJP | 43<br>DJS | 124<br>SJP | 125<br>SJS | 145<br>UJP | 146<br>UJS | 5 D (b) DMSI Instruction | | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | | | | 15 | | |---|------------|-----------|-----------|------------|-----------|-----------|----------|------------|-----------|-----------|----|--|--|---|----|--| | 0 | 114<br>SAX | 16<br>CAX | 70<br>LAX | | 37<br>CXA | | | 149<br>XAX | | | | | | | | | | 1 | 115<br>SAY | 17<br>CAY | 71<br>LAY | | 39<br>CYA | | | 150<br>XAY | | | | | | _ | | | | 2 | 118<br>SBX | 20<br>CBX | 73<br>LBX | 138<br>STX | | | 3<br>ADX | 151<br>XBX | 62<br>ISX | 46<br>DSX | | | | | | | | 3 | 119<br>SBY | 21<br>CBY | 74<br>LBY | 139<br>STY | 40<br>CYB | 78<br>LDY | | 152<br>XBY | 63<br>ISY | 47<br>DSY | | | | | | | 6<sub>D</sub> # (c) IRI Instruction Figure 6. Instruction Decoding Matrices | | • | | | | | | | | | |---|---|----------|-------------------|-----------------|-----------|----------|-----------|-----------|------------| | | | | | | | | | | | | | | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | | | 0 | | 8<br>AND | 161<br>XOR | 61<br>IOR | 1<br>ADA | 35<br>CPA | 75<br>LDA | 133<br>STA | | 4 | 1 | | 69<br>JSB | 66<br>JMP | 64<br>ISZ | 2<br>ADB | 36<br>CPB | 76<br>LDB | 134<br>STB | | | | (d) | MRI In | 0<br>D<br>struc | tion | | | | | | | 0 | 0<br>JLY | 1<br>JPY | | | | | | | | | | (e) 3 | 2<br>D<br>JMPI In | struc | tion | | • | | | (f) BYMI Instruction 0 1 1 0 18 144 CBS TBS 1 116 SBS $^{3}D$ (g) BIMI Instruction | | 0 | 1 | 2 | 3 | |---|-----|-----|-----|-----| | | 52 | 57 | 56 | 53 | | 0 | FAD | FSB | FMP | FDV | | , | 54 | 55 | | | | 1 | FIX | FLT | | | 7<sub>D</sub> (h) FPI Instruction Figure 6. (Continued) | | 0 | | | | | 7 | | |---|---|------|-----|-----|--|---|---| | ^ | | 10 | 83 | 103 | | | 1 | | U | | ASL | LSL | RRL | | | | | 1 | | . 11 | 84 | 104 | | | 1 | | _ | | ASR | LSR | RRR | | | | (i) EAR Instruction | | 91 | 44 | |---|-----|-----| | 0 | MPY | DLD | | | 41 | 45 | | 1 | DIV | DST | | | - | 9 | (j) EAMR Instruction | | 0 | 1 | 2 | 3. | 4 | . 5 | . 6 | 7 | |---|-----|-----|-----|-----|------|-----|-----|-----| | | 58 | 136 | 122 | 123 | 89 | 81 | 97 | 27 | | 0 | HLT | STF | SFC | SFS | MIA | LIA | OTA | CLC | | 7 | 58 | 29 | 128 | 129 | . 90 | 82 | 98 | 135 | | т | HLT | CLF | SOC | sos | MIB | LIB | отв | STC | $\begin{array}{c} & 10\\ D \\ \end{array} \text{(k) IOI Instruction}$ | | 0 | 1 | 2 | 3 | |---|---|-----|-----|-----| | 0 | | 25 | 31 | 22 | | 0 | | CLA | CMA | CCA | | 7 | | 26 | 32 | 23 | | Т | | CLB | CMB | CCB | $\begin{array}{c} 11_{D} \\ \text{($\ell$) A/S Instruction} \end{array}$ | | 7 | 9 | 103 | 104 | -6 | 50 | 48 | 5 | |---|-----|-----------|-----|-----|-----------|-----------|-----------|-----------| | 0 | ALS | | • | | ALR | | ELA | ALF | | 1 | BLS | 15<br>BRS | | | 13<br>BLR | 51<br>ERB | 49<br>ELB | 12<br>BLF | 12<sub>D</sub> (m) S/R Instruction Figure 6. (Continued) # Input/output Interrupts The I/O interrupt generator (IOIG) system program, Figure 5, determine the presence of interrupt requests by peripheral devices and sets the bit in the interrupt holder, h, accordingly (line 1). The dwell at line 0 checks for interrupts on the device flag. The setting of any I/O device flag means an interrupt request from that I/O device. If a higher priority device has already gained control of the processor, the lower priority device cannot be served until the higher priority device has finished its service routine (lines 1, 2, and 3). #### Memory Access Routine The memory access (MAC) operation, Figure 7, fetches or stores a specified number of bytes from the memory at a given address. The general form of the operation is $MAC^{i}$ (j;1), where i specifies the device requesting access; j is a two-component vector specifying the address in memory (j<sub>o</sub>) and the type of operation (store; j<sub>1</sub> = 2; fetch:j<sub>1</sub> = f), respectively; and 1 specified the vector into/from which the accessed data are to be stored/fetched. The request for service is entered in the bus request vector r, and in the queue if it is empty (line 0). The program dwells at line 1 until i is recognized as the first nonzero entry in the queue. After the request has been honored, the entry in the request vector is blanked out (line 2). The parity error exception is noted (line 5), and entered in the exception vector e. If no exception occurs, a fetch (line 4) or store (line 7) is performed. # Address Computation Routine The address computation (ADC) operation, Figure 8, is used for effective address calculation of the operands. The general form for ADC is (m; g; k) where m is the mode of addressing (0 means direct, 1 means indirect), g is the primary address, and k is the effective address returned by the operation. Figure 7. Memory Access Operation Figure 8. Address Computation Operation # Instruction Execution Routine At the entry point EXEC, Figure 9, the routine for an instruction is determined by $n_2$ (line $a_0$ ). Execution involves setting up condition codes (if necessary) after the execution. Lines $$a_1 - a_{13}$$ All MRI instructions are executed here. AND, IOR, XOR ADA, ADB, CPA, CPB, and ISZ are entered at line $a_1$ to get data from memory. STA, STB, LDA, and LDB are entered at line $a_2$ . All MRI instructions are diverged at line $a_2$ and enter their own routine. The "Exit" here means go back to PROC; the outgoing arrow at the right side of the line also indicates return to PROC if the arrow does not direct to any other line. This is true not only here, but also in any other line of the EXEC routine. Figure 9. EXEC Routine. | | | | , | |----------|---------------------------------------------------------------------------------|-----------------|----------------| | CXA,CXB | $(X,Y)_{n_0} \leftarrow u$ | ъ <sub>7</sub> | <del>=</del> | | CBX, CAX | $(A,B,X,Y)_{n_1} \leftarrow (A,B,X,Y)_{n_0}$ | b <sub>8</sub> | > | | CBY, CAY | , , , , , , , , , , , , , , , , , , , | | | | CYA, CYB | | | | | ISX,DSX | $0: (X,Y)_{n_1} \leftarrow \tau(1,-1)_{n_0} + \iota(X,Y)_{n_1}$ | ь <sub>9</sub> | <del>±</del> > | | ISY,DSY | $n_1$ $n_0$ $n_1$ | . 9 | | | | P ← ⊤1 + ⊥P | b <sub>10</sub> | > | | LAX,LAY | MAC <sup>1</sup> (a + (X,Y)n <sub>1</sub> , f; (A,B) <sub>n<sub>0</sub></sub> ) | ь<br>11 | | | LBX, LBY | "0 | 11 | - | | LDX,LDY | $(X,Y)_{n_0} \leftarrow C$ | b<br>12 | | | STX,STY | $MAC^{1} (a,s,(X,Y)_{n_{0}})$ | ь<br>13 | | | SAX, SAY | MAC $^{1}(b+1(X,Y)_{n_{1}}, s; (A,B)_{n_{0}})$ | ь<br>14 | | | SBX,SBY | | | | | XAX,XAY | $C \leftarrow (A,B)_{n_0}$ | b <sub>15</sub> | | | XBX,XBY | $(A,B)_{n_0} \leftarrow (X,Y)_{n_1}$ | <sup>b</sup> 16 | | | | $(X,Y)_{n} \leftarrow C$ | ъ <sub>17</sub> | | | ASL | $\omega^{31}/(B, A) \leftarrow a \circ \omega^{31}/(B, A)$ | c <sub>0</sub> | > | | ASR | $\omega^{31}/(B, A) \leftarrow (\varepsilon(31), \alpha^{a}(31))_{B_0}$ | | | | , | $v (a \circ \omega^{31}/(B, A))$ | <sup>c</sup> 1 | | | LSL | B, A ← a o (B, A) | c <sub>2</sub> | | | LSR | B, A ← a o (B, A) | c <sub>3</sub> | | | RRL | B, A ← a ↑ (B, A) | c <sub>4</sub> | | | RRR | B, A ← a + (B, A) | <sup>c</sup> 5 | | Figure 9. (Continued) Figure 9. (Continued) Figure 9. (Continued) Figure 9. (Continued) Figure 9. (Continued) Figure 9 (Continued) Figure 9. (Continued) Lines $b_0^{-b}$ 17 All IRI instructions are executed here. ADX, ADY, LDX, LDY, STX, and STY refer to certain memory locations whose addresses are defined in the word following the instruction word; thus some memory access and effective address computation tasks must be done(60-63) prior to the execution of the instructions. All the other instructions of IRI do not require those tasks and enter the routine at line $b_4$ to skip the unnecessary steps. Lines C<sub>0</sub>-C<sub>5</sub> The EAR instruction sets are executed here. Each instruction enters at a different line. Lines d<sub>0</sub>-d<sub>21</sub> All the IOI instructions are executed here. The I/O devices are interfaced with the processor by these instructions; symbols V, F, and Z are used here to represent the control bits, I/O flag bits, and data buffers of all the I/O devices. Each indexed symbol refers to a specific device. Lines e<sub>0</sub>-e<sub>18</sub> All the S/R instructions are executed here. Each S/R instruction consists of four microinstructions. Each microinstruction is chosen from its own microinstruction set. The first microinstruction set is the same as the fourth microinstruction set for S/R instructions. The instruction execution is divided into three parts. The first part (lines $e_0^-e_{12}^-$ ) executes the first microinstruction, the second part (lines 13-14) executes the second microinstruction, and the third part (lines 15-17) executes the third microinstruction. The fourth microinstruction is executed in the first part after the previous three microinstructions are all executed. Every S/R instruction must go through these four steps to complete the instruction execution. All the A/S instructions are executed here. Each A/S instruction consists of 8 microinstructions. Thus the instruction execution is divided into 8 parts, each part executing one microinstruction. Every A/S instruction must go through these 8 parts to complete the instruction execution. Lines $$g_0 - g_5$$ The JUMP instructions JLY and JPY are executed here. A memory access must be made to get the destination address of the JUMP instruction. Lines $$h_0 - h_{23}$$ All the EAMR instructions are executed here. Each of the four EAMR instructions requires two words of memory: one for the instruction code and one for the operand address. Thus at line $\mathbf{h}_0$ , the second memory word (operand address) is incremented by 1 to point to the next instruction. The overflow bit is set when the DIV instruction is executed if the divisor is zero or too small. In the former case (division by zero), the division will not be attempted and the B- and A-register contents will be unchanged except that a negative quantity will be made positive. In the latter case (divisor too small), the execution will be attempted with unpredictable results left in the B-and A-registers. Lines $i_0 - i_{24}$ All the FPI instructions are executed here. Four of the FPI instructions are floating point arithmetic instructions which require two words of memory: one for the instruction code and one for the operand address. Since a full 15 bits are available for the operand address, these instructions can directly address any location in memory. The execution of WMI, BIMI, BYMI, and DMI instructions is not included in the APL description here because they are not used and have nothing to do with this paper. # Microprogramming # Conventional Control Section In a conventional computer control section, the functions performed by the instruction set determine the specified hardware design. The major advantage of this specially designed hardware is speed of instruction execution. The major disadvantage is the loss of flexibility for special applications or for enhancements. Any changes and additions to existing capabilities require changes and additions to hardware components. This is no problem for a conventional computer is there are no new machine instructions required. "The hardware has been designed to minimize timing for the instruction set" (6). However, a computer manufacturer rarely produces an instruction set that meets the requirements of all potential users. "Hence, the manufacturer must either focus his attention on one group of users or widen his scope and generalize the hardware design to meet the needs of a number of user groups. In the latter case, the user must modify his discipline to some extent to meet the limitations of his hardware" (6). # Microprogrammed Control Section "In the microprogrammed computer, all distinct logical functions are separated from the sequence in which those functions are performed" (6). Thus, hardware redundancy is reduced. The control store holds the microinstruction which defines the logical functions. Each machine instruction in Main Memory is performed by a sequence of microinstructions in Control Store. This sequence of microinstructions called a microprogram and is often referred as firmware. Software can be executed much faster with the application of microprogramming. This speed is achieved by two factors: - The memory access time of Control Store is less than that of Main Memory. - 2. The microinstruction has more flexibility than the normal machine instruction. In fact, the HP21MX Control Store where microinstruction reside, cycles more than twice as fast as Main Memory where normal machine instructions reside. In addition, microinstruction have the ability to access many internal registers and some logical functions that Main Memory programs do not have. For example, the HP21MX floating point software subroutines were identified as very time consuming. They were microprogrammed by Hewlett-Packard and made available in ROM to users. Implementation of floating point firmware requires no change to user programs. The microprogrammed floating point instructions run about 20 times faster than the corresponding software subroutines. As in the floating point microprogram, the user can study his software, determine the most time consuming function performed, and then microprogram these functions, that is, execute them in control store using a single memory instruction instead of a sequence of Main Memory instructions. Any software that uses these microprogrammed functions will execute at a higher speed. # The Microprogrammable Computer Functionally, a computer consists of four major sections: Control Main Memory Input and Output Arithmetic and Logic Each section executes under the direction of the control section by means of a microprogram. The control section reads the user's program stored in Main Memory and directs the appropriate hardware in each of the other sections. ### Control Section The control section fetches an instruction from a certain location in memory, which is specified by the Memory Register (MR), and stores it into the Instruction Register (IR), as shown in Figure 10. An Figure 10. A Microprogram Implemention of One Macroprogram Instruction. appropriate microprogram is determined by the IR. Conceptually, each program instruction in Main Memory is a jump to a microprogrammed routine which resides in Control Store. The storage area for those microprograms is Control Store which may be either a Read Only Memory (ROM) or Writable Control Store (WCS). The control section that executes microprograms from ROM is referred as a Control Processor. #### The Control Processor A microprogram in the Control Processor is in command of the computer at all times. A microprogram takes program instructions from Main Memory and stores them into the IR. The upper eight bits of the IR determine the microprogram address within one of the following groups: Basic instruction set Extended instruction group Floating point instruction group User microprogram group The basic instruction set microprogram can be regarded as a supervisor microprogram that determines when a user microprogram is called and then passes control to the user microprogram. When a microprogram has run to completion, it returns to location 0 in Control Store (basic instruction set), returning control to the supervisor microprogram, after which the next instruction is fetched from Main Memory and stored into the IR. Successive microinstruction address are determined in the following way. The ROM Address Register (RAR) is incremented at the start of execution of each microinstruction. When a jump is executed, the RAR is loaded with the jump target address. When a jump to a subroutine is executed, the RAR is stored into the Save Register. When a return from a subroutine is executed (RTN), the Save Register contents are transferred into RAR and the Save Register is cleared. Thus at the completion of execution of each microinstruction, the RAR holds the address of the next microinstruction. The central data transfer path is the S-bus. The contents of all registers except the following can be directed onto the S-bus: L-register, RAR, SAVE Register, Extend Register, and the Overflow Register. The following registers can receive data from the S-bus: M-Register, T-Register, L-Register, Counter-Register, Display-Register, Display Indicator, and Instruction Register. The T-but receives data only from the Rotate/Shifter (R/S) but can pass data to the following registers: A-Register, B-Register, Scratch Pad Register ( $S_1$ through $S_{12}$ ), X-Register, Y-Register, P-Register, and S-Register, (Front Panel Switch Register). The I/O-bus serves to transfer data to and from external devices under program control. In the functional block diagram (Appendix A) all the data paths are shown by the arrows. For example, the B-Register contents can be sent to S-bus and hence to the M-Register. However, the contents of the B-Register cannot be sent to S12 (Scratch Pad 12) without passing through the ALU. #### Main Memory The M-register is a 15-bit register which holds memory addresses for reading from or writing into Main Memory. Upon storing from the M-Register, bit 15 is clear (0). The T-Register or transfer register holds the data being transferred to or from memory. The contents of both of these registers are transferred to and from the -bus. Four loader ROMS, selectable by Instruction Register bits 15 and 14, can each contain a 64-word Main Memory program which may be loaded into Main Memory and used to load Main Memory from a peripheral device or to perform any other function desired by the user. Two flags are associated with memory: the A-Register Addressable Flag (AAF) and the B-Register Addressable Flag (BAF). These flags are required to allow the A- and B-registers to be addressed as locations 0 and 1, respectively, of Main Memory. ## Input and Output The Central Interrupt Register (CIR) is a 6-bit register associated with the I/O interrupt circuitry. It is loaded with the select code of the interrupting device under program control and passed to the S-bus. Whenever the CIR is loaded, and Interrupt Acknowledge (IAK) signal is issued to the I/O device. The I/O bus transfers data to and from external devices. Two flags are associated with I/O: the interrupt pending flag and the I/O skip condition met flag. The Interrupt Enable Register is used to disable or enable the recognition of all interrupts, except Memory protect, parity, and power failure interrupts. ## Arithmetic and Logic Section This section consists of the Arithmetic and Logic Unit (ALU), the twenty-two Rotate/Shifter (R/S) registers, and six flags. The ALU and R/S are the only units that execute functional modifications on the data. The ALU receives input from the S-bus and from the L-register (Latch Register). Output from the ALU goes to the R/S which places its output on the T-bus. Output from the ALU and R/S can be stored in one of the following registers via the T-bus: A-Register, B-Register, Scratch Pad Registers ( $S_1$ through $S_{12}$ ), X-Register, Y-Register, P-Register, and S-Register. Recall that the P-register holds the macroprogram (main memory) address. The P-register must be under control of the microprogram which must insure that it contains the proper address after the microprogram is complete. When the microprogram is complete, the resulting P-Register value is the address of the next macroinstruction to be executed. Note that the Basic Instruction Set fetch routine (at Control Store address 0) automatically increments the P-Register after the macroinstruction is fetched. Thus for one-word user macroinstruction function codes, no further incrementing of the P-Register is necessary in the user microprogram. The S-Register is reserved for internal storage of the Front Panel Switch Register. Note that all of those registers can also be sent along the S-bus for storage into memory, passage to an external device, or input to the ALU. The Extend Register is a one-bit register used in shift operations to link the A- and B-Registers or to indicate a "carry" arithmetic result out of the A- or B-Registers. The overflow is a one-bit register used to indicate an arithmetic overflow from the ALU. These two registers can also be used as flags. # Implementation of a Polynomial Algorithm in the HP21MX Computer The four tasks which are illustrated in Chapter I are performed in this chapter. One of them is to program the polynomial algorithm in Hp 21 assembler language for evaluating the sine function. The other task does the same thing but uses a microprogram instead of the program coded in assembler language. The particular polynomial algorithm used for evaluating sine functions has been determined in Chapter II and is shown as follows: $$\sin x = c_1 x + c_3 x^3 + c_5 x^5$$ $$c_1 = 1.5706268$$ $$c_3 = -0.6432292$$ $$c_5 = 0.0727105$$ $$-1 \le x \le 1$$ (4.1) where For evaluating the sine of an angle $\theta$ , x is substituted with $2\theta/\Pi$ in Eq. (4.1); then sin $\theta$ can be computed by $$\sin \theta = c_1 \left(\frac{20}{\pi}\right) + c_3 \left(\frac{2\theta}{\pi}\right)^3 + c_5 \left(\frac{2\theta}{\pi}\right)^5$$ In order to reduce the execution time when implemented this algorithm in the computer, Eq. (4.1) can be factored as follows: $$\sin\frac{\pi}{2} \mathbf{x} = x(c_1 + x^2(c_3 + c_5 x^2)) \tag{4.2}$$ Although Eq. (4.1) and Eq. (4.2) give the same result in computation, they require a different number of multiplications. Inspection of Eq. (4.1) shows that the number of multiplications required is 11, while the number of multiplications required by Eq. (4.2) is 7. As mentioned in Chapter I, the multiplication function is one of the most time-consuming functions. Thus Eq. (4.2) definitely is more efficient than Eq. (4.1) when implemented in the computer. For the reason mentioned above, Eq. (4.2) is used for both the microprogram and the program coded in assembly language. The results of these two implementations are listed in Tables VIII and IX. The program listings are listed in Appendix B. # Implementation of the Cordic Algorithm on the HP21MX Computer The Cordic algorithm has been introduced in Chapter II. To use it for evaluation of the sine function, the value selected for n is a function of the desired computing accuracy. Theoretically, the larger the value of n is the more accurate the result. Actually, it is impossible to represent a number to any degree of accuracy in any computer because the accuracy of all computers is limited by the number of bits in a word. In the HP21MX computer, there are 16 bits in a word. When the Cordic algorithm is used to evaluate the sine function, the value of n not only determines the accuracy of the result, but also affects the execution time of the program. There is a trade-off between accuracy and execution time; i.e., when n increases, the accuracy is increased as is the execution time. In order to get the greatest accuracy and the least execution time, the optimum value of n must be found. As discussed in Chapter II, a set of ATR constants, $\alpha_4$ , i=1,...,n,, can be obtained from Eq. (4.3). $$\alpha_{i} = \tan^{-1} 2^{-(i-2)} \text{ for } 2 \le i \le n$$ (4.3) TABLE VIII POLYNOMIAL METHOD IMPLEMENTATION RESULTS (ASSEMBLY LANGUAGE) OF EVALUATING THE SINE FUNCTION | ` . | | | |----------------|-----------|--------------------------| | Angle(Radians) | Sin | Execution Time(Mili-Sec) | | -1.5 | -0.997558 | 0.081 | | -1.4 | -0,985351 | 0.081 | | -1.3 | -0.963378 | 0.081 | | -1.2 | -0.932128 | 0.081 | | -1.1 | -0.891357 | 0.081 | | -1.0 | -0.841552 | 0.081 | | -0.9 | -0.783447 | 0.081 | | -0.8 | -0.717285 | 0.081 | | -0.7 | -0.644287 | 0.081 | | -0.6 | -0.564697 | 0.081 | | -0.5 | -0.479492 | 0.081 | | -0.4 | -0.389404 | 0.081 | | -0.3 | -0.295654 | 0.081 | | -0.2 | -0.198730 | 0.081 | | -0.1 | -0.099853 | 0.081 | | 0.0 | 0.0 | 0.081 | | 0.1 | 0.099609 | 0.081 | | 0.2 | 0.198486 | 0.081 | | 0.3 | 0.295410 | 0.081 | | 0.4 | 0.389160 | 0.081 | TABLE VIII (Continued) | * 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 | ( | | | | | | | |-----------------------------------------|----------|--------------------------|--|--|--|--|--| | Angle(Radians) | Sin | Execution Time(Mili-Sec) | | | | | | | 0.5 | 0.564453 | 0.049 | | | | | | | 0.6 | 0.564453 | 0.049 | | | | | | | 0.7 | 0.644042 | 0.049 | | | | | | | 0.8 | 0.717041 | 0.049 | | | | | | | 0.9 | 0.783203 | 0.049 | | | | | | | 0.1 | 0.841308 | 0.049 | | | | | | | 1.1 | 0.891113 | 0.049 | | | | | | | 1.2 | 0.931884 | 0.049 | | | | | | | 1.3 | 0.963134 | 0.049 | | | | | | | 1.4 | 0.985107 | 0.049 | | | | | | | 1.5 | 0.997314 | 0.049 | | | | | | | | | | | | | | | TABLE IX POLYNOMIAL METHOD IMPLEMENTATION RESULTS (MICROPROGRAM) OF EVALUATING THE SINE FUNCTION | Angle(Radians) | Sin | Execution Time(Mili-Sec) | | | |----------------|-----------|--------------------------|--|--| | -1.5 | -0.997558 | 0.049 | | | | -1.4 | -0.984351 | 0.049 | | | | -1.3 | -0.963378 | 0.049 | | | | -1.2 | -0.932128 | 0.049 | | | | -1.1 | -0.891357 | 0.049 | | | | -1.0 | -0.841552 | 0.049 | | | | -0.9 | -0.783447 | 0.049 | | | | -0.8 | -0.717285 | 0.049 | | | | -0.7 | -0.644287 | 0.049 | | | | -0.6 | -0.564697 | 0.049 | | | | -0.5 | -0.479492 | 0.049 | | | | -0.4 | -0.389494 | 0.049 | | | | -0.3 | -0.295654 | 0.049 | | | | -0.2 | -0.198730 | 0.049 | | | | -0.1 | -0.099353 | 0.049 | | | | 0.0 | 0.0 | 0.049 | | | | 0.1 | 0.099609 | 0.049 | | | | 0.2 | 0.198486 | 0.049 | | | | 0.3 | 0.295410 | 0.049 | | | | 0.4 | 0.389160 | 0.049 | | | TABLE IX (Continued) | Angle(Radians) | Sin | Execution Time(Mili-Sec) | |----------------|----------|--------------------------| | 0.5 | 0.479248 | 0.081 | | 0.6 | 0.564453 | 0.081 | | 0.7 | 0.644042 | 0.081 | | 0.8 | 0.717041 | 0.081 | | 0.9 | 0.783203 | 0.081 | | 1.0 | 0.841308 | 0.081 | | 1.1 | 0.891113 | 0.081 | | 1.2 | 0.931884 | 0.081 | | 1.3 | 0.963134 | 0.081 | | 1.4 | 0.985107 | 0.081 | | 1.5 | 0.997314 | 0.081 | | | | | When implementing the Cordic algorithm in the HP21MX computer, $\alpha_{\bf i}$ will be divided by 180° and then represented in 16 binary digits. For example, if $\alpha_1$ = 90°, then 90°/180° = 0.5 $_{10}$ = 0.40000 $_8$ . 040000 $_8$ will be stored in the computer. If Eq. (4.3) is used to find the ATR constants n=1 to n=16, the values of $\alpha_{\bf i}$ are: $\alpha_1$ = 040000, $\alpha_2$ = 020000, $\alpha_3$ = 011344, $\alpha_4$ = 004773, $\alpha_5$ = 002421, $\alpha_6$ = 001213, $\alpha_7$ = 000505, $\alpha_8$ = 000242, $\alpha_9$ = 0001212, $\alpha_{10}$ = 000050, $\alpha_{11}$ = 000024, $\alpha_{12}$ = 000012, $\alpha_{13}$ = 000005, $\alpha_{14}$ = 000002, $\alpha_{15}$ = 000001, $\alpha_{16}$ = 000000. Because the ATR constant is represented with a 16-bit word in the HP21MX computer, when n > 15, the constant will be too small to be represented. Thus the value 15 is the best choice for the value of n. This yields the most accurate result without excessive execution time. Once the value of n is determined, the value of k can be found as well. The formula to obtain the constant k is: $$k = 1+2$$ $1+2^{-2}$ $1+2^{-2(n-2)}$ (4.5) When the constant k is computed by Eq. (4.5) with n = 15, the result is: $$k = 1.646744$$ The original coordinate vector in the Cordic algorithm is: $$V = 1/k = 0.6072589$$ One critical problem occurs immediately when the Cordic algorithm is being implemented in the HP21MX computer. Review of the Cordic machine in Chapter III shows that the best feature of Cordic which speeds up computation is that it has three adder-subtractors which can operate simultaneously. In the HP21MX computer, although there are two registers (A and B) which can operate like an adder-subtractor in Cordic, they cannot operate simultaneously. Due to this hardware limitation, the only way to simulate these parallel adder-subtractor operations is to execute sequentially. The flowchart for the assembler program which simulates the Cordic algorithm in the HP21MX computer is shown in Figure 11. An AHPL description for the microprogram which emulates the Cordic algorithm in the HP21MX computer is shown in Figure 12. Both program listings are shown in Appendix B. The programming results for these two implementations are listed in Tables X and XI. ## Calculation of Execution Time To calculate the execution time of both the macroprogram and the microprogram, the Time Base Generator (TBG) and interrupt feature are used. The TBG generates an interrupt signal for a specified time interval; the CPU acknowledges the interrupt and forces the current computer program to suspend and transfer control to a service subroutine which records the number of times that the clock interrupt has occurred. At the end of program, the program execution time can be calculated from the following equation: $$T = \frac{N \times TI}{L}$$ where T = program execution time N = number of clock interrupts TI = interrupt time interval of Time Base Generator L = number of times that the program has been executed Figure 11. Cordic Algorithm Figure 11. (Continued) Figure 11. (Continued) 0 MR $$\leftarrow$$ P 1 P $\leftarrow$ T1+1P 2 MAC<sup>1</sup>(1MR,f;T) 3 X $\leftarrow$ T 4 MR $\leftarrow$ P 5 P $\leftarrow$ T1+1P 6 MAC<sup>1</sup>(1MR,f,T) 7 S7 $\leftarrow$ T 8 S6 $\leftarrow$ A 9 $\rightarrow$ (10,14)<sub>A0</sub> 10 S7 $\leftarrow$ S7 11 S7 $\leftarrow$ (16)T1+1S7 12 X $\leftarrow$ X 13 X $\leftarrow$ (16)T1+1X 14 Y $\leftarrow$ X 15 X $\leftarrow$ 8(16) 16 E,S6 $\leftarrow$ T(1S6)+(1S7) 17 S4 $\leftarrow$ S<sup>12,13,14</sup>(16) 18 S3 $\leftarrow$ 8(16) 19 S5 $\leftarrow$ X 20 L $\leftarrow$ Y 21 $\rightarrow$ 37 22 CTR $\leftarrow$ $\omega$ <sup>8</sup>/S3 23 B $\leftarrow$ X Figure 12. The AHPL Description for the Cordic Algorithm in Implementation in HP21MX Microprogram Figure 12. (Continued) 49 L $$\leftarrow$$ S5 50 E,Y $\leftarrow$ (17) $\top$ ( $\bot$ Y) $+2^{16}$ $-\bot$ L 51 L $\leftarrow$ S7 52 E,S6 $\leftarrow$ (17) $\top$ ( $\bot$ S6) $+\bot$ L 53 E,S4 $\leftarrow$ (17) $\top$ 1+ $\bot$ S4 54 $\rightarrow$ (57,55) $(v/S4)$ 55 E,S3 $\leftarrow$ (17) $\top$ ( $\bot$ S6) $+2^{16}$ $-1$ 56 $\rightarrow$ 22 57 RETURN TO MACROPRAM Figure 12. (Continued) TABLE X CORDIC ALGORITHM IMPLEMENTATION RESULTS (ASSEMBLY LANGUAGE) OF EVALUATING THE SINE FUNCTION | Angle(Radians) | Sin | Execution Time(Mili-Sec) | |----------------|----------|--------------------------| | 0.0 | 0.000244 | 3.304 | | 0.1 | 0.099975 | 3.297 | | 0.2 | 0.198913 | 3.310 | | 0.3 | 0.295410 | 3.313 | | 0.4 | 0.389587 | 3.300 | | 0.5 | 0.479431 | 3.313 | | 0.6 | 0.564758 | 3.305 | | 0.7 | 0.644226 | 3.305 | | 0.8 | 0.717407 | 3.310 | | 0.9 | 0.783325 | 3.304 | | 1.0 | 0.841369 | 3.305 | | 1.1 | 0.891174 | 3.313 | | 1.2 | 0.932206 | 3.311 | | 1.3 | 0.963562 | 3.306 | | 1.4 | 0.985351 | 3.311 | | 1.5 | 0.997558 | 3.318 | | 1.6 | 0.999450 | 3.310 | | 1.7 | 0.991760 | 3.318 | | 1.8 | 0.973693 | 3.313 | | 1.9 | 0.946350 | 3.305 | TABLE X (Continued) | Angle(Radians) | Sin | Execution Time(Mili-Sec) | |----------------|-----------|--------------------------| | 2.0 | 0.909301 | 3.316 | | 2.1 | 0.863220 | 3.320 | | 2.2 | 0.808654 | 3.312 | | 2.3 | 0.745666 | 3.310 | | 2.4 | 0.675476 | 3.312 | | 2.5 | 0.598510 | 3.314 | | 2.6 | 0.515686 | 3.324 | | 2.7 | 0.427795 | 3.311 | | 2.8 | 0.334716 | 3.313 | | 2.9 | 0.239074 | 3.321 | | 3.0 | 0.140869 | 3.311 | | 3.1 | 0.041564 | 3.316 | | 3.2 | -0.058654 | 3.304 | | 3.3 | -0.157592 | 3.311 | | 3.4 | -0.255798 | 3.320 | | 3.5 | -0.350646 | 3.317 | | 3.6 | -0.442016 | 3.320 | | 3.7 | -0.530090 | 3.327 | | 3.8 | -0.611999 | 3.311 | | 3.9 | -0.687622 | 3.314 | | 4.0 | -0.756713 | 3.329 | | 4.1 | -0.818054 | 3.302 | | 4.2 | -0.871520 | 3.327 | | 4.3 | -0.916320 | 3.321 | | 4.4 | -0.951599 | 3.311 | TABLE X (Continued) | Angle(Radians) | Sin | Execution Time(Mili-Sec) | |----------------|-----------|--------------------------| | | | | | 4.5 | -0.977539 | 3.322 | | 4.6 | -0.999877 | 3.314 | | 4.7 | -0.999877 | 3.314 | | 4.8 | -0.996032 | 3.310 | | 4.9 | -0.982422 | 3,.312 | | 5.0 | -0.958801 | 3.314 | | 5.1 | -0.925901 | 3.311 | | 5.2 | -0.883422 | 3.314 | | 5.3 | -0.832153 | 3.308 | | 5.4 | -0.772583 | 3.304 | | 5.5 | -0.705505 | 3.309 | | 5.6 | -0.631530 | 3.303 | | 5.7 | -0.550659 | 3.309 | | 5.8 | -0.464599 | 3.310 | | 5.9 | -0.373840 | 3.305 | | 6.0 | -0.279541 | 3.319 | | 6.1 | -0.182312 | 3.314 | | 6.2 | -0.082885 | 3.312 | TABLE XI CORDIC ALGORITHM IMPLEMENTATION RESULTS (MICROPROGRAM) OF EVALUATING THE SINE FUNCTION | Angle(Radians) | Sin | Execution Time(Mili-Sec) | |----------------|----------|--------------------------| | 0.0 | 0.000244 | 0.105 | | 0.1 | 0.099975 | 0.126 | | 0.2 | 0.198913 | 0.112 | | 0.3 | 0.295410 | 0.108 | | 0.4 | 0.389587 | 0.106 | | 0.5 | 0.479431 | 0.104 | | 0.6 | 0.564758 | 0.108 | | 0.7 | 0.644226 | 0.107 | | 0.8 | 0.717407 | 0.113 | | 0.9 | 0.783325 | 0.097 | | 1.0 | 0.841369 | 0.110 | | 1.1 | 0.891174 | 0.111 | | 1.2 | 0.932206 | 0.114 | | 1.3 | 0.963562 | 0.109 | | 1.4 | 0.985351 | 0.104 | | 15 | 0.997558 | 0.105 | | 1.6 | 0.999450 | 0.104 | | 1.7 | 0.991760 | 0.105 | | 1.8 | 0.973693 | 0.114 | | 1.9 | 0.946350 | 0.106 | TABLE XI (Continued) | Anolo (no di | C4 | Francisco Ministra | |----------------|-----------|--------------------------| | Angle(Radians) | Sin | Execution Time(Mili-Sec) | | 2.0 | 0.909301 | 0.111 | | 2.1 | 0.863220 | 0.105 | | 2.2 | 0.808654 | 0.106 | | 2.3 | 0.745666 | 0.105 | | 2.4 | 0.675476 | 0.116 | | 2.5 | 0.598510 | 0.111 | | 2.6 | 0.515686 | 0.107 | | 2.7 | 0.427795 | 0.116 | | 2.8 | 0.334716 | 0.107 | | 2.9 | 0.239074 | 0.114 | | 3.0 | 0.140869 | 0.102 | | 3.1 | 0.041564 | 0.103 | | 3.2 | -0.058654 | 0.101 | | 3.3 | -0.157592 | 0.105 | | 3.4 | -0.255798 | 0.106 | | 3.5 | -0.350646 | 0.112 | | 3.6 | -0.442016 | 0.110 | | 3.7 | -0.530090 | 0.105 | | 3.8 | -0.611999 | 0.109 | | 3.9 | -0.687622 | 0.097 | | 4.0 | -0.756713 | 0.108 | | 4.1 | -0.818054 | 0.112 | | 4.2 | -0.871520 | 0.107 | | 4.3 | -0.916320 | 0.107 | | 4.4 | -0.951599 | 0.111 | TABLE XI (Continued) | Angle(Radians) | Sin | Execution Time(Mili-Sec) | |----------------|-----------|--------------------------| | 4.5 | -0.977539 | 0.106 | | 4.6 | -0.993774 | 0.107 | | 4.7 | -0.999877 | 0.107 | | 4.8 | -0.996032 | 0.107 | | 4.9 | -0.982422 | 0.102 | | 5.0 | -0.982422 | 0.102 | | 5.1 | -0.925901 | 0.101 | | 5.2 | -0.883422 | 0.110 | | 5.3 | -0.832153 | 0.108 | | 5.4 | -0.772483 | 0.111 | | 5.5 | -0.705505 | 0.110 | | 5.6 | -0.631530 | 0.115 | | 5.7 | -0.550659 | 0.116 | | 5.8 | -0.464599 | 0.107 | | 5.9 | -0.373840 | 0.114 | | 6.0 | -0.279541 | 0.111 | | 6.1 | -0.182312 | 0.110 | | 6.2 | -0.082885 | 0.107 | #### CHAPTER V ### OTHER USES OF CORDIC The Cordic algorithm may also be applied in solving many other mathematic problems as well as being applied in the evaluation of the sine and cosine functions. Decimal to binary and binary to decimal conversion, arctangent function computation, fourier transformation, et.al., can be done by the Cordic algorithm—a different way from the conventional methods. Arctangent function computation and decimal to binary conversions are chosen in this chapter to demonstrate how the Cordic algorithm is applied to solve these problems. # Arctangent Algorithm This algorithm is obtained by reversing the sine and cosine algorithms. In this algorithm, the value V which equals Y/X is known (X and Y are components of a vector.) The vector is rotated with respect to the positive X-axis. The angle traversed is the angle whose tangent equals Y/X. ### Functional Description The VECTORING mode is used in this application. To illustrate the details of this algorithm, Figure 2 in Chapter III is referred to again. The value of v is checked before the initialization of the X- and Y-registers. If the value of v is greater than 1 then the Y-register is initialized with 1 and the X-register is initialized with v; otherwise the X-register is initialized with 1 and the Y-register is initialized with v. The Angle Register (A-register) is always initialized with 0. A sign digit of 0 in the Y-register establishes a v<sub>i</sub> of -1, which causes the top adder-subtractor to be set to subtract and the middle and bottom adder-subtractors to add. A sign digit of 1 has the opposite effect. The ATR constants are the same as those used in Chapter III. The VECTORING computing sequence as described in Table II is started. The angle whose tangent equals to v is taken from the A-register after the final computation step. ## Decimal to Binary Conversions in Cordic A technique is formulated for using the Cordic arithmetic unit to convert between angles expressed in binary fractions of a half revolution and angles expressed in degrees and minutes in the 8421-code. The Cordic decimal-to-binary conversion technique may be compared to a conventional conversion technique in which the 8421-code and binary arithmetic are utilized. The conventional conversion technique is based upon the 8421-code definition of the value of a decimal digit, N, located i placed to the left of the units position, as given by $$N \times 10^{i} = n_{4} (8 \times 10^{i}) + n_{3} (4 \times 10^{i}) + n_{2} (2 \times 10^{i}) + n_{1} (1 \times 10^{i})$$ (5.1) where $n_4$ , $n_3$ , $n_2$ , and $n_1$ are equal to zero or one. The constants $8 \times 10^{1}$ , $4 \times 10^{1}$ , $2 \times 10^{1}$ , and $1 \times 10^{1}$ , evaluated in binary for all values of i to be used, are required in the conversion. For example, $5^{\circ}$ in 8421-code is $$45^{\circ} = (0 \times 8 \times 10 + 1 \times 4 \times 10 + 0 \times 2 \times 10 + 0 \times 1 \times 10)$$ $$+ (0 \times 8 + 1 \times 4 + 0 \times 2 + 1 \times 1)$$ $$45^{\circ} = (0100), (0101).$$ For example, 86° can be written as $$86^{\circ} = (1 \times 8 \times 10 + 0 \times 4 \times 10 + 0 \times 2 \times 10 + 0 \times 1 \times 10) + (0 \times 8 + 1 \times 4 + 1 \times 2 + 0 \times 1)$$ $$86^{\circ} = (1000). (0110)$$ The conversion of a negative angle is accomplished in the same way, and the result is then complemented by subtracting the binary magnitude from zero. For example, -86° is (0111)(1010) which is the 2's complement of 86°. The binary value of $45^{\circ}$ as a fraction of half revolution is shown in Table XII. In Table XII at each step a binary constant is either added or not added, depending upon whether the 8421-code variable is 1 or 0, respectively. In order to use the Cordic principle, it is necessary either to add or to subtract a constant. The use of addition or subtraction is controlled by a code variable placed in the sign digit position of an arithmetic unit register. The problem of conversion by adding and subtracting constants is considered first. Subsequently, the method of properly positioning the code variables for control is presented. By analogy to the way in which a code variable of +1 is used to establish the addition of a constant, a variable of -1 is used to establish subtraction. Therefore, it is desired that a binary code with +1 and -1 variables be used to represent decimal angles in Cordic. For convenience, the desired code is called a + (plus-minus) code. TABLE XII THE CONVENTIONAL DECIMAL-TO-BINARY CONVERSION | Constants<br>Degree | Constants-Binary<br>Fraction of half<br>Revolution | 8421-<br>Code Variable | | Product<br>Term | | |---------------------|----------------------------------------------------|------------------------|-----|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------| | 8 x 10 | .01110010 | х | 0 | | 0.00000000 | | 4 x 10 | .00111001 | x | 1 \ | = | 0.00111001 | | 2 x 10 | .00011100 | x | 0 | = | 0.00000000 | | 1 x 10 | .00001011 | x · | 0 | = | 0.00000000 | | 8 | .00000110 | x | 0 | tion of the same o | 0.00000000 | | 4 | .00000011 | x | 1 | Marine. | 0.00000110 | | 2 | .00000011 | x | 0 | = | 0.00000000 | | 1 , | .00000001 | x | 1 | = | 0.0000001 | Accumulated sum = $2^{-2}$ half revolution = .01000000. The 8, 4, 2, 1 weights cannot be applied directly to a four-digit $\pm$ code because all possible sums of binary-weighted $\pm$ code digits are odd. Therefore, a transformation of the decimal digits 0, 1, ..., 9, into a set of ten odd integers is necessary. The set of ten odd integers -0, -7, ..., -1, +1, ..., +9 is selected. The equation transforming a decimal digit N, having one of the values, $0, 1, \ldots, 9$ , into a digit Y having one of the values $-9, -1, \ldots, +9$ is $$Y = 2N - 9$$ (5.2) The equation for the inverse transformation is $$N = \frac{1}{2}Y + \frac{9}{2} \tag{5.3}$$ Applying the factor of $\frac{1}{2}$ in (5.3) to the 8421-weight results in the $\pm$ code equation $$N = Y_4 \cdot 4 + Y_3 \cdot 2 + Y_2 \cdot 1 + Y_1 \cdot \frac{1}{2} + C$$ (5.4) where $Y_j = +1$ or -1 and $C = \frac{9}{2}$ . A factor of $10^i$ may be applied to each term in (5.4), as was done in (5.1), account for the position of the digit N. The pattern the $Y_j$ variables of the code of (5.4), with $C = \frac{9}{2}$ and with 0's used t · represent -1's, is identical to that of the Excess-3 code. Equation (5.4) can be applied to each digit position, and the constant term c $\times$ $10^{1}$ for all decimal digit positions is added in binary to the accumulated sum. As an example 45° will be converted from $\pm$ (excess-3) code to binary as follows: $$C_2 = \frac{9}{2} = 4.5$$ $$C_1 = \frac{9}{2} \times 10. = 45$$ $$C = C_1 + C_2 = 49.5 = total constant$$ Consequently the constant for 45° is 49.5. The + 1 code representation is $$5 + 3 = 8 = (1000)_2 = (+---)$$ $$4 + 3 = 7 = (0111)_{2} = (-+++)$$ Where each digit must be added to 3 for excess -3. The zero stands for minus one and one for plus one. Thus $$45^{\circ} = (-+++) (+---)$$ The complete conversion of 45° is shown in Table XIII. Where from equation (5.4) $$X = 4 \times 10^{1} Y_{4} + 2 \times 10^{1} Y_{3} + 1 \times 10^{1} Y_{2} + \frac{1}{2} \times 10^{1} Y_{1} + C$$ $$45^{\circ} = (40Y_{4} + 20Y_{3} + 10Y_{2} + 5Y_{1}) + (4Y_{4} + 2Y_{3} + 1Y_{2} + \frac{1}{2}Y_{1}) + C$$ $$45^{\circ} = (-40 + 20 + 10 + 5) + (4 - 2 - 1 - \frac{1}{2}) + 49.5^{\circ}$$ Successive digits of the $\pm$ code must control successive setting of the adder-subtractors in order for the proper sequence of additions and subtractions to occur as indicated in the previous table. The settings of the adder-subtractors during the conversion operation are established by the value of the sign digit located in the Y-register. In positioning the $\pm$ code digits for control, the technique of nonrestoring division is useful because successive quotient digits are TABLE XIII DECIMAL-TO-BINARY CONVERSIONS IN CORDIC | Constant<br>Degrees | Constant-Bainry<br>Fraction of Half<br>Revolution | <u>+</u> | Code | Product | Accumulated Sum | |---------------------|---------------------------------------------------|----------|--------|-------------------|-----------------| | 49.5 | .0100011001110 | (co | rrecti | on) .010001100110 | .010001100110 | | 40 | .001110001110 | x | -1 | =001110001110 | .000011011000 | | 20 | .000111000111 | x | +1 | = +.000111000111 | .001010011111 | | 10 | .0000011100100 | x | +1 | = +.000011100100 | .001110000011 | | 5 | .00001110010 | x | +1 | = +.000001110010 | .001111110101 | | 4 | .000001011011 | x | +1 | = +.00000101101 | .010001010000 | | 2 | .000000101110 | x | -1 | =000000101110 | .010000100010 | | 1 | .000000010111 | x | -1 | =000000010111 | .010000001011 | | 1/2 | .000000001011 | x | -1 | =00000001011 | .0100000000 | The accumulated sum = $2^{-2}$ half revolution = 0.010000000000 given by the sign of successive remainders. Dividing the number representing the $\pm$ code of the angle by 1 produces the signs of successive remainders. In Cordic this is accomplished as follows: - If the remainder is positive, subtract the divisor. If the remainder is negative, add the divisor. - 2) Shift the divisor one place to the right. - 3) Repeat 1 and 2. The positioning of digits of the $\pm$ code for 45° is illustrated by following the above rules as shown in Table XIV. In decimal-to-binary conversion, the <u>+</u> code for the desired angle is placed in the Y-register and the divisor of 1 is placed in the X-register. A sign digit of 0 in the Y-register establishes a Y<sub>1</sub> of -1, which causes the top adder-subtractor, Figure 13, to subtract and the bottom adder-subtractor to add. A sign digit of 1 has the opposite effect. The constant C in (5.4) is initially placed in the angle register and successive constants are introduced into the bottom adder-subtractor as shown in Figure 13. As one step of the division is taking place to establish the next setting of the adder-subtractors, a constant is being added or subtracted to modify the quantity in the angle register according to the sign digit in the Y-register at the beginning of the step. The binary angle is taken from the bottom adder-subtractor on the final computation step. TABLE XIV GENERATION OF $\pm$ CODE FOR 45° | | | | | Sign of Remainder | |------------|------|------|-----|-------------------| | (-+++) (+) | 0111 | 1000 | | | | sub | 1 | | | | | | 1111 | 1000 | _ ′ | | | add | 1 | | | | | | 0011 | 1000 | + | | | sub | 1 | | = | 0111 7 in excess | | | 0001 | 1000 | + | | | sub | 1 | | | | | | 0000 | 1000 | + | | | sub | | 1 | | | | | 0000 | 0000 | + | | | sub | | 1 | | | | | 1111 | 1100 | | | | add | | 1 | = | 1000 8 in excess | | | 1111 | 1110 | _ | | | add | | 1 | | | | | 1111 | 1111 | _ | | Figure 13. Implementation of $\pm$ Code to Binary Conversion. #### CHAPTER VI #### SUMMARY AND CONCLUSIONS The results of the programming tasks discussed in the previous chapters are shown in Tables VIII - XI. In order to compare the accuracy of the results obtained from each task, a set of standard sine function values is obtained. The result of each task is compared to these standard values and the accuracy is thus determined. For the convenience of further description, the four tasks which have been accomplished in Chapter IV are designated Task 1, Task 2, Task 3 and Task 4: - Task 1 polynomial method implemented in assembly coded program. - Task 2 polynomial method implemented in microcode. - Task 3 Cordic algorithm implemented in assembly coded program. - Task 4 Cordic algorithm implemented in microcode. Note that the sine values of Task 1 are identical to those of Task 2, while the sine values of Task 3 are identical to those of Task 4. Thus, only two sets of results are compared with the standard sine values, as shown in Tables XV and XVI. A cording to these tables, both tasks are accurate up to three decimal digits; in other words, all the tasks give about the same accuracy of sine values. The execution time of each taskis shown in Tables VIII -XI. By reviewing those tables it is found that Task 1 is the most time- TABLE XV THE COMPARISON BETWEEN THE CORDIC ALGORITHM IMPLEMENTATION RESULT AND THE STANDARD SINE VALUE | Angle(Radian) | Sin(Cordic) | Sin(Correct) | Error | |---------------|-------------|--------------|-----------| | 0.0 | 0.000244 | 0.0 | 0.000244 | | 0.1 | 0.099975 | 0.0998334 | 0.0001416 | | 0.2 | 0.198913 | 0.198669 | 0.000244 | | 0.3 | 0.295410 | 0.29552 | 0.00011 | | 0.4 | 0.389487 | 0.389418 | 0.000169 | | 0.5 | 0.479431 | 0.479425 | 0.000006 | | 0.6 | 0.564758 | 0.564642 | 0.000116 | | 0.7 | 0.644226 | 0.644218 | 0.000008 | | 0.8 | 0.717407 | 0.717356 | 0.000051 | | 0.9 | 0.783325 | 0.783327 | 0.000002 | | 10. | 0.841369 | 0.841471 | 0.000102 | | 1.1 | 0.891174 | 0.891207 | 0.000033 | | 1.2 | 0.932206 | 0.932039 | 0.000167 | | 1.3 | 0.963562 | 0.963558 | 0.000004 | | 1.4 | 0.985351 | 0.98545 | 0.000099 | | 1.5 | 0.997558 | 0.997495 | 0.0000063 | | 1.6 | 0.999450 | 0.999574 | 0.000124 | | 1.7 | 0.991760 | 0.991665 | 0.000095 | | 1.8 | 0.973693 | 0.973848 | 0.000155 | | 1.9 | 0.946350 | 0.9463 | 0.00005 | TABLE XV (Continued) | Angle(Radian) | Sin(Cordic) | Sin(Correct) | Error | |---------------|-------------|--------------|-----------| | 2.0 | 0.909301 | 0.909297 | 0.000004 | | 2.1 | 0.863220 | 0.863209 | 0.000011 | | 2.2 | 0.808654 | 0.808496 | 0.000158 | | 2.3 | 0.745666 | 0.745705 | 0.000039 | | 2.4 | 0.675476 | 0.675463 | 0.000039 | | 2.5 | 0.598510 | 0.598472 | 0.000013 | | 2.6 | 0.515686 | 0.515502 | 0.0000184 | | 2.7 | 0.427795 | 0.42738 | 0.000415 | | 2.8 | 0.334716 | 0.334988 | 0.000272 | | 2.9 | 0.239074 | 0.23925 | 0.000176 | | 3.0 | 0.140869 | 0.14112 | 0.000251 | | 3.1 | 0.041564 | 0.0415808 | 0.0000168 | | 3.2 | -0.058654 | -0.0583743 | 0.0002797 | | 3.3 | -0.157592 | -0.157746 | 0.000154 | | 3.4 | -0.255798 | -0.255541 | 0.000257 | | 3.5 | -0.350646 | -0.350783 | 0.000137 | | 3.6 | -0.442016 | -0.442521 | 0.000505 | | 3.7 | -0.530090 | -0.529836 | 0.000254 | | 3.8 | -0.611999 | -0.611858 | 0.000141 | | 3.9 | -0.687622 | -0.687766 | 0.000144 | | 4.0 | -0.756713 | -0.756802 | 0.000089 | | 4.1 | -0.818054 | -0.818277 | 0.000223 | | 4.2 | -0.871520 | -0.871576 | 0.000056 | | 4.3 | -0.916320 | -0.916166 | 0.000154 | | 4.4 | -0.951599 | -0.951602 | 0.000003 | TABLE XV (Continued) | Angle(Radian) | Sir(Cordic) | Sin(Correct) | Error | |---------------|-------------|--------------|-----------| | 4.5 | -0.977539 | -0.97753 | 0.000009 | | 4.6 | -0.993774 | -0.993691 | 0.000083 | | 4.7 | -0.999877 | -0.999923 | 0.000046 | | 4.8 | -0.996032 | -0.996165 | 0.000133 | | 4.9 | -0.982422 | -0.982453 | 0.00031 | | 5.0 | -0.958801 | -0.958924 | 0.000123 | | 5.1 | -0.024901 | -0.924815 | 0.000086 | | 5.2 | -0.883422 | -0.883455 | 0.000033 | | 5.3 | -0.832153 | -0.832267 | 0.000114 | | 5.4 | -0.772583 | -0.772765 | 0.000182 | | 5.5 | -0.705505 | -0.70554 | 0.000035 | | 5.6 | -0.631530 | -0.631267 | 0.000263 | | 5.7 | -0.550659 | -0.550686 | 0.0000027 | | 5.8 | -0.464599 | -0.464602 | 0.000003 | | 5.9 | -0.373840 | -0.373877 | 0.000037 | | 6.0 | -0.279541 | -0.279416 | 0.000125 | | 6.1 | -0.182312 | -0.182163 | 0.000149 | | 6.2 | -0.082885 | -0.0830896 | 0.0002046 | TABLE XVI THE COMPARISON BETWEEN THE POLYNOMIAL METHOD IMPLEMENTATION RESULT AND THE STANDARD SINE VALUE | Angle(Radian) | Sin(Cordic) | Sin(Correct) | Error | | |---------------|-------------|--------------|-----------|--| | -1.5 | -0.997558 | -0.997495 | 0.000063 | | | -1.4 | -0.985351 | -0.98545 | 0.000099 | | | -1.3 | -0.963378 | -0.963558 | 0.00018 | | | -1.2 | -0.932128 | -0.932039 | 0.000089 | | | -1.1 | -0.891357 | -0.891207 | 0.00015 | | | -1.0 | -0.841552 | -0.841471 | 0.000081 | | | -0.9 | -0.783447 | -0.783327 | 0.00012 | | | -0.8 | -0.717285 | -0.717356 | 0.000071 | | | -0.7 | -0.644287 | -0.644218 | 0.000069 | | | -0.6 | -0.564697 | -0.564642 | 0.000055 | | | -0.5 | -0.479425 | -0.479492 | 0.000067 | | | -0.4 | -0.389404 | -0.389418 | 0.000014 | | | -0.3 | -0.295654 | -0.29552 | 0.0001344 | | | -0.2 | -0.198730 | -0.198669 | 0.000061 | | | -0.1 | -0.099853 | -0.0998334 | 0.0000196 | | | 0.0 | 0.0 | 0.0 | 0.000000 | | | 0.1 | 0.099609 | 0.0998334 | 0.0001434 | | | 0.2 | 0.198486 | 0.198669 | 0.000183 | | | 0.3 | 0.295410 | 0.29552 | 0.00011 | | | 0.4 | 0.389160 | 0.389418 | 0.000258 | | | | | | | | TABLE XVI (Continued) | Angle(Radian) | Sin(Cordic) | Sin(Correct) | Error | |---------------|-------------|--------------|-----------| | 0.5 | 0.479248 | 0.479425 | 0.000177 | | 0.6 | 0.564453 | 0.564652 | 0.000112 | | 0.7 | 0.644042 | 0.644218 | 0.000176 | | 0.8 | 0.717041 | 0.717356 | 0.000315 | | 0.9 | 0.783203 | 0.783327 | 0.000124 | | 1.0 | 0.841308 | 0.841471 | 0.000163 | | 1.1 | 0.891113 | 0.891207 | 0.0000937 | | 1.2 | 0.931884 | 0.932039 | 0.0001546 | | 1.3 | 0.963134 | 0.963558 | 0.000424 | | 1.4 | 0.985107 | 0.98545 | 0.000343 | | 1.5 | 0.997314 | 0.997495 | 0.000181 | consuming task; Task 2 consumes less time; Task 3 consumes still less time; Task 4 consumes the least time of all. Task 1 and Task 2 are the same algorithm but implemented in different ways, so the sine values will be identical but the execution time may be different. The same applies to Task 3 and Task 4. The programming results in Chapter IV prove this assumption. Task 1 is performed in an assembly coded program, while Task 2 is performed in a microprogram. According to the description of the microprogramming in Chapter IV, the execution time of Task 2 should be less than that of Task 1. Similarly, Task 4 should have less execution time than Task 3. The programming results in Chapter IV also prove this assumption. The things that cannot be predicted before going to the computer are whether Task 1 or Task 3 will have less execution time, and whether Task 2 or Task 4 will have less execution time. However, we expect that Task 1 is faster than Task 3 and Task 2 is faster than Task 4. If this is true, it means we can improve the speed of evaluation of trigonometric functions by replacing the conventional polynomial method with the Cordic algorithm. Surprisingly, the programming results in Chapter IV indicate that the conventional polynomial method is faster than the Cordic algorithm for computing trigonometric functions. Although this is disappointing, it is possible to determine exactly how these results were effected. Although the Cordic algorithm eliminates the necessity of multiplication, some shifting still must be done. In the real Cordic machine, three registers (A,X,Y) can be shifted and added or subtracted simultaneously. When the Cordic algorithm is simulated in this general purpose machine the HP21MX, the shifting and adding or subtracting can only be done sequentially, because the arithmetic unit can only handle one arithmetic operation at a time. In addition to this, the result of shifting and adding must be stored, and then the arithmetic unit for shifting and adding/subtracting of other registers msut be released. After all three registers finish their shifting and adding/subtracting for the current cycle, the next cycle starts. So the shifting and adding/subtracting results of the first register in the previous cycle will be restored, and so on for the second register and thrid. Therefore, when the computer is running, a lot of storing and restoring is being performed, and this is very time-consuming. That is why Task 1 requires more execution time than Task 3. Task 2 implements the Cordic algorithm in a microcode, so it improves the speed of Task 1, but is still slower than Task 3 and Task 4. Task 4 is a microcode, and thus improving the speed of Task 3. Therefore, the conclusions are: - 1) The use of the Cordic algorithm for evaluating trigonometric functions without hardware extensions will be slower than using conventional polynomial methods; - When using a conventional polynomial method for evaluating the sine function, the microprogram will be two times faster than the assembly coded program; - 3) In order to use the Cordic algorithm to improve the speed of evaluation of trigonometric functions, a lot of hardware work must be done in the current HP21MX computer. With the suprising speed of development of the microprocessor today, it might be very easy to construct a microcomputer which has the features of both the general purpose computer and the Cordic computer in the near future. #### A SELECTION BIBLIOGRAPHY - (1) Gear, W. C. <u>Computer Organization And Programming</u>. New York: McGraw-Hill, 1969. - (2) Iverson, K. E. A <u>Programming Language</u>. New York: John Wiley, 1962. - (3) La Lyusternik, O., A. Chervonenkis, and A. R. Yanpol'skii. Handbook for Computing Elementary Functions. New York: Pergamon Press, 1965. - (4) Hayward, J. T. and J. P. Wong, Jr. Approximations For Digital Computers. Princeton, New Jersey: Princeton University Press. - (5) <u>HP21MX Computer Series Reference Manual</u>. Cupertino, California: Hewlett Packard Company, 1974. - (6) HP Microprogramming 21MX Computers Operating and Reference Manual. Cupertino, California: Hewlett Packard Company, 1974. - (7) A Pocket Guide to Hewlett-Packard Computers. Cupertino, California: Hewlett Packard Company, 1974. - (8) A Pocket Guide to Interfacing HP Computers. Cupertino, California: Hewlett Packard Company, 1974. - (9) Volder, J. S. "CORDIC Trigonometric Computation Technique." <u>IRE</u> <u>Transactions on Electronic Computers</u>, EC-8, Sept., 1959, p. 330. - (10) Daggett, D. H. "Decimal-Binary Conversions in CORDIC." TRE Transactions on Electronic Computers, EC-8, Sept., 1959, p. 335. - (11) Meggitt, J. E. "Pseudo-Division and Pseudo-Multiplication Processors." IBM Journal, April, 1962. - (12) Waither, J. S. "A Unified Algorithm for Elementary Functions." Spring Joint Computer Conference, 1971. - (13) Despain, A. M. "Fourier Transform Computers Using CORDIC Iterations." IEEE Transactions on Computers, Vol. c-23, No. 10, Oct., 1974, p. 993. - (14) Cochran, D. S. "Algorithms and Accuracy in the HP 35." Hewlett-Packard Journal, June, 1972. - (15) Schmid, H. and A. Bogacki. "Use of decimal CORDIC for generation of many transcendental functions." <u>Electrical Design News</u>, February, 1973, pp. 64-73. - (16) Richards, R. K. <u>Arithmetic operations in digital computers</u>. New York: Van Nostrand, 1955. - (17) Briggs, H. <u>Logarithmicall arithmetike</u>. London: George Miller, 1631. APPENDIX A FUNCTIONAL BLOCK DIAGRAM Figure D-1. Functional Block Diagram APPENDIX B PROGRAM LISTINGS ``` TASK 1 TEST PROGRAM---CORDIC ALGORITHM IMPLEMENTED IN ASSEMBLY CODE PROGRAM INPUT PARAMETER -- AN ANGLE WHICH MUST BE IN THE RANGE OF (-360,360) DEGREE OUTPUT PARAMETER-- SIN VALUE OF THE INPUT ANGLE de de la companie ASMB, A, B, L, T *SET UP THE CLOCK INTERRUPT VECTOR ADDRESS ORG 148 JSB TIME ORG 2008 STF 0B CLA *SET UP INTERRUPT TIME PERIOD TO 0.1 MILISECOND OTA 14B NOP *SET REPERTITION COUNT TO 100 LDA HD START STA LCT CLA CLB ŜTA \times \cdot \mathbf{q}^{\mathbf{r}} STA COUNT STA FAD ANG *CONVERT THE INPUT ANGLE TO THE CORDIC REPRESENTATION EDV PI RRR 16 SLA JMP AJ STA TPS AND MO SZA RRL 1 E-14 NOF LDA 18 JMP ENT CAX HJ RAR IOR FH STA SM CXB ASR 1 ST ISZ SM JMP ST JMP EN ENT STA RA ENTI LDA RA *INITIATE TIME CLOCK INTERRUPT *THE CORDIC COMPUTING SEQUENCE STARTS HERE STC 148/C CLB STB X STB Y LDB UV CLO 55A JMP CO1 STB Y ADA NRE LDX SIX 84. LBX CON SHT STB RY STB TES LDB Y ``` ``` Ð NOP ISZ TES JMP 01 JMP 82 01 BRS JMP D LDY X B2 SSA, RSS CMB, INB ADY 18 LDB X STY X 01 NOP ISZ RV JMP C2 JMP B3 02 BRS JMP D1 83 LDY Y SSA CMB, INB ADY 18 STY Y LBX A7 SSA, RSS CMB, INB ADA 18 DSX JMP SHT *EXECUTE THE CORDIC COMPUTING SEQUENCE 100 TIMES FOR THE SAME ANGLE ISZ LCT JMP ENT1 CLC 148 *OUTPUT EXECUTION TIME SIN AND COS VALUES OF THE INPUT ANGLE LDB COUNT JSB OUT JSB OUT1 CLB STB COUNT LDB X JSB OUT JSB OUT1 LDB Y JSB OUT1 LDB 0B JSB OUT1 JSB OUT *INCREASE THE INPUT ANGLE BY 0.1 THEN REPEAT THE PROGRAM DLD ANG FAD INC DST ANG *SERVICE ROUTINE FOR CLOCK INTERRUPT JMP START TIME NOP STC 148/C ISZ COUNT JMP TIME, I OCT 0 COUNT CMB, INB co_4 STB Y ADA RE JMP B1 ANG DEC 0.0 FI DEC 3.14159 F=1+ OCT 177600 SN NOP ``` ``` \mathbf{U}\mathbf{V} OCT 23335 NRE OCT 140000 OCT 040000 OCT 16 RE SIX RW BSS 1 TES BSS 1 CON NOP OCT -16 OCT -15 OCT -14 OCT -13 OCT -12 OCT -11 OCT -10 007 -7 007 -6 007 -5 OCT -4 OCT -3 oct -2 OCT -1 NOP NOP 87 OCT 000001 OCT 000002 OCT 000005 OCT 000012 OCT 000024 OCT 000050 OCT 000121 OCT 000242 OCT 000505 OCT 001213 OCT 002421 OCT 004773 OCT 011344 OCT 020000 NOP B55 1 BSS 1 OCT 177654 LOT HD OCT 177654 NOP EH NOP TPS MO OCT 377 DEC 0.1 INC ``` ``` TASK 2 TEST PROGRAM---CORDIC ALGORITHM IMPLEMENTED IN MICROCODE 40 PROGRAM INPUT PARAMETER-- AN ANGLE WHICH MUST BE IN THE RANGE OF 40 (-360,360) DEGREE :4: OUTPUT PARAMETER -- SIN VALUE OF THE INPUT ANGLE ASMB, A. B. L. T. ORG 148 *SET UP THE CLOCK INTERRUPT VECTOR ADDRESS JSB TIME ORG 2008 STF 0B CLA *SET UP INTERRUPT TIME PERIOD TO 0.1 MILISECOND . OTA 148 NOB *SET REPEATITION COUNT TO 100 START LDA HD STA LCT CLA CLB STA çı STA STA COUNT AMG FAD *CONVERT THE INPUT ANGLE TO THE CORDIC REPRESENTATION FDV PI RRR 16 SLA JMP AJ STA TES AND MC SZA REL 1 EN NOP LDA 18 JMP ENT A.J CAX FAR IOR FH STA SM CXA ST ASR 1 ISZ SN JMP ST JMP EN ENT STA RA ENT1 LDA RA *INITIATE TIME CLOCK INTERRUPT *THE CORDIC COMPUTING SEQUENCE STARTS HERE STC 14B, C CLE CBX CBY NOF * THE ENTRY POINT TO THE MICROPROGRAM WHICH PERFORMS THE CORDIC COMPUTING * SEQUENCE OCT 105160 OCT 123335 UNIT VECTOR *THE ANGLE CONSTANTS OCT 140000 MEE ANT1 OCT 020000 OCT 011344 HNT2 ANTS OCT 004773 ``` ``` ANT5 OCT 001213 ANT6 OCT 000505 OCT 000242 ANT7 OCT 000121 ANT10 OCT 000050 ANT11 OCT 000024 ANT12 ANT13 OCT 000012 OCT 000005 ANT14 ANT15 OCT 000002 OCT 000001 ANT16 * RETURN TO THIS POINT FROM MICROPROGRAM ISZ LCT JMP ENT1 CLC 14B *OUTPUT EXECUTION TIME SIN AND COS VALUES OF THE INPUT ANGLE / LDB COUNT JSB OUT JSB OUT1 CLB STB COUNT LDB X JSB OUT JSB OUT1 LDB Y JSB OUT1 LDB 08 JSB -OUT1 JSB OUT *INCREASE THE INPUT ANGLE BY 0.1 THEN REPEAT THE PROGRAM DLD ANG FAD INC DST ANG *SERVICE ROUTINE FOR CLOCK INTERRUPT JMP START TIME NOP STC 148/C ISZ COUNT JMP TIME I OCT 0 COUNT ANG DEC 0.0 DEC 3.14159 FI OCT 177600 FH SN NOP NOP BSS 1 'y' BSS 1 OCT 177654 LOT OCT 127654 HD MOR FA THS MIT OCT 377 111 INC DEC 0.1 ``` ``` * MICROPROGRAM--USED TO IMPLETMENT THE CORDIC ALGORITHM FOR EVALUATING THE SINE FUNTION THE ANGLE OF THE SINE FUNCTION SHOULD BE STORED IN THE REGISTER A BEFORE ENTER THE MICROPROGRAM $SYMTAB #0RIGIN=1400 JMP NOP PASS NOP START #ORIGIN=1411 NOP PASS NOP NOP START NOP *GET THE UNIT VECTOR FROM MAIN MEMORY READ NOP INC PNM P *STORE IT IN REG. X NOP NOP PASS X TAB *GET THE FIRST ANGLE CONTST READ NOP INC FNM *STORE IT IN REG. S7 NOP PASS S7 THE NOP *STORE THE ANGLE OF THE SINE FUNCTION IN REG. 56 NOP NOP PASS 56 Ħ *IF THE ANGLE IS LESS THAN 180 DEGREE BRANCH TO EN1 JMP CNDX AL15 RJS EN1 *GET THE TWO'S COMPLEMENT OF 57 NOP NOP CMPS S7 57 NOF NOP INC 57 57 *GET TWO'S COMPLENT OF X NOR NOP CMPS X × NOP NOP INC \times *STORE X IN Y EN1 NOP NOP PASS Y \times *CLEAR X REG. NOP NOP ZERO X NOP *56=$6+$7 NOR MOR PASS 57 NOF NOP ADD 56 56 *54=-14 IMM NOP LOM 3628 *CLEAR 53 IMM NOR LOM 0B NOP NOF PASS 55 \times NOP NOP PASS L JMP NOP NOP NOP BK1 *INITIALIZE THE COUNTER BK NOP MOP PASS ONTR 53 *RIGHT SHIFT B REG. BY THE NUMBER IN THE COUNTER, *THEN STORE THE SHIFTING RESULT IN S5 NOP RPT PASS 8 ARS R4 PASS B Ε NOP NOP PASS S5 _{\rm B} *SET COUNTER AGAIN PASS CNTR 53 PASS B Y NOP NOR NOP RET ARS. 尺1 PASS B B NOP NOP PASS L В *GET NEXT ANGLE CONSTANT READ NOR BK1 INC PNM NOP NOR PASS S7 THE NOP PASS SE NOP SE *TEST THE ANGLE , IF GREATER THAN 180 DEGREE GO TO EN2 TMP CNDW AL15 NOP EN2 NOP NOP SUB PASS L 55 NOP NOP ADD NOP 57 NOF PASS L NOF NOP SUB 56 56 JMP JM EN2 NOP NOP ADD \times 55 NOP NOP PASS L NOF NOP SUB NOR PASS L 57 NOF NOF NOF ADD 56 JN NOP NOF INC 54 54 JMP CNDX TBZ NOP EXIT NOP NOP 53 53 DEC JMP BK EXIT NOP RTN PASS NOP NOP $FND ``` ``` TASK 3 TEST PROGRAM---POLYNOMIAL METHOD IMPLEMENTED IN ASSEMBLY PROGRAM INPUT PARAMETER -- AN ANGLE RANGED FROM -90 DEGREE TO 90 DEGREE OUTPUT PARAMETER--THE SINE VALUE OF THE INPUT ANGLE <del>********************************</del> ASMB, A, B, L, T *SET UP THE CLOCK INTERRUPT VECTOR ADDRESS ORG 148 JSB TIME ORG 2008 STF 0B CLA OTA 148 NOP LDA HD START STA LCT CLA CLB CAX CAY STA COUNT STC 14B, C LDA ANG MPY ANG ASL 1 STB S0 LDA 1B CLB. MPY C5 ADB C3 LDA 18 CLB MPY SQ ASL 3 ADB C1 LDA 1B CLB MPY ANG STB ANG STB ANS ISZ LOT JMP WENT1 CLC 148 *OUTPUT EXECUTION TIME SIN AND COS VALUES OF THE INPUT ANGLE LDB COUNT JSB OUT JSB OUT1 CLB STB COUNT LDB ANS JSB OUT JSB OUT1 LDB ANG JSB OUT1 *INCREASE THE INPUT ANGLE BY 0.1 THEN REPEAT THE PROGRAM LDA INC ARS, ARS ARS, ARS BRS ADA ANG STA ANG JMP START *SERVICE ROUTINE FOR CLOCK INTERRUPT TIME MOR STC 148/C ISZ COUNT JMP TIME, I 00T 0 COUNT OCT 0 OCT 177654 ANG LCT OCT 177654 HD ANS OCT 0 50 OCT 0 C4. DEC 0.999892 DEC -0.1659685 03 DEC 0.0076031915 05 INC DEC 9.1 ``` ``` TASK 4 TEST PROGRAM--POLYNOMIAL METHOD IMPLEMENTED IN MICROPROGRAM* PROGRAM INPUT PARAMETER--AN ANGLE RANGED FROM -90 DEGREE TO 90 DEGREE 4: OUTPUT PARAMETER -- THE SINE VALUE OF THE INPUT ANGLE ASMB, A, B, L, T *SET UP THE CLOCK INTERRUPT VECTOR ADDRESS ORG 148 JSB TIME ORG 2008 STF 0B CLA OTA 14B NOP START LDA HD STA LCT CLA CLB CAX CAY STA COUNT STC 14B/C *ENTRY POINT OF THE MICROGRAM ENT1 OCT 105160 OCT 0 ANG C1 OCT 77774 03 OCT 125406 05 OCT 76222 ANS OCT 0 *THE MICROPROGRAM RETURNS THE CONTROL TO THIS POINT ISZ LCT JMP ENT1 CLC 148 *OUTPUT EXECUTION TIME, SIN AND COS VALUES OF THE INPUT ANGLE LDB COUNT JSB OUT JSB OUT1 CLB STB COUNT LDB ANS JSB OUT JSB OUT1 LDB ANG JSB OUT1 *INCREASE THE INPUT ANGLE BY 0.1 THEN REPEAT THE PROGRAM LDA INC ARS, ARS ARS, ARS ARS. ADA ANG STA ANG JMP START *SERVICE ROUTINE FOR CLOCK INTERMUPT TIME STC 148, C ISZ COUNT JMP TIME, I OCT 0 COUNT LOT OCT 177654 HD OCT 177654 S0 OCT 0 INC DEC 0.1 END ``` ``` * MICROPROGRAM--USED TO EVALUATE SIN FUNCTION BY IMPLEMENTING THE POLYNOMIAL METHOD $SYMTAB. #ORIGIN=1400 READ NOR INC PAM P *STORE THE VALUE X IN S2 AND S9 NOP NOP PASS 52 THE NOP NOP PASS S9 52 *STORE THE VALUE C1 IN S1 READ NOR INC PMM NOP NOP TAB PASS S1 *STORE THE VALUE C3 IN S3 READ NOR INC FNM NOP NOP PASS 53 TAB *STORE THE VALUE C5 IN S5 F READ NOP INC PNM MOF NOP PASS 55 TAB *COMPUTE X*X NOF PASS A MOR 52 JSB NOP NOP NOP MPY ARS PASS B \mathbb{B} *STORE X*X IN 56 NOF NOF PASS 56 В *COMPUTE C5*X*X NOF NOP PASS A \mathbf{B} MOP MOP PASS 52 55 JSB NOP NOP NOP MESS MOP NOP PASS L 53 *COMPUTE 03+05*(X*X) NOP NOP ADD A B *COMPUTE (X*X)*(C3+(C5(X*X))) NOP NOP PASS 52 NOP NOP JSB MOR MER *ADJUST THE SCALE FACTOR ARS. 1.1 FASS B _{\rm B} ARS PASS B E 1.1 BRS. PASS B \mathbf{E} 1.1 *COMPUTE C1+(X*X)*(C3+C5*(X*X)) MOP MOR PASS L 51 NOF NOF ADD A (3) *COMPUTE X(C1+X*X*(C3+C5*(X*X))) NOF PASS S2 \leq 9 NOF JSB NOF NOP NOP MES *SAVE THE RESULT IN MAIN MEMORY NOP NOP PASS T \mathbf{E} WRITE MOR INC PMM *RETURN TO MACROPROGRAM NOP RTN PASS NOP NOP RETURN *SUBROUTINE FOR COMPUTING THE MULTIPLICATION OF TWO INTEGERS MEY NOP COM PBSS 57 F NOP NOP ZERO 8 NOF NOF NOF PASS L 52 RET NOF PASS ONTR B MEY E1 ADD 8 В NOF NOF PASS NOP 57 JMF CNDW AL15 RJS *** NOF NOF SUB B NOF NOF PASS NOP 52 RETURN JMP CNDM AL15 RJS NOF NOF PASS L 57 NOF RTN SUB B B #END ``` ``` * PRINTING ROUTINE--USED TO PRINT THE CONTENT OF B-REGISTER NOP DST. SAV STX SAV1 LDA TTY OTA 118 LDA BL OTA 118 STC 118/C SFS 118 JMF *-1 LDA SIXT \mathbb{C} \mathsf{B} \times CLA RRL 1 LOP ADA ASC 0TA 11B STC 418/C CLB SFS 11B JMF *-1 15\% JMP LOP DLD SAV LDX SAV1 JMP OŪT1, I END ``` ``` * CARRIAGE CONTROL ROUTINE--RETURN THE CARRIAGE TO THE BEGINING OF* THE LINE AND FEED ONE LINE OUT NOP DST SAV CLC 0/C LDA TTY OTA 118 LDA CR OTA 11B STC 118/C SFS 11B JMP *-1 LDA LF OTA: 11B STC 118/C SFS 11B JMP *-1 DLD SAV JMP OUT/ I ASC OCT 60 TTY OCT 120000 SINT OCT 177760 CR. OCT 215 OCT 12 LF SAV NOP NOF SAV1 MOP BL OCT 240 ``` END #### ス VITA # Peihsung Thomas Hu # Candidate for the Degree of #### Master of Science Thesis: THE CORDIC ALGORITHM IMPLEMENTATION FOR TRIGONOMETRIC FUNCTION EVALUATION IN HP21MX Major Field: Computing and Information Sciences # Biographical: Personal Data: Born in Taipei, Taiwan, Republic of China, July 2, 1950, the son of Mr. and Mrs. B. Y. Hu. Education: Graduated from Chengko High School, Taipei, Taiwan, Republic of China, in June, 1968; received Bachelor of Science degree in Electrical Engineering from Chiao Tung University, HsinoHu, Taiwan, Republic of China, in June, 1972; completed requirements for the Master of Science degree at Oklahoma State University in May, 1978. Professional Experience: Graduate teaching assistant, Department of Computing and Information Sciences, Oklahoma State University, 1975-1976; Software specialist, Atkins & Merril Training Equipment Company, 1976-present.