计算机组成与体系结构2011A卷

1. Which one of these will cause overflow in signed addition?

A. If there is a carry out of the least significant bit.
B. If there is a carry out of the most significant bit.
C. If adding two negative numbers results in a positive result.
D. If the magnitude of the result is smaller than the magnitude of the smaller added.

答案: C. 如果两个负数相加结果为正数。

解释：
在有符号加法中，溢出发生的情况是结果超出了数据类型能表示的范围：

如果两个正数相加结果变成负数，或者两个负数相加结果变成正数，都会导致溢出。
选项 C 直接描述了溢出发生的情况，因此是正确的。

2. A major advantage of direct mapping of cache is its simplicity. The main disadvantage of this organization is that .

A. It does not allow simultaneous access to the intended data and its tag.
B. It is more expensive than other types of cache organizations.
C. The cache hit ratio is degraded if two or more blocks used alternately map onto the same block frame in the cache.
D. Its access time is greater than that of other cache organizations.

答案: C. 缓存命中率会因为两个或多个块交替映射到缓存中的同一块框而降低。

解释：
直接映射缓存的主要问题是 冲突问题：多个内存块可能会映射到同一个缓存块。如果它们被频繁交替访问，会导致缓存命中率下降。
选项 C 准确描述了这种情况，因此是正确的。

3. In an interrupt process, the usage of saving PC is .

A. to make CPU find the entry address of the interrupt service routine
B. to continue from the program breakpoint when returning from interrupt
C. to make CPU and peripherals working in parallel
D. to enable interrupt nesting

答案: B. 在返回中断时，从程序断点继续执行。

解释：
中断发生时，当前程序的执行位置（即程序计数器 PC）会被保存，以便在中断服务程序执行完成后，能够恢复到被中断程序的断点位置继续执行。
选项 B 正确描述了保存 PC 的主要目的。

4. What is the effect of the following instruction?

move ecx, [ebp+8]

A. Add 8 to the contents of ebp and store the sum in ecx.
B. Add 8 to the contents of ebp, treat the sum as a memory address and store the contents at that address in ecx.
C. Add 8 to the contents of the memory location whose address is stored in ebp and store the sum in ecx.
D. Add the contents of ebp to the contents of memory address 8 and store the sum in ecx.

答案: B. 将 ebp+8 视为内存地址，取出该地址的内容存入 ecx。

解释：

move ecx, [ebp+8] 的含义是：
1. 计算 ebp + 8，得到一个内存地址。
2. 从该内存地址中读取内容，并将其存储到寄存器 ecx 中。
  方括号 [] 表示 “内容取值”，即将内存地址指向的数据读取出来。
  选项 B 正确描述了这个操作的效果。

5. The part of machine level instruction, which tells the central processor what was to be done is .

A. operation code
B. address
C. operand
D. none of the above

答案: A. operation code

解释：
在机器级指令中，操作码（operation code，简称 opcode） 指示处理器要执行什么操作，比如加法、减法、加载等。

地址：指令中指定数据或操作数的位置。
操作数：指令中实际的数据或内容。
因此，操作码是告诉处理器“要做什么”的部分，所以选 A。

6. In a cache memory system, for a write operation, if the cache location and the main memory location are updated simultaneously, then it uses technique.

A. write-back
B. write-out
C. write-allocation
D. write-through

答案: D. write-through

解释：
在 write-through 技术中，每当缓存中的数据被写入时，主存中的对应位置也会被立即更新。因此缓存和主存始终保持同步。

write-back：数据只写入缓存，只有在数据被替换或驱逐时才写回主存。
write-out：这是非标准术语，不适用于本题。
write-allocation：指当发生写缺失时，先将数据加载到缓存中，然后再进行写操作。

write-through 是数据同时更新缓存和主存的技术，因此选 D。

7. is the process by which the next device to become the bus master is selected and bus mastership is transferred to it.

A. Bus phase
B. Bus arbitration
C. Bus timing
D. Bus transceiver

答案: B. Bus arbitration

解释：
总线仲裁（Bus arbitration） 是指在总线上选择下一个总线主控设备的过程。当多个设备请求访问总线时，总线仲裁机制决定哪个设备将获得总线主控权（bus mastership）。

Bus phase：非标准术语。
Bus timing：描述总线操作的时间特性。
Bus transceiver：用于信号传输的设备，和总线主控权无关。

因此，选 B。

8. Which one of the following about benefits of virtual memory is not true?

A. provide large address space
B. relieve programmers from burden of overlays
C. resolve internal fragmentation
D. simplify relocation

答案: C. resolve internal fragmentation

解释：
虚拟内存的主要优势包括：

提供大地址空间（A）：虚拟内存允许程序使用比物理内存更大的地址空间。
减轻程序员负担（B）：程序员不需要手动进行覆盖管理（overlays），因为虚拟内存自动处理内存分页。
简化重定位（D）：程序可以被加载到任意物理内存地址，简化了重定位。

然而，虚拟内存不能解决内部碎片化（C）。

内部碎片化 是指分配的内存块比实际使用的空间大导致的浪费，主要发生在固定分区内存管理中，而虚拟内存通过分页解决的是 外部碎片化。

因此，选 C。

9. A hard disk with 5 platters has 2048 tracks/platter, 1024 sectors/track (fixed number of sectors per track), and 512 byte sectors. What is its total capacity?

A. 5GB
B. 10GB
C. 15GB
D. 20GB

答案: A. 5GB

解释：
总容量计算公式为：
$ = $

磁盘有 5 个盘片，每个盘片有 2 个面，所以总共有 $ 5 = 10 $ 个磁盘面。
每个磁盘面有 2048 个磁道。
每个磁道有 1024 个扇区。
每个扇区有 512 字节。

计算：
$ 5 = 10,737,418,240 = 5 $

因此，答案是 B. 5GB。

10. Translate the IEEE single-precision floating point numbers shown below to their decimal equivalent.

A. +832
B. -832
C. +416
D. -416

对于IEEE 754单精度浮点数格式，转换为十进制数的公式如下：

$ V = (-1)^s (1 + M) ^{E - 127} $

其中：

$s$是符号位（0表示正数，1表示负数）
$M$是尾数（规格化形式下，尾数的第一位是隐含的1，加上实际的尾数位）
$E$是指数位（无符号整数，实际指数需要减去127的偏移量）

对于给定的例子（10000111，尾数10100000000000000000000）：
- 符号位$s = 1$
- 指数位$E = 10000111_2 = 135_{10}$
- 尾数位$M = 1.101_2$（隐含的1加上实际的尾数位$0.101$）
代入公式：
- $V = (-1)^1 \times (1 + 0.101_2) \times 2^{135 - 127}$
- 先计算$1 + 0.101_2 = 1.101_2 = \frac{13}{8}$
- $E - 127 = 135 - 127 = 8$
- $V = -1 \times \frac{13}{8} \times 2^8$
- $2^8 = 256$
- $V = -1 \times \frac{13}{8} \times 256 = -416$

11. In microprogram-controlled machines, the relationship between the machine instruction and the microinstruction is .

A. a machine instruction is executed by a microinstruction
B. a microinstruction is composed of several machine instructions
C. a machine instruction is interpreted by a microroutine composed of microinstructions
D. a microroutine is executed by a machine instruction

答案: C. a machine instruction is interpreted by a microroutine composed of microinstructions

解释：
在微程序控制的计算机中，机器指令并不直接执行，而是由一个 微程序（microroutine） 解释和执行。

微程序由一系列 微指令（microinstructions） 组成。
每个机器指令对应一个微程序，而微程序由多个微指令组成，用来实现机器指令的功能。

因此，选项 C 是正确的。

12. In the following statements, is not true.

A. Branch instructions can cause delays in pipelined processors, because the processor cannot determine which instruction to fetch next until the branch has executed.
B. Structural hazards occur when the processor’s hardware is not capable of executing all the instructions in the pipeline simultaneously.
C. Pipelining increases processor performance by decreasing the execution time of an instruction.
D. Data hazards occur when the pipeline changes the order of read/write accesses to operands so that the order differs from the order seen by sequentially executing instructions on the unpipelined machine.

答案: C. Pipelining increases processor performance by decreasing the execution time of an instruction.

解释：

流水线（Pipelining） 并不会减少单条指令的执行时间，它只是在多条指令的执行过程中通过 重叠执行 提高整体吞吐量。
选项 C 错误，因为流水线通过并行操作增加了性能，但单条指令的执行时间并没有缩短。

其他选项分析：

A. 分支指令导致延迟：正确，分支指令的目标不确定，可能会引起流水线停顿。
B. 结构性冲突：正确，当硬件资源不足时，无法同时执行多条指令，导致结构性冲突。
D. 数据冒险：正确，数据冒险是指流水线改变了读/写顺序，导致数据依赖问题。

因此，答案是 C。

13. Interrupts generated by the keyboard will interrupt the CPU .

A. only when the CPU is executing a busy-loop.
B. only when the CPU is not doing any useful work.
C. after the CPU has turned on the interrupt-enable flag.
D. when the CPU is executing time-critical work.

答案: C. after the CPU has turned on the interrupt-enable flag.

解释：
键盘中断是 硬件中断，只有当 CPU 开启中断使能标志（interrupt-enable flag） 时，中断才会触发 CPU 的中断处理机制。

A：无论 CPU 是否在执行忙等待，硬件中断都可以触发。
B：CPU 是否在执行有用工作与中断触发无关。
D：即使 CPU 正在执行关键任务，中断也可能发生，但需要中断使能标志。

因此，选项 C 正确。

14. In a computer with a microprogrammable control unit, the period of the clock is determined by .

A. the delay of the control memory.
B. the delay of the main memory.
C. the delay of the ALU.
D. the sum of two of the above delays.

答案: D.the sum of two of the above delays.

15. If the 2010 version of a computer executes a program in 200s and the version of the computer made in the year 2011 executes the same program in 120s, then the speedup that the manufacturer has achieved over the two-year period is .

A. 1.44
B. 1.78
C. 1.53
D. 1.67

答案: D. 1.78

解释：
加速比（Speedup） 的公式为：
$ = $
代入题中数据：
$ = = 1.6667 $

因此，答案是 D. 1.67。

简答题

问题

What is the difference between the DMA and interrupt-driven methods?
从两个方面分析：

CPU 何时响应 DMA 请求或中断请求？
CPU 在确认 DMA 请求或中断请求时需要做哪些工作？

答案与解释

(1) CPU 响应 DMA 请求和中断请求的时间

DMA 方法（Direct Memory Access）：
CPU 在 DMA 控制器发出 DMA 请求 时，会立即响应（前提是当前系统允许 DMA 请求）。CPU 将总线控制权交给 DMA 控制器，然后可以继续执行其他任务。
- DMA 的响应发生在数据传输开始之前，CPU 只需要最小的介入。
中断驱动方法：
CPU 在外设发出 中断请求 后，通常在完成当前指令周期或关键任务时，进入中断处理程序（Interrupt Service Routine, ISR）。
- CPU 对中断请求的响应时间通常由中断使能标志和当前任务的性质决定。

总结：

DMA 请求：CPU 几乎立即响应，并将数据传输工作交给 DMA 控制器。
中断请求：CPU 可能稍有延迟，通常在完成当前工作后才进入中断处理程序。

(2) CPU 在确认 DMA 请求和中断请求时的工作

DMA 方法：
1. 确认 DMA 请求后，CPU 将总线控制权交给 DMA 控制器。
2. DMA 控制器独立完成数据的传输（内存与 I/O 设备之间）。
3. CPU 不需要处理数据传输过程，直到传输完成时 DMA 控制器再向 CPU 发出完成中断。
中断驱动方法：
1. CPU 确认中断请求后，暂停当前程序，保存上下文（如程序计数器 PC、寄存器等）。
2. CPU 执行 中断服务程序（ISR），处理中断事件，如数据传输或设备控制。
3. 中断服务完成后，CPU 恢复上下文 并返回之前的程序继续执行。

总结：

DMA 请求：CPU 只负责启动和结束阶段，数据传输由 DMA 控制器独立完成。
中断请求：CPU 负责全部工作，包括暂停程序、处理中断、恢复程序等。

问题

1. A set-associative cache consists of a total of 32 blocks divided into 4-block sets. The main memory contains 1024 blocks, each consisting of 64 words.
(1) How many bits are there in a main memory address?
(2) How many bits are there in each of the TAG, SET, and WORD fields?

解答与解释

(1) 计算主存地址的位数

已知信息：

主存有 1024 块（blocks），每块有 64 个字（words）。
每个主存地址对应一个字。

总字数（总地址数）：
$ = = 1024 = 65536 $

由于 65536 = $ 2^{16} $，因此需要 16 位 来表示主存中的所有地址。

答案：主存地址总位数为 16 位。

(2) 计算 TAG、SET 和 WORD 字段的位数

主存地址被分为 TAG 字段、SET 字段 和 WORD 字段，具体如下：

WORD 字段： 表示每个块中的字的偏移。
SET 字段： 表示地址对应的 缓存集合（set）。
TAG 字段： 用于标识主存块是否在缓存中。

Step 1：计算 WORD 字段位数

每个块有 64 个字，所以需要 $ _2(64) $ 位来表示块内的字偏移。

$ = _2(64) = 6 $

Step 2：计算 SET 字段位数

缓存总共有 32 块，且每个 集合（set） 有 4 块，因此缓存中有：

$ = = = 8 $

需要 $ _2(8) $ 位来表示缓存集合的索引：

$ = _2(8) = 3 $

Step 3：计算 TAG 字段位数

主存地址总位数为 16 位，去掉 WORD 字段 和 SET 字段 的位数，剩下的就是 TAG 字段的位数：

$ = - - $
$ = 16 - 3 - 6 = 7 $

最终答案：

主存地址的总位数：16 位
各字段的位数：
- TAG 字段：7 位
- SET 字段：3 位
- WORD 字段：6 位

总结：

字段	位数	解释
WORD	6	表示块内 64 个字的偏移。
SET	3	表示缓存中 8 个集合的索引。
TAG	7	用于标识主存块是否在缓存中。
总地址	16	总地址需要 16 位，包含 TAG、SET 和 WORD 字段。

问题

1. The two unsigned binary numbers shown below are to be multiplied using a multiplier that uses Booth’s Algorithm:

1 2	1 0 0 1 1 0 1 0 X 0 1 1 1 0 1 1 1

How many bits will be needed to store the product of these two numbers?
How many additions and subtractions will be performed by the Booth’s Algorithm multiplier respectively?

解答与解释

(1) 计算存储乘积所需的位数

两个无符号二进制数分别有：

第一个数：8 位（$ 1 0 0 1 1 0 1 0 $）
第二个数：8 位（$ 0 1 1 1 0 1 1 1 $）

在二进制乘法中，乘积所需的位数是 两个数的位数之和：

$ = 8 + 8 = 16 $

答案：存储乘积需要 16 位。

(2) 使用 Booth 算法进行乘法，计算加法和减法的次数

Booth 算法的原理：
Booth 算法通过扫描乘数的位序列，利用连续的 0 和 1 来减少加法和减法次数，具体规则如下：

00 或 11：执行 0 操作（无需加减）。
01：执行加法（加上被乘数）。
10：执行减法（减去被乘数）。

步骤：

观察乘数 0 1 1 1 0 1 1 1，在 Booth 算法中，每次考虑 乘数的两位（当前位和前一位，包括符号扩展位）：
- 假设前一位初始为 0。

位对（前一位, 当前位）	操作	解释
0 0	无操作	无需加或减
0 1	加法	加被乘数
1 1	无操作	无需加或减
1 0	减法	减被乘数

分析乘数：
乘数为 $ 0 1 1 1 0 1 1 1 $，从右到左考虑（包括符号位扩展的 0）：

前一位	当前位	操作
0	1	加法
1	1	无操作
1	1	无操作
1	0	减法
0	1	加法
1	1	无操作
1	1	无操作
1	0	减法

统计加法和减法的次数：

加法次数：2 次
减法次数：2 次

答案：

加法次数：2 次
减法次数：2 次

最终答案总结

存储乘积所需的位数：16 位
加法次数：2 次，减法次数：2 次

问题

1. What are the advantages and disadvantages of hardwired and microprogrammed control?

答案与解释

计算机控制单元有两种主要实现方式：硬布线控制（Hardwired Control） 和 微程序控制（Microprogrammed Control）。

硬布线控制（Hardwired Control）：

硬布线控制是通过组合逻辑电路直接实现的控制单元，使用硬件实现指令的控制信号生成。

优点：

速度快：由于控制信号是由硬件逻辑直接生成的，没有额外的存储和解码步骤，因此执行速度更快。
执行效率高：适合设计简单、固定的指令集架构（例如 RISC）。
响应时间短：控制信号的生成时间较短，能快速响应指令的执行。

缺点：

设计复杂：硬布线需要复杂的逻辑电路设计，尤其是对于复杂的指令集（CISC）而言。
不易修改：一旦设计完成，修改指令集或添加新指令非常困难，需要重新设计硬件。
成本较高：硬件资源消耗较多，成本较高。

微程序控制（Microprogrammed Control）：

微程序控制通过存储在 控制存储器（Control Memory） 中的微指令来生成控制信号。

优点：

灵活性高：微程序存储在存储器中，可以轻松修改或扩展指令集，只需更改微程序即可。
易于实现复杂指令：适合复杂的指令集（CISC），可以分解复杂指令为多个微指令逐步执行。
设计简单：相比硬布线，微程序控制更易于设计和实现。

缺点：

速度较慢：微程序控制需要访问存储器，增加了解码和存储访问时间，速度不如硬布线控制快。
存储器开销大：需要额外的存储空间来存放微程序，增加了系统的存储开销。
执行效率较低：每个指令需要多条微指令来完成，增加了执行的周期数。

总结比较

特性	硬布线控制（Hardwired Control）	微程序控制（Microprogrammed Control）
速度	快	慢
灵活性	低，不易修改	高，易于修改和扩展
设计复杂度	高，逻辑电路复杂	低，设计相对简单
适合的指令集类型	简单指令集（RISC）	复杂指令集（CISC）
成本	高，硬件资源消耗多	较低，主要是存储器开销

结论：

硬布线控制 更适合需要 高速执行 和 简单指令集 的系统。
微程序控制 更适合需要 灵活性 和 复杂指令集 的系统，特别是在易于修改和扩展指令的应用场景中。

问题

1. Assume that a computer’s instruction length is 16-bit, and its operand address is 6-bit. Suppose the designers need two-address instructions, one-address instructions, and zero-address instructions. How should we design the instruction format? And specify the numbers of each type of instruction that can be designed.

答案如下:

问题

考虑浮点数采用12位格式表示。比例因子的隐含基数为2，有一个5位、偏移量为15的指数，其中0和31这两个端点值分别用于表示精确的0和无穷大。6位尾数按照IEEE格式进行归一化，在二进制小数点左边有一个隐含的1。
1. 在此格式下表示数字 -0.6875和 +19。
2. 对操作数A = 0 10001 011011、B = 1 01111 101010执行减法运算（注意：答案中使用舍入作为截断方法。写出计算过程！）

答案

(1)

表示 -0.6875：
- 首先将 -0.6875转换为二进制小数，$-0.6875 = -0.1011$（二进制）。
- 进行归一化处理可得$-1.011×2^{-1}$。
- 对于指数，偏移量为15，$-1$对应的指数值为$15 - 1 = 14$（十进制），转换为二进制是 $1110$。
- 尾数部分去掉前面隐含的1后为 $011000$（补齐6位）。
- 符号位因为是负数为1，所以 -0.6875的12位表示为：$1 01110 011000$。
表示 +19：
- $19$转换为二进制是 $10011$（二进制），写成规格化形式为$1.0011×2^{4}$。
- 指数部分，偏移量为15时，$4$对应的指数值为 $15 + 4 = 19$（十进制），二进制表示为 $10011$。
- 尾数部分去掉前面隐含的1后为 $001100$（补齐6位）。
- 符号位因为是正数为0，所以 +19的12位表示为：$0 10011 001100$。

(2)

找到了课本原题，这里贴上。

问题

1. Consider the following piece of code:

int i;
int A[1024*1024];
int x=0;
for (i=0; i<1024; i++) {
    x += A[i] + A[1024*i];
}

系统参数：
- 2-way set associative 16KB data cache，32-byte blocks
- 32-bit words
- LRU replacement policy
- int is word-sized (4 bytes)
- The address of A is 0x0, i and x are in registers
- 缓存初始为空

问题：

数据缓存的总缺失次数（misses）是多少？
命中次数（hits）是多少？

解答与解释

1. 缓存配置参数分析

缓存大小：16 KB = 16 × 1024 = 16,384 字节
块大小：32 字节（每块存储 8 个 int 类型的元素，每个 int 占 4 字节）
缓存方式：2-way set associative
总块数：16,384 字节 ÷ 32 字节 = 512 块
组数：512 块 ÷ 2-way = 256 组

每组包含 2 个块，因此有 256 个组，每组可以存储 2 个缓存块。

2. 数组 A 的内存访问模式

数组 A 有 $ 1024 = 1,048,576 $ 个元素，每个元素是 4 字节（32 位）。因此，数组 A 总共占用 $ 1,048,576 = 4 $。

在这段代码中，循环体中的访问模式是：

A[i]：顺序访问数组 A 的前 1024 个元素。
A[1024 * i]：每次访问 A 中第 $ 1024 i $ 个元素，即每隔 1024 个元素访问一次。

3. 内存访问分析

访问 A[i]：
- 访问顺序为：A[0], A[1], A[2], ..., A[1023]。
- 每个访问是连续的，因为每个 int 大小是 4 字节，块大小为 32 字节，因此每个缓存块存储 8 个连续的 int 元素。
- 总共需要访问 1024 个元素，而每 8 个元素（即每次访问 8 个连续的 int）会加载一个新的缓存块。
- 结果：1024 ÷ 8 = 128 次缓存块访问，因此会发生 128 次缓存缺失。
访问 A[1024 * i]：
- 访问的顺序是：A[0], A[1024], A[2048], ..., A[1024 ]。
- 由于每次访问的元素相隔 1024 个元素（4 KB），因此每次访问一个新的缓存块，且这些块与 A[i] 访问的缓存块没有重叠。
- 结果：1024 次访问，每次访问一个新的缓存块，因此会发生 1024 次缓存缺失。

4. 总结缓存缺失和命中次数

总访问次数：每个循环体中有 2 次访问（一次访问 A[i]，一次访问 A[1024 * i]），循环执行 1024 次，因此总访问次数为： $ 1024 = 2048 $
缓存缺失次数：
- 对于 A[i] 的访问：128 次缓存缺失。
- 对于 A[1024 * i] 的访问：1024 次缓存缺失。
- 总共：$ 128 + 1024 = 1152 $
缓存命中次数： $ - = 2048 - 1152 = 896 $