Error detection in the 21164 is concentrated in the Bus Interface Unit (BIU). The Alpha CPU uses ECC to ensure data integrity. The following errors are detected in the CPU:
| Icache Tag or Data Parity Error: The Icache is parity protected.
A machine check occurs before execution of the instruction responsible for the parity error. ICPERR_STAT: DPE (data parity error) or TPE (tag parity error) is set. EXC_ADDR contains either the PC of the instruction that is causing the parity error or the PC of an earlier trapping instruction. In this event, the Icache is not flushed by hardware. |
| Scache Data Parity Error, Istream: A machine check occurs before execution of
the instruction responsible for the parity error. Bad data can be written
to the Icache or Icache refill buffer and validated. The operation can
be retried if there are no multiple errors. SC_STAT: SC_DPERR<7:0> is set; <SC_SCND_ERR> is set if there are multiple errors. SC_STAT: CBOX_CMD is IREAD. SC_ADDR: Contains the address of the 32-byte block containing the error. (Bit 4 indicates which octaword was accessed first, but the error may be in either octaword.) |
| Scache Tag Parity Error, Istream: A machine check occurs before execution of
the instruction responsible for the parity error. Bad data can be written
to the Icache or Icache refill buffer and validated. The operation cannot
be retried. SC_STAT: SC_TPERR<2:0> is set; <SC_SCND_ERR> is set if there are multiple errors. SC_STAT: CBOX_CMD is IREAD. SC_ADDR: Contains the address of the 32-byte block containing the error. (Bit 4 indicates which octaword was accessed first, but the error may be in either octaword.) |
| Scache Data Parity Error, Dstream: A machine check occurs.
The machine state may have changed. You cannot retry this, but deleting the process
may be sufficient if the data was confined to a single process and no second error occurred. SC_STAT: SC_DPERR<7:0> is set; <SC_SCND_ERR> is set if there are multiple errors. SC_STAT: CBOX_CMD is DREAD, DWRITE, or READ_DIRTY. SC_ADDR: Contains the address of the 32-byte block containing the error. (Bit 4 indicates which octaword was accessed first, but the error may be in either octaword.) |
| Scache Tag Parity Error, Dstream: A machine check occurs. The machine
state may have changed. You cannot retry this. Most likely, you will not be able to recover
by deleting a single process because the exact address is unknown. SC_STAT: SC_TPERR<2:0> is set. SC_ADDR: Records physical address bits <39:4> of the location with the error. |
| Dcache Data Parity Error: The Dcache data is parity protected. A machine check occurs.
The machine state may have changed. You cannot retry this, but you may only need to delete the process if data is confined to a single process and no second
error occurred. DCPERR_STAT: <DP0> or <DP1> (data parity error in bank 0 or 1) is set. <LOCK> is set. <SEO> is set if there are multiple errors. VA: Contains the virtual address of the quadword with the error. MM_STAT: Locked. Contents contain information about instruction causing the error. |
| Dcache Tag Parity Error: The Dcache tag is parity protected. A machine check occurs. The machine state may have changed. DCPERR_STAT: <TP0> or <TP1> (tag parity error in bank 0 or 1) is set. <LOCK> is set. <SEO> is set if there are multiple errors. VA: Contains the virtual address of the Dcache block (hexword) with the error. MM_STAT: Locked. Contents contain information about the instruction causing the error. The <WR> bit is set if the error occurred on a store instruction. |
| Istream Uncorrectable ECC Error: A machine check occurs before execution of the instruction responsible for the ECC error. Bad data may be written to the Icache or Icache refill buffer and validated. You can retry the operation if there are no multiple errors. The Icache must be flushed to remove bad data. You can flush the Icache refill buffer by executing enough instructions to fill the refill buffer with new data (32 instructions). Then reflush the Icache. EI_STAT: <UNC_ECC_ERR> is set; <SEO_HRD_ERR> is set if there are multiple errors. EI_STAT: <EI_ES> is set if source of fill data is memory/system, clear if Bcache. EI_STAT: <FIL_IRD> is set. EI_ADDR: Contains the physical address bits <39:4> of the octaword associated with the error. FILL_SYN: Contains the syndrome bits associated with the failing octaword. BC_TAG_ADDR: Holds the result of external cache tag probe if external cache was enabled for this transaction. |
| Dstream Uncorrectable ECC Error: A machine check occurs. The machine state
may have changed. You cannot retry the operation, but you may only need to delete the
process if the data is confined to a single process and no second error occurred. EI_STAT: <UNC_ECC_ERR> is set; <SEO_HRD_ERR> is set if there are multiple errors. EI_STAT: <EI_ES> is set if source of fill data is memory/system, clear if Bcache. EI_STAT: <FIL_IRD> is clear. EI_ADDR: Contains the physical address bits <39:4> of the octaword associated with the error. FILL_SYN: Contains the syndrome bits associated with the failing octaword. BC_TAG_ADDR: Holds the result of external cache tag probe if external cache was enabled for this transaction. |
| Bcache Tag Parity Error -- Istream: A machine check occurs before execution of the instruction responsible for the parity error. Bad data may be written to the Icache
or Icache refill buffer and validated. You can retry the operation if there are not multiple errors.
The Icache must be flushed to remove bad data. You can flush the Icache refill buffer
by executing enough instructions to fill the refill buffer with new data (32 instructions).
Then reflush the Icache. EI_STAT: <BC_TPERR> or <BC_TC_PERR> is set; <SEO_HRD_ERR> is set if there are multiple errors. EI_STAT: <EI_ES> is clear. EI_STAT: <FIL_IRD> is set. EI_ADDR: Contains the physical address bits <39:4> of the octaword associated with the error. BC_TAG_ADDR: Holds the result of external cache tag probe. |
| Bcache Tag Parity Error -- Dstream: A machine check occurs. The machine
state may have changed. You cannot retry the operation, but you may need to delete only the
process if the data is confined to a single process and no second error occurred.
EI_STAT: <BC_TPERR> or <BC_TC_PERR> is set; <SEO_HRD_ERR> is set if there are multiple errors. EI_STAT: <EI_ES> is clear. EI_STAT: <FIL_IRD> is clear. EI_ADDR: Contains the physical address bits <39:4> of the octaword associated with the error. BC_TAG_ADDR: Holds the result of external cache tag probe. |
| System Command/Address Parity Error: A machine check occurs, and the
machine state may have changed. EI_STAT: <EI_PAR_ERR> is set; <SEO_HRD_ERR> is set if there are multiple errors. EI_STAT: <EI_ES> is set. EI_ADDR: Contains the physical address bits <39:4> of the octaword associated with the error. BC_TAG_ADDR: Holds the result of external cache tag probe if external cache was enabled for this transaction. When the 21164 detects a command or address parity error, the command is unconditionally NOACKed. |
| Istream or Dstream Correctable ECC Errors: The 21164 hardware corrects the data before filling the Scache and Icache. The Dcache is completely invalidated. The data in
the Bcache contains the ECC error but is scrubbed by PALcode in the correctable interrupt routine.
A separately maskable correctable error interrupt occurs at IPL 31 (the same IPL as a machine check);
it is masked by clearing ICSR<CRDE>. ISR: <CRD> is set. EI_STAT: <COR_ECC_ERR> is set. EI_STAT: <FIL_IRD> is set if Istream or clear if Dstream. EI_STAT: <EI_ES> is clear if the source of the error is Bcache, and set otherwise. EI_ADDR: Contains the physical address bits <39:4> of the octaword associated with the error. FILL_SYN: Contains the syndrome bits associated with the octaword containing the ECC error. BC_TAG_ADDR: Unpredictable (not loaded on correctable errors). |
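The EI_STAT rows of the table can be read as a small decision procedure. The sketch below is illustrative only: it works on already-decoded flag values rather than raw register bits (bit positions are not given in this section), the `EiStat` class and both function names are hypothetical, and the order in which the flags are tested is an assumption.

```python
from dataclasses import dataclass


@dataclass
class EiStat:
    """Decoded EI_STAT flags (hypothetical container; no bit positions assumed)."""
    unc_ecc_err: bool = False
    cor_ecc_err: bool = False
    bc_tperr: bool = False
    bc_tc_perr: bool = False
    ei_par_err: bool = False
    seo_hrd_err: bool = False   # set if there are multiple errors
    ei_es: bool = False         # set: memory/system source, clear: Bcache
    fil_ird: bool = False       # set: Istream fill, clear: Dstream fill


def classify(s):
    """Map flags to the matching table row name (test order is an assumption)."""
    stream = "Istream" if s.fil_ird else "Dstream"
    if s.ei_par_err:
        return "System Command/Address Parity Error"
    if s.unc_ecc_err:
        return stream + " Uncorrectable ECC Error"
    if s.bc_tperr or s.bc_tc_perr:
        return "Bcache Tag Parity Error, " + stream
    if s.cor_ecc_err:
        return stream + " Correctable ECC Error"
    return "No EI_STAT error recorded"


def can_retry(s):
    """True only where the table says the operation can be retried:
    an Istream ECC or Bcache tag error with no multiple errors."""
    hard = s.unc_ecc_err or s.bc_tperr or s.bc_tc_perr
    return hard and s.fil_ird and not s.seo_hrd_err and not s.ei_par_err
```

For example, `classify(EiStat(unc_ecc_err=True, fil_ird=True))` names the Istream uncorrectable ECC row, and `can_retry` on the same flags is true until `seo_hrd_err` is also set.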
The Bcache does not detect errors, but it is protected; the data is protected by ECC and the tag by parity. ECC is generated by the CPU for each group of 8 bytes written into the Bcache. Fill data from the Bcache to the system is not checked for errors. During a fill from the Bcache, if a correctable error is detected (single-bit error), the CPU traps and the fill is replayed with corrected data.
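To illustrate how check bits over an 8-byte group separate a correctable single-bit error from an uncorrectable one, here is a minimal sketch using a textbook extended Hamming (72,64) code. This is an assumption for illustration: the 21164's actual check-bit equations and syndrome encoding are not given in this section and are likely to differ, and all function names are hypothetical.

```python
# Textbook extended Hamming (72,64) SECDED code; NOT the 21164's actual
# check-bit equations. Bit 0 of the codeword is the overall parity bit,
# bits 1..71 are Hamming positions (powers of two hold check bits).

DATA_POSITIONS = [p for p in range(1, 72) if p & (p - 1)]  # 64 non-power-of-two slots


def syndrome(word):
    """XOR of the positions (1..71) of all set bits; 0 for a clean codeword."""
    syn = 0
    for pos in range(1, 72):
        if (word >> pos) & 1:
            syn ^= pos
    return syn


def encode(data):
    """Build a 72-bit codeword from 64 data bits (one 8-byte group)."""
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            word |= 1 << pos
    syn = syndrome(word)
    for k in range(7):                 # setting check bit 2**k clears syndrome bit k
        if (syn >> k) & 1:
            word |= 1 << (1 << k)
    if bin(word).count("1") & 1:       # make the total number of ones even
        word |= 1
    return word


def check(word):
    """Return (status, word); status is 'ok', 'correctable' or 'uncorrectable'."""
    syn = syndrome(word)
    odd = bin(word).count("1") & 1
    if syn == 0 and not odd:
        return "ok", word
    if odd and syn <= 71:              # single-bit error at position syn (0 = parity bit)
        return "correctable", word ^ (1 << syn)
    return "uncorrectable", word       # double-bit (or worse) error


def extract(word):
    """Recover the 64 data bits from a codeword."""
    return sum(((word >> pos) & 1) << i for i, pos in enumerate(DATA_POSITIONS))
```

Flipping any one bit of `encode(data)` makes `check` return `"correctable"` together with the repaired codeword, while flipping two bits yields `"uncorrectable"`, which mirrors the correctable and uncorrectable ECC rows in the table.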
The SROM and the SROM interface to the 21164 do not have error checking capability.
The memory SIMMs do not detect errors, but they furnish information through ECC bits to error detection networks in the CPU and core logic. During CPU-initiated transactions, ECC typically is generated by the CPU (in the case of a Bcache victim write to memory, the ECC comes from the Bcache). For DMA write transactions, ECC is generated in the PCI portion of the core logic. If there is bad parity on data being written from the PCI to memory during a DMA write, the PCI agent that instigated the DMA write is allowed to complete normally, but the write data is discarded and PYXIS_ERR<PCI_PERR> is set.
Not all PCI devices are required to detect and report parity errors, but they must generate parity on all of their transactions. Some do this more successfully than others. During the address phase, the PAR bit provides even parity for AD[31:0] and C/BE[3:0] regardless of whether all the lines carry meaningful information.
Master devices drive PAR for the address and data phases on write transactions. Target devices drive PAR during the data phase of read transactions.
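The even-parity rule for PAR can be written out in a few lines. This is a minimal sketch; the function names `pci_par` and `parity_ok` are hypothetical, and the values are plain integers standing in for the bus lines.

```python
def pci_par(ad, cbe):
    """Return the PAR value that makes the number of ones across
    AD[31:0], C/BE[3:0] and PAR even."""
    ones = bin(ad & 0xFFFFFFFF).count("1") + bin(cbe & 0xF).count("1")
    return ones & 1


def parity_ok(ad, cbe, par):
    """Check a received PAR bit against the 36 lines it covers."""
    return pci_par(ad, cbe) == (par & 1)
```

All 36 lines take part in the calculation even when some carry no meaningful information, so a receiver can check parity without knowing which byte lanes are in use.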
The PCI contains PERR and SERR to signal errors. PERR reports data parity errors for all transactions except special cycle commands. PERR can only be driven by one device at a time. Targets signal data parity errors back to the master using PERR.
SERR reports address parity errors and data parity errors on special cycles. It is a wire-OR'd signal that can be driven by multiple devices at any one time. SERR will be sent to the CPU as an NMI.
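The division of labor between PERR and SERR described above can be summarized as a lookup. The sketch is only a restatement of the two preceding paragraphs; the function name and the string arguments are hypothetical.

```python
def reporting_signal(phase, special_cycle=False):
    """Return which PCI signal reports a parity error.

    phase is "address" or "data"; special_cycle marks a special cycle command.
    """
    if phase == "address":
        return "SERR"                       # address parity errors
    if phase == "data":
        return "SERR" if special_cycle else "PERR"
    raise ValueError("phase must be 'address' or 'data'")
```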
The ISA bus uses I/O Channel Check (IOCHK) to signal that some ISA device detected a parity error on the ISA bus. The assertion of IOCHK will cause an NMI to be sent to the main interrupt controller, which in turn sends a machine check interrupt request to the CPU.