2024-02-17 Multics Hang at 'CPU Add' During Boot, Part 2

unused0

Charles Anthony

Posted on February 17, 2024

A user reported a repeatable hang on a Raspberry Pi. To reproduce it, I fired up my Pi 4 Model B, ran the latest simulator (15e7721dd6a2c58b5d3eea4e0428c3e71a9004ec) with Quickstart 12.8, and modified the Multics config by adding:

cpu -tag b -port 6 -state on -type dps8 -model 70. -cache 8. 
cpu -tag c -port 5 -state on -type dps8 -model 70. -cache 8. 
cpu -tag d -port 5 -state on -type dps8 -model 70 -cache 8. 
cpu -tag e -port 4 -state on -type dps8 -model 70 -cache 8. 

This adds CPUs B, C, D, and E to the configuration. With state set to on, they will be started during boot.

 ./dps8 MR12.8_boot.ini 
DPS8/M simulator X3.0.1+16 (64-bit)
  Commit: 15e7721dd6a2c58b5d3eea4e0428c3e71a9004ec
bce (boot) 0817.2: M-> [auto-input] boot star

0817.9  CPU A: Model #: DPS 8/SIM M; Serial #: 0; Ship date: 240215; PROM Layout Version: 2; 
          Simulator Release: X3.0.1 (2024-02-15); Build Number: <None>;  
          Build Arch: aarch64; Build OS: Linux; 
          Target Arch: AArch64/ARM64/64-bit; Target OS: GNU/Linux.
CPU B thread created.
0817.9  start_cpu: Added CPU B.
CPU C thread created.
0817.9  start_cpu: Added CPU C.

and it hangs...

Interestingly, this is not the symptom seen by the issue reporter. Their hang is at the "CPU B thread created." message; they never see "0817.9 start_cpu: Added CPU B.".
(The "thread created" message is from the simulator; the messages starting with a time code are from Multics.)

Attaching gdb to the hung simulator process:

$ gdb dps8 16473
(gdb) p/o cpus[0].PPR
$1 = {PRR = 0, PSR = 034, P = 01, IC = 02427}
(gdb) p/o cpus[1].PPR
$2 = {PRR = 0, PSR = 034, P = 01, IC = 02427}
(gdb) p/o cpus[2].PPR
$3 = {PRR = 0, PSR = 041, P = 01, IC = 0320127}
(gdb) p/o cpus[3].PPR
$4 = {PRR = 0, PSR = 0, P = 0, IC = 0}

CPUs A and B are both executing at 34:2427. Segment 34 is bound_interceptors:

bound_interceptors                 34  (0, 0, 0) read execute privileged encacheable wired

Component                            Text        Int-Stat       Symbol
                                 Start Length  Start Length  Start Length

fim                                  0   2210      0      0    100    266
wired_fim                         2210    332      0      0    366    230

34:2427 is at offset 2427 - 2210 = 217 (octal) in wired_fim:

                                   378  " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " "
                                   379  "
                                   380  "       START_WAIT - Wait until new CPU has started up.
                                   381  "
                                   382  "
                                   383  " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " "
                                   384
    000205                         385  start_wait:
    000205  aa   000001 3352 07    386          lca     1,dl            all ones in A
    000206  4a  4 00016 6753 20    387          era     prds$processor_pattern  turn off bit for this CPU
    000207  4a  4 00030 3553 20    388          ansa    scs$processor_start_wait  check ourselves off
                                   389
    000210  4a  4 00074 3737 20    390          eppsb   prds$           push a frame onto the prds
    000211  0a   000325 2272 00    391          ldx7    push            ..
    000212  4a  4 00076 7003 20    392          tsx0    fim_util$push_stack_32  ..
                                   393
    000213  aa  6 00050 3503 00    394          eppap   notify_regs     ap -> place to copy conditions
    000214  4a  4 00100 7003 20    395          tsx0    fim_util$copy_mc        copy the conditions into stack
                                   396
    000215  4a  4 00102 7003 20    397          tsx0    fim_util$set_mask       uninhibit to prevent lockups
    000216                         398          inhibit off     <-><-><-><-><-><-><-><-><-><-><-><->
                                   399
    000216  4a  4 00072 2341 20    400          szn     scs$connect_lock        test connect lock
    000217  0a   000223 6000 00    401          tze     *+4             wait until it is cleared
    000220  aa   000110 7770 00    402          llr     72
    000221  aa   000110 7770 00    403          llr     72
    000222  0a   000216 7100 00    404          tra     *-4
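
The offset arithmetic used to find the spot in wired_fim can be sketched in a few lines of Python. This is only an illustration: the component table is copied from the bound_interceptors listing, and the `locate` helper is a hypothetical name, not part of any Multics or simulator tooling.

```python
# Component map copied from the bound_interceptors listing; all values are octal.
COMPONENTS = [
    ("fim", 0o0, 0o2210),        # (name, text start, text length)
    ("wired_fim", 0o2210, 0o332),
]

def locate(offset):
    """Return (component, offset within that component) for a segment offset."""
    for name, start, length in COMPONENTS:
        if start <= offset < start + length:
            return name, offset - start
    raise ValueError(f"offset {offset:o} is outside every listed component")

name, rel = locate(0o2427)
print(f"{name}+{rel:o}")  # prints "wired_fim+217"
```

Offset 217 is the `tze *+4` in the connect_lock loop of the listing.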

The start CPU code holds connect_lock during new CPU startup. I am confused as to why CPUs A (cpus[0]) and B (cpus[1]) are both in start_wait, but that may just be my not understanding how the running non-bootload CPUs behave during CPU add.
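
To make the start_wait listing easier to follow, here is a rough Python model of what its first three instructions and the connect_lock loop appear to do. The bit patterns below are hypothetical (the real layout of prds$processor_pattern is not shown here), and `check_off` and `spin_until_clear` are illustrative names only.

```python
WORD_MASK = (1 << 36) - 1  # DPS8/M words are 36 bits wide

def check_off(start_wait, processor_pattern):
    """Model of lca / era / ansa: clear this CPU's bit in the wait word."""
    a = (-1) & WORD_MASK      # lca 1,dl : A = -1, i.e. all ones
    a ^= processor_pattern    # era      : turn off this CPU's bit in A
    return start_wait & a     # ansa     : AND A into the wait word

def spin_until_clear(read_lock, max_polls):
    """Model of the szn/tze/tra loop: poll until the lock reads zero.

    Returns the number of polls used, or None if the lock never cleared
    within max_polls (the real loop has no such limit and simply hangs).
    """
    for polls in range(1, max_polls + 1):
        if read_lock() == 0:
            return polls
    return None

# Hypothetical example: three CPUs waiting, this CPU owns bit 0b0100.
print(bin(check_off(0b1110, 0b0100)))  # prints "0b1010"
```

If connect_lock is never released, `spin_until_clear` returns None, which corresponds to CPUs A and B sitting at 34:2427 indefinitely.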

CPU C (the newly added CPU) is at 320127.

This seems to be the swerr_lp loop:

    000125  aa   000002 2352 07    158  swerr:  lda     rcerr_addcpu_bad_switches,dl
    000126  aa   000272 7552 04    159          sta     wait_flag-*,ic  set it for start_cpu
    000127  aa   077777 6372 03    160  swerr_lp:       ldt     =o77777,du      prevent timer runout faults
    000130  aa   000270 2352 04    161          lda     wait_flag-*,ic  has start_cpu given use a green lite?
    000131  aa   000043 6042 04    162          tmi     nogo-*,ic               no, bad switches go to DIS
    000132  aa   000002 1152 07    163          cmpa    rcerr_addcpu_bad_switches,dl is start_cpu still thinking about it?
    000133  aa   777774 6002 04    164          tze     swerr_lp-*,ic   yes, go through another loop

This makes no sense: the bootload CPU issued the "Added CPU C" message, which means CPU C had long since passed the switch tests. It is also the same symptom that I was seeing yesterday.
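
As a cross-check on reading the swerr listing, the IC-relative operands can be resolved by hand: the target is the instruction's own location plus the operand, wrapped to 18 bits. The sketch below does that for the four IC-modified instructions; `ic_target` is a hypothetical helper name, and the locations and operands are copied from the listing.

```python
ADDR_MASK = 0o777777  # addresses are 18 bits, so negative offsets wrap around

def ic_target(location, operand):
    """Resolve an IC-modified operand to its target offset in the listing."""
    return (location + operand) & ADDR_MASK

print(f"{ic_target(0o126, 0o000272):o}")  # sta wait_flag-*,ic  -> 420
print(f"{ic_target(0o130, 0o000270):o}")  # lda wait_flag-*,ic  -> 420
print(f"{ic_target(0o131, 0o000043):o}")  # tmi nogo-*,ic       -> 174
print(f"{ic_target(0o133, 0o777774):o}")  # tze swerr_lp-*,ic   -> 127
```

Both references to wait_flag resolve to the same offset (420), and the `tze` branches back to 127, the swerr_lp location where CPU C was found.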

Doing an instruction trace of CPU C:

2: 00041:030315 0 700026764161 (LPRP4 PR7|26,*AU) 000034 545(0) 1 0 0 00
2: 00041:030316 0 200001710100 (TRA PR2|1) 000000 764(0) 0 0 0 01
2: 00043:001033 0 000446710000 (TRA 000446) 000001 710(0) 1 0 0 00
2: 00043:000446 0 000002235120 (LDA PR0|2,N*) 000446 710(0) 0 0 0 00
2: 00043:000447 0 000007735000 (ALS 000007) 000214 235(0) 0 0 0 00
2: 00043:000450 0 000004035120 (ADLA PR0|4,N*) 000007 735(0) 0 0 0 00
2: 00043:000451 0 000003735000 (ALS 000003) 002473 035(0) 0 0 0 00
2: 00043:000452 0 000000620005 (EAX0 000000,AL) 000003 735(0) 0 0 0 00
2: 00043:000453 0 000006237120 (LDAQ PR0|6,N*) 000000 620(0) 0 0 0 05
2: 00043:000454 0 400132057120 (SSCR PR4|132,N*) 000120 237(0) 0 0 0 00
2: 00043:000455 0 700044710120 (TRA PR7|44,N*) 000000 057(0) 0 0 0 10
2: 00041:030325 0 600000373100 (EPBP7 PR6|0) 030325 710(0) 0 0 0 00
2: 030326 320050710200 (TRA 320050) 000000 373(0) 1 0 0 00
2: 320050 000346754204 (STI 000346,IC) 320050 710(0) 0 1 0 00
2: 320051 000345235204 (LDA 000345,IC) 000346 754(0) 0 1 0 04
2: 320052 000020315207 (CANA 000020,DL) 000345 235(0) 0 1 0 04
2: 320053 000120600204 (TZE 000120,IC) 000020 315(0) 0 1 0 07
2: 320054 320372674202 (LCPR 320372,QU) 000120 600(0) 0 1 0 04
2: 320055 000000623200 (EAX3 000000) 320372 674(0) 0 1 0 02
2: 320056 000000627200 (EAX7 000000) 000000 623(0) 0 1 0 00

About 76K instructions in, CPU C decides it wants to start executing init_processor code in absolute (ABS) mode: from 030326 onward, the trace addresses no longer carry a segment number.
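
With a trace this long, the switch is easier to find mechanically than by eye. The sketch below is a minimal Python scan for the first trace line whose address has no `segment:` prefix. The field layout is inferred only from the excerpt above, and treating a missing segment number as absolute mode is an assumption, so treat this as a starting point rather than a parser for the simulator's real trace format.

```python
import re

# "<cpu>: [<segment>:]<offset> ..." as it appears in the excerpt (assumed layout).
TRACE_RE = re.compile(r"^\s*(\d+):\s+(?:([0-7]+):)?([0-7]+)\s")

def first_absolute(lines):
    """Return (index, offset) of the first line with no segment number, or None."""
    for index, line in enumerate(lines):
        match = TRACE_RE.match(line)
        if match and match.group(2) is None:
            return index, match.group(3)
    return None

sample = [
    "2: 00043:000455 0 700044710120 (TRA PR7|44,N*) 000000 057(0) 0 0 0 10",
    "2: 00041:030325 0 600000373100 (EPBP7 PR6|0) 030325 710(0) 0 0 0 00",
    "2: 030326 320050710200 (TRA 320050) 000000 373(0) 1 0 0 00",
    "2: 320050 000346754204 (STI 000346,IC) 320050 710(0) 0 1 0 00",
]
print(first_absolute(sample))  # prints (2, '030326')
```

Run over the full trace, the returned index would give the instruction count at which the addressing changes, and the lines just before it are the place to look for the cause.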
