Charles Anthony
Posted on February 17, 2024
A user reported a repeatable hang on a Raspberry Pi, so I fired up my Pi 4 Model B, ran the latest simulator (15e7721dd6a2c58b5d3eea4e0428c3e71a9004ec) with Quickstart 12.8, and modified the Multics config, adding:
cpu -tag b -port 6 -state on -type dps8 -model 70. -cache 8.
cpu -tag c -port 5 -state on -type dps8 -model 70. -cache 8.
cpu -tag d -port 5 -state on -type dps8 -model 70 -cache 8.
cpu -tag e -port 4 -state on -type dps8 -model 70 -cache 8.
This adds CPUs B, C, D, and E to the configuration; because their state is set to on, they will be started during boot.
./dps8 MR12.8_boot.ini
DPS8/M simulator X3.0.1+16 (64-bit)
Commit: 15e7721dd6a2c58b5d3eea4e0428c3e71a9004ec
bce (boot) 0817.2: M-> [auto-input] boot star
0817.9 CPU A: Model #: DPS 8/SIM M; Serial #: 0; Ship date: 240215; PROM Layout Version: 2;
Simulator Release: X3.0.1 (2024-02-15); Build Number: <None>;
Build Arch: aarch64; Build OS: Linux;
Target Arch: AArch64/ARM64/64-bit; Target OS: GNU/Linux.
CPU B thread created.
0817.9 start_cpu: Added CPU B.
CPU C thread created.
0817.9 start_cpu: Added CPU C.
and it hangs...
Interestingly, this is not the symptom seen by the issue reporter: their hang is at the "CPU B thread created." message, and they never see "0817.9 start_cpu: Added CPU B.".
(The "thread created" message is from the simulator; the messages starting with a time code are from Multics.)
$ gdb dps8 16473
(gdb) p/o cpus[0].PPR
$1 = {PRR = 0, PSR = 034, P = 01, IC = 02427}
(gdb) p/o cpus[1].PPR
$2 = {PRR = 0, PSR = 034, P = 01, IC = 02427}
(gdb) p/o cpus[2].PPR
$3 = {PRR = 0, PSR = 041, P = 01, IC = 0320127}
(gdb) p/o cpus[3].PPR
$4 = {PRR = 0, PSR = 0, P = 0, IC = 0}
CPUs A and B are both executing at 34:2427; segment 34 is bound_interceptors:
bound_interceptors 34 (0, 0, 0) read execute privileged encacheable wired
Component    Text            Int-Stat        Symbol
             Start  Length   Start  Length   Start  Length
fim          0      2210     0      0        100    266
wired_fim    2210   332      0      0        366    230
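The PPR values that gdb printed map onto this bind map through plain octal arithmetic. Below is a minimal C sketch that uses only the numbers shown above. format_segaddr and component_offset are hypothetical helpers written for illustration; they are not simulator functions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: print a PSR/IC pair the way Multics does,
   as "segment:offset" with both parts in octal. */
void format_segaddr(char *buf, size_t len, unsigned psr, unsigned ic)
{
    snprintf(buf, len, "%o:%o", psr, ic);
}

/* Hypothetical helper: offset of a segment address within one bound
   component, given that component's text start and length from the
   bind map. The values are ordinary integers; octal is only notation. */
unsigned component_offset(unsigned seg_offset, unsigned text_start,
                          unsigned text_length)
{
    assert(seg_offset >= text_start);              /* not before the component */
    assert(seg_offset - text_start < text_length); /* not past its end */
    return seg_offset - text_start;
}
```

With gdb's PSR = 034 and IC = 02427, format_segaddr produces "34:2427". component_offset(02427, 02210, 0332) returns 0217, which is below wired_fim's text length of 0332 and so falls inside that component.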
34:2427 is therefore at offset 2427 - 2210 = 217 (octal) in wired_fim, inside start_wait:
378 " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " "
379 "
380 " START_WAIT - Wait until new CPU has started up.
381 "
382 "
383 " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " "
384
000205 385 start_wait:
000205 aa 000001 3352 07 386 lca 1,dl all ones in A
000206 4a 4 00016 6753 20 387 era prds$processor_pattern turn off bit for this CPU
000207 4a 4 00030 3553 20 388 ansa scs$processor_start_wait check ourselves off
389
000210 4a 4 00074 3737 20 390 eppsb prds$ push a frame onto the prds
000211 0a 000325 2272 00 391 ldx7 push ..
000212 4a 4 00076 7003 20 392 tsx0 fim_util$push_stack_32 ..
393
000213 aa 6 00050 3503 00 394 eppap notify_regs ap -> place to copy conditions
000214 4a 4 00100 7003 20 395 tsx0 fim_util$copy_mc copy the conditions into stack
396
000215 4a 4 00102 7003 20 397 tsx0 fim_util$set_mask uninhibit to prevent lockups
000216 398 inhibit off <-><-><-><-><-><-><-><-><-><-><-><->
399
000216 4a 4 00072 2341 20 400 szn scs$connect_lock test connect lock
000217 0a 000223 6000 00 401 tze *+4 wait until it is cleared
000220 aa 000110 7770 00 402 llr 72
000221 aa 000110 7770 00 403 llr 72
000222 0a 000216 7100 00 404 tra *-4
The start CPU code holds connect_lock during new CPU startup. I am confused as to why CPUs A (cpus[0]) and B (cpus[1]) are both in start_wait, but that may just be my not understanding how the running, non-bootload CPUs behave during a CPU add.
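Paraphrasing the start_wait spin with C11 atomics makes its control flow explicit. This is a sketch under stated assumptions, not Multics or simulator code:

- scs_t, check_ourselves_off, wait_for_connect_lock and spin_delay are hypothetical names.
- A uint64_t stands in for the 36-bit Multics word.
- The stack push, copy_mc and set_mask steps are left out.

It illustrates one point: once a CPU reaches the loop, only another CPU releasing connect_lock can get it out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two scs$ cells the ALM touches;
   uint64_t substitutes for a 36-bit Multics word. */
typedef struct {
    _Atomic uint64_t processor_start_wait; /* one bit per CPU still starting */
    _Atomic uint64_t connect_lock;         /* nonzero while held */
} scs_t;

/* Stand-in for the two "llr 72" instructions: rotating the 72-bit AQ
   by 72 bits leaves it unchanged, so they only burn time between polls. */
static void spin_delay(void)
{
    for (volatile int i = 0; i < 2; i++) {
    }
}

/* "lca 1,dl / era prds$processor_pattern / ansa scs$processor_start_wait":
   build all-ones-except-our-bit and AND it into the shared word. */
void check_ourselves_off(scs_t *scs, uint64_t processor_pattern)
{
    assert(processor_pattern != 0 &&
           (processor_pattern & (processor_pattern - 1)) == 0); /* one bit */
    atomic_fetch_and(&scs->processor_start_wait, ~processor_pattern);
}

/* "szn scs$connect_lock / tze *+4 / llr 72 / llr 72 / tra *-4":
   poll until the lock reads zero. Returns how many polls found it held. */
unsigned long wait_for_connect_lock(scs_t *scs)
{
    unsigned long polls = 0;
    while (atomic_load(&scs->connect_lock) != 0) {
        polls++;
        spin_delay();
    }
    return polls;
}
```

In this reading, a CPU parked at offset 217 (the tze) corresponds to the while loop never seeing connect_lock drop to zero.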
CPU C (cpus[2], the newly added CPU) is at 320127.
This seems to be the swerr_lp loop:
000125 aa 000002 2352 07 158 swerr: lda rcerr_addcpu_bad_switches,dl
000126 aa 000272 7552 04 159 sta wait_flag-*,ic set it for start_cpu
000127 aa 077777 6372 03 160 swerr_lp: ldt =o77777,du prevent timer runout faults
000130 aa 000270 2352 04 161 lda wait_flag-*,ic has start_cpu given use a green lite?
000131 aa 000043 6042 04 162 tmi nogo-*,ic no, bad switches go to DIS
000132 aa 000002 1152 07 163 cmpa rcerr_addcpu_bad_switches,dl is start_cpu still thinking about it?
000133 aa 777774 6002 04 164 tze swerr_lp-*,ic yes, go through another loop
This makes no sense: the bootload CPU issued the "Added CPU C" message, which means CPU C had long since passed the switch tests. It is also the same symptom I was seeing yesterday.
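Reading swerr_lp as a small state machine shows that the loop exits only when start_cpu overwrites wait_flag. Below is a minimal C11 sketch, assuming wait_flag is a single shared word. The following are assumptions of the sketch:

- swerr_enter, swerr_poll, swerr_wait and the enum are hypothetical names.
- The ldt timer reload is omitted.
- The fall-through after the loop is not shown in the listing, so SWERR_GO only marks that exit.

```c
#include <stdatomic.h>

/* Value from the listing: rcerr_addcpu_bad_switches is loaded as 2,dl. */
#define RCERR_ADDCPU_BAD_SWITCHES 2

enum swerr_result { SWERR_KEEP_WAITING, SWERR_NOGO, SWERR_GO };

/* "lda rcerr_addcpu_bad_switches,dl / sta wait_flag": tell start_cpu
   that this CPU is waiting on it. */
void swerr_enter(atomic_int *wait_flag)
{
    atomic_store(wait_flag, RCERR_ADDCPU_BAD_SWITCHES);
}

/* One pass through swerr_lp (the ldt timer reload is omitted). */
enum swerr_result swerr_poll(atomic_int *wait_flag)
{
    int a = atomic_load(wait_flag);      /* lda wait_flag */
    if (a < 0)                           /* tmi nogo: bad switches, go to DIS */
        return SWERR_NOGO;
    if (a == RCERR_ADDCPU_BAD_SWITCHES)  /* cmpa / tze swerr_lp: still thinking */
        return SWERR_KEEP_WAITING;
    return SWERR_GO;                     /* falls through past the loop */
}

/* The whole loop: only a store to wait_flag by start_cpu ends it. */
enum swerr_result swerr_wait(atomic_int *wait_flag)
{
    enum swerr_result r;
    swerr_enter(wait_flag);
    while ((r = swerr_poll(wait_flag)) == SWERR_KEEP_WAITING)
        ;
    return r;
}
```

swerr_poll is split out from swerr_wait so that each branch of the loop can be checked on its own, without a second thread to play start_cpu.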
Doing an instruction trace of CPU C:
2: 00041:030315 0 700026764161 (LPRP4 PR7|26,*AU) 000034 545(0) 1 0 0 00
2: 00041:030316 0 200001710100 (TRA PR2|1) 000000 764(0) 0 0 0 01
2: 00043:001033 0 000446710000 (TRA 000446) 000001 710(0) 1 0 0 00
2: 00043:000446 0 000002235120 (LDA PR0|2,N*) 000446 710(0) 0 0 0 00
2: 00043:000447 0 000007735000 (ALS 000007) 000214 235(0) 0 0 0 00
2: 00043:000450 0 000004035120 (ADLA PR0|4,N*) 000007 735(0) 0 0 0 00
2: 00043:000451 0 000003735000 (ALS 000003) 002473 035(0) 0 0 0 00
2: 00043:000452 0 000000620005 (EAX0 000000,AL) 000003 735(0) 0 0 0 00
2: 00043:000453 0 000006237120 (LDAQ PR0|6,N*) 000000 620(0) 0 0 0 05
2: 00043:000454 0 400132057120 (SSCR PR4|132,N*) 000120 237(0) 0 0 0 00
2: 00043:000455 0 700044710120 (TRA PR7|44,N*) 000000 057(0) 0 0 0 10
2: 00041:030325 0 600000373100 (EPBP7 PR6|0) 030325 710(0) 0 0 0 00
2: 030326 320050710200 (TRA 320050) 000000 373(0) 1 0 0 00
2: 320050 000346754204 (STI 000346,IC) 320050 710(0) 0 1 0 00
2: 320051 000345235204 (LDA 000345,IC) 000346 754(0) 0 1 0 04
2: 320052 000020315207 (CANA 000020,DL) 000345 235(0) 0 1 0 04
2: 320053 000120600204 (TZE 000120,IC) 000020 315(0) 0 1 0 07
2: 320054 320372674202 (LCPR 320372,QU) 000120 600(0) 0 1 0 04
2: 320055 000000623200 (EAX3 000000) 320372 674(0) 0 1 0 02
2: 320056 000000627200 (EAX7 000000) 000000 623(0) 0 1 0 00
76K instructions in, it decides to start executing init_processor code in absolute (ABS) mode.
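To find where a long trace drops out of appending mode, it helps to classify each line by the shape of its address. Below is a minimal sketch, assuming the format seen above:

- Appending-mode lines print segment:offset ("2: 00041:030325").
- Absolute-mode lines print only an offset ("2: 030326").

trace_line_mode and first_abs_transition are hypothetical helpers, not part of the simulator.

```c
#include <stddef.h>
#include <stdio.h>

enum trace_mode { TRACE_UNPARSED, TRACE_APPENDING, TRACE_ABSOLUTE };

/* Classify one trace line by whether its address carries a segment number. */
enum trace_mode trace_line_mode(const char *line, unsigned *seg, unsigned *off)
{
    int cpu;
    /* "2: 00041:030315 ..." -> CPU, segment, offset (all but CPU in octal) */
    if (sscanf(line, "%d: %o:%o", &cpu, seg, off) == 3)
        return TRACE_APPENDING;
    /* "2: 030326 ..." -> CPU and offset only */
    if (sscanf(line, "%d: %o", &cpu, off) == 2) {
        *seg = 0; /* absolute-mode lines have no segment number */
        return TRACE_ABSOLUTE;
    }
    return TRACE_UNPARSED;
}

/* Index of the first absolute-mode line whose predecessor (ignoring
   unparsed lines) was in appending mode, or -1 if there is none. */
long first_abs_transition(const char *const *lines, size_t n)
{
    unsigned seg, off;
    enum trace_mode prev = TRACE_UNPARSED;
    for (size_t i = 0; i < n; i++) {
        enum trace_mode m = trace_line_mode(lines[i], &seg, &off);
        if (m == TRACE_ABSOLUTE && prev == TRACE_APPENDING)
            return (long)i;
        if (m != TRACE_UNPARSED)
            prev = m;
    }
    return -1;
}
```

Applied to the excerpt above, it flags the TRA at 030326. That is the first line printed without a segment number, and it comes right after the EPBP7 at 41:030325. The function looks for a transition rather than the first absolute line because a newly started CPU may legitimately begin in absolute mode.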