Sherlock and Symtab Library

Sherlock, and the Symtab library upon which it is based, are intended as diagnostic tools to aid software development on RISC OS, particularly for debugging awkward failures on released software/OS builds.

The linker included in the Norcroft toolchain has always had the capacity to emit a list of symbol addresses/offsets alongside the generated binary code and this can be invaluable when mapping the raw address of a failure into a location within the source/data of a program. Performing that mapping has always been a laborious manual process, however, which the Symtab library aims to simplify greatly.

The library concerns itself with the loading, parsing and searching of a number of symbol tables, potentially one for each and every loaded RISC OS relocatable module, including the ROM modules, and one for the application code itself. From this set of symbol tables it may then be used to look up the address of a failing instruction/memory access and map that address into a symbol triplet of 'module name', 'symbol name' and 'address offset'. Without additional debugging information built into the failing binary, this is about as far as the mapping can really proceed, but it is usually enough to locate the line/datum in question very easily.
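The address-to-triplet mapping described above can be sketched in C as a binary search over a sorted symbol list. This is only an illustration: the demo_* types and function are hypothetical and are not Symtab's real data structures or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not Symtab's real structures. */
typedef struct {
    uint32_t    addr;    /* absolute start address of the symbol */
    const char *name;
} demo_symbol;

typedef struct {
    const char        *module;  /* table name, e.g. "test" */
    const demo_symbol *syms;    /* sorted by ascending addr */
    size_t             count;
    uint32_t           end;     /* first address past the table */
} demo_table;

typedef struct {
    const char *module;
    const char *symbol;
    uint32_t    offset;
} demo_triplet;

/* Find the last symbol at or below 'addr' by binary search.
   Returns 1 and fills *out on success, 0 if addr lies outside the table. */
static int demo_lookup(const demo_table *t, uint32_t addr, demo_triplet *out)
{
    size_t lo = 0, hi;

    assert(t != NULL && out != NULL);
    if (t->count == 0 || addr < t->syms[0].addr || addr >= t->end)
        return 0;

    hi = t->count;  /* invariant: syms[lo].addr <= addr, and addr is below syms[hi] */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (t->syms[mid].addr <= addr)
            lo = mid;
        else
            hi = mid;
    }

    out->module = t->module;
    out->symbol = t->syms[lo].name;
    out->offset = addr - t->syms[lo].addr;
    return 1;
}
```

With a table holding file_close at 0x8134 and file_gets at 0x81A0, looking up 0x817C would give the triplet ('test', 'file_close', 0x48), which matches the form of the *SMemoryI output shown later.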

Symtab is deliberately designed to be self-contained using only a bare minimum of ISO C run-time library functions, with callbacks into the client code for any required complex functionality, so that it may easily be incorporated into very low-level code and used in circumstances when the C runtime may not be (fully) usable.
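As a rough illustration of the callback style, the sketch below formats a 32-bit value as hex without calling printf(), handing each character to a client-supplied function. The demo_client_ops structure and every name in it are assumptions made for this example; the real Symtab callback interface is not documented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback table supplied by the client code. */
typedef struct {
    void (*put_char)(void *ctx, char c);  /* emit one character */
    void  *ctx;                           /* opaque client state */
} demo_client_ops;

/* Write v as 8 upper-case hex digits using only the client callback. */
static void demo_put_hex32(const demo_client_ops *ops, uint32_t v)
{
    static const char digits[] = "0123456789ABCDEF";
    int shift;

    assert(ops != NULL && ops->put_char != NULL);
    for (shift = 28; shift >= 0; shift -= 4)
        ops->put_char(ops->ctx, digits[(v >> shift) & 0xFu]);
}

/* Example client: collect the output in a small fixed buffer. */
typedef struct {
    char   text[16];
    size_t len;
} demo_sink;

static void demo_sink_put(void *ctx, char c)
{
    demo_sink *s = ctx;

    if (s->len + 1 < sizeof s->text) {   /* leave room for the terminator */
        s->text[s->len++] = c;
        s->text[s->len] = '\0';
    }
}
```

A client in a restricted environment could point put_char at whatever output route is still usable, such as a log buffer, while the library code itself stays free of run-time library dependencies.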

Sherlock is a RISC OS relocatable module that employs the Symtab library to provide a number of *commands which are rough parallels of those provided by the standard Debugger module, with the addition of symbolic information:

  • *SymLoad <file | dir> [<file | dir> ...]
  • *SMemory [B | H | D] <addr1 | reg1> [[+|-] <addr2 | reg2>]
  • *SMemoryI ...<addr1 | reg1> [[+|-] <addr2 | reg2>]
  • *SMemoryS <mode | addr1 | reg1> [[+|-] <addr2 | reg2>]
  • *SymTables [<table> [<table>...]]

Example usage of the above commands:

Load a symbol file called 'tsym' into memory, naming the table 'test' (in this case an application).

*SymLoad test##tsym
Loaded symbol tables for test from 'tsym'

List information on the symbol table(s) that have been loaded. Information on specific tables, rather than all loaded tables, may be requested by specifying the table name(s) on the command-line. A '-V' parameter lists verbose information including all of the symbols within the table.

*SymTables
Symbol table 'test' at c90d9dec
  3 blocks
 Block 0 (ReadOnly): 0x8080 to 0xCDF4
  476 symbols
 Block 1 (ReadWrite): 0xCDF4 to 0xCE70
  6 symbols
 Block 2 (ZeroInit): 0xCE70 to 0xDF74
  77 symbols

To produce a symbolic disassembly of an address range, the *SMemoryI command parallels the standard *MemoryI command of the Debugger module, supporting the same syntax, but the first address may also be symbolic, as shown in the second example below.

*smemoryi 817c + 40
0000817C : E3550000 : file_close+0x48                  : CMP     R5,#0
00008180 : 11A00005 : file_close+0x4C                  : MOVNE   R0,R5
00008184 : E91BA870 : file_close+0x50                  : LDMDB   R11,{R4-R6,R11,R13,PC}
00008188 : 0000CC00 : file_close+0x54                  : ANDEQ   R12,R0,R0,LSL #24
0000818C : 00000066 : file_close+0x58                  : ANDEQ   R0,R0,R6,RRX
00008190 : 656C6966 : file_close+0x5C                  : STRVSB  R6,[R12,#-2406]!
00008194 : 7465675F : file_close+0x60                  : STRVCBT R6,[R5],#-1887
00008198 : 00000073 : file_close+0x64                  : ANDEQ   R0,R0,R3,ROR R0
0000819C : FF00000C : file_close+0x68                  : Undefined instruction
000081A0 : E1A0C00D : file_gets                        : MOV     R12,R13
000081A4 : E92D000F : file_gets+0x4                    : STMDB   R13!,{R0-R3}
000081A8 : E92DDBF0 : file_gets+0x8                    : STMDB   R13!,{R4-R9,R11,R12,R14,PC}
000081AC : E24CB014 : file_gets+0xC                    : SUB     R11,R12,#
000081B0 : E15D000A : file_gets+0x10                   : CMP     R13,R10
000081B4 : 4B000FB5 : file_gets+0x14                   : BLMI    __rt_stkovf_split_small
000081B8 : E1B04001 : file_gets+0x18                   : MOVS    R4,R1
*smemoryi area_names + 40
0000CE20 : 00008454 : area_names                       : ANDEQ   R8,R0,R4,ASR R4
0000CE24 : 00008460 : area_names+0x4                   : ANDEQ   R8,R0,R0,ROR #8
0000CE28 : 0000846C : area_names+0x8                   : ANDEQ   R8,R0,R12,ROR #8
0000CE2C : 6E6E553C : area_names+0xC                   : MCRVS   CP5,3,R5,C14,C12,1
0000CE30 : 64656D61 : area_names+0x10                  : STRVSBT R6,[R5],#-3425
0000CE34 : 3028203E : area_names+0x14                  : EORCC   R2,R8,R14,LSR R0
0000CE38 : 58585878 : area_names+0x18                  : LDMPLDA R8,{R3-R6,R11,R12,R14}^
0000CE3C : 58585858 : area_names+0x1C                  : LDMPLDA R8,{R3,R4,R6,R11,R12,R14}^
0000CE40 : 00002958 : area_names+0x20                  : ANDEQ   R2,R0,R8,ASR R9    ; *** Not R8-R14
0000CE44 : 00000000 : area_names+0x24                  : ANDEQ   R0,R0,R0
0000CE48 : 00000000 : area_names+0x28                  : ANDEQ   R0,R0,R0
0000CE4C : 00000000 : test_handle                      : ANDEQ   R0,R0,R0
0000CE50 : 0000B2A0 : fn                               : ANDEQ   R11,R0,R0,LSR #5
0000CE54 : 0000B390 : fn+0x4                           : Undefined instruction
0000CE58 : 00008090 : fn+0x8                           : Undefined instruction
0000CE5C : 00008134 : fn+0xC                           : ANDEQ   R8,R0,R4,LSR R1

To load the symbols for a relocatable module, it is useful to specify the name of the module as a prefix to the filename. For example, the following command instructs the Sherlock module to load its own symbol table from the file 'sym'. When the table is loaded, Sherlock checks for the presence of a loaded relocatable module with the given name and, if one is found, maps the offsets specified in the symbol file into absolute addresses. It will also do this if the module is later loaded/reloaded, so the order in which the table and the module itself are loaded does not matter.

*SymLoad Sherlock##sym
Setting base of 539800212 'ReadOnly' to 202CB294
Setting base of 539431508 'ReadWrite' to 20271254
Setting base of 539431712 'ZeroInit' to 20271320
Loaded symbol tables for Sherlock from 'sym'
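The rebasing step reported by the 'Setting base of ...' lines can be pictured as adding the area's base address to each stored offset. The C sketch below assumes a hypothetical demo_reloc_symbol record; Sherlock's real representation may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: offset as listed in the symbol file, plus the
   absolute address computed once the area's base is known. */
typedef struct {
    uint32_t offset;
    uint32_t addr;
} demo_reloc_symbol;

/* Recompute every absolute address from a newly discovered base.
   Calling this again after a module reload simply rebases the table. */
static void demo_set_base(demo_reloc_symbol *syms, size_t count, uint32_t base)
{
    size_t i;

    assert(syms != NULL || count == 0);
    for (i = 0; i < count; i++)
        syms[i].addr = base + syms[i].offset;
}
```

Taking the 'ReadOnly' base of 202CB294 from the output above, a symbol at an assumed offset of 0x544 would land at 202CB7D8, which is where file_close appears in the module disassembly further down.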

The *SMemoryS command provides a crude backtrace/dump of the given stack/address range. If a CPU mode is specified on the command-line, rather than an address range as for the other *commands, the current stack pointer for that mode is read and used as the start address. Here is part of the output produced when *SMemoryS is called for the SVC stack, and we can see that the Sherlock module is itself threaded and its addresses appear on the Supervisor stack because it is processing the *command. Clearly this is of limited utility at present, and requires a SWI/lower-level interface to achieve its true potential.

*SMemoryS SVC
FA207F40 : 202745BD ->                                  : .E'
FA207F44 : FA208000 ->                                  : .. .
FA207F48 : FA207F40 ->                                  : @. .
FA207F4C : 202745BD ->                                  : .E'
FA207F50 : 00000001 ->                                  : ....
FA207F54 : 00000003 ->                                  : ....
FA207F58 : FB407C0C ->                                  : .|@.
FA207F5C : 23F60D4C ->                                  : L..#
FA207F60 : 23DBFF9C ->                                  : ...#
FA207F64 : 00000003 ->                                  : ....
FA207F68 : FFFFFFFF ->                                  : ....
FA207F6C : 00000000 ->                                  : ....
FA207F70 : 00000000 ->                                  : ....
FA207F74 : FA207F80 ->                                  : .. .
FA207F78 : 202CB478 -> Sherlock##__module_header+0x1E4  : x.,
FA207F7C : 202CE120 -> Sherlock##module_command+0xC     :  .,
FA207F80 : 202745BD ->                                  : .E'
FA207F84 : 00000053 ->                                  : S...
FA207F88 : FB407BF4 ->                                  : .{@.
FA207F8C : FC02389C ->                                  : .8..
FA207F90 : FFFFFFFF ->                                  : ....
FA207F94 : 202CB3F0 -> Sherlock##__module_header+0x15C  : ..,
FA207F98 : 202745B4 ->                                  : .E'
FA207F9C : 00000110 ->                                  : ....
...
*smemoryi file_close
202CB7D8 : E1A0C00D : file_close                       : MOV     R12,R13
202CB7DC : E92DD873 : file_close+0x4                   : STMDB   R13!,{R0,R1,R4-R6,R11,R12,R14,PC}
202CB7E0 : E24CB004 : file_close+0x8                   : SUB     R11,R12,#4
202CB7E4 : E15D000A : file_close+0xC                   : CMP     R13,R10
202CB7E8 : 4B00190F : file_close+0x10                  : BLMI    __rt_stkovf_split_small
202CB7EC : E1B06001 : file_close+0x14                  : MOVS    R6,R1
202CB7F0 : E1A04000 : file_close+0x18                  : MOV     R4,R0
202CB7F4 : 059F1030 : file_close+0x1C                  : LDREQ   R1,file_close+0x54
202CB7F8 : 024F2F11 : file_close+0x20                  : ADREQ   R2,file_open+0x88
202CB7FC : 028F0F0B : file_close+0x24                  : ADREQ   R0,file_close+0x58
202CB800 : 03A0303D : file_close+0x28                  : MOVEQ   R3,#
202CB804 : 0B001AB3 : file_close+0x2C                  : BLEQ    __assert2
202CB808 : E5960000 : file_close+0x30                  : LDR     R0,[R6,#0]
202CB80C : EB0016E4 : file_close+0x34                  : BL      xosfind_closew
202CB810 : E1A05000 : file_close+0x38                  : MOV     R5,R0
202CB814 : E1A01006 : file_close+0x3C                  : MOV     R1,R6
202CB818 : E1A00004 : file_close+0x40                  : MOV     R0,R4
202CB81C : EB00014A : file_close+0x44                  : BL      mem_free
202CB820 : E3550000 : file_close+0x48                  : CMP     R5,#0
202CB824 : 11A00005 : file_close+0x4C                  : MOVNE   R0,R5
202CB828 : E91BA870 : file_close+0x50                  : LDMDB   R11,{R4-R6,R11,R13,PC}
202CB82C : 202D279C : file_close+0x54                  : MLACS   R13,R12,R7,R2
202CB830 : 00000066 : file_close+0x58                  : ANDEQ   R0,R0,R6,RRX
202CB834 : 656C6966 : file_close+0x5C                  : STRVSB  R6,[R12,#-2406]!
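The crude backtrace performed by *SMemoryS can be approximated as a scan over the stack words that flags any word falling inside a known code range as a possible return address. The C sketch below uses hypothetical demo_* names, and the range end address in the usage note is assumed; it is not taken from Sherlock's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range covered by one loaded symbol table. */
typedef struct {
    const char *name;
    uint32_t    start;  /* first address in the range */
    uint32_t    end;    /* first address past the range */
} demo_range;

/* Return the range containing 'value', or NULL if none does. */
static const demo_range *demo_find_range(const demo_range *ranges,
                                         size_t nranges, uint32_t value)
{
    size_t i;

    for (i = 0; i < nranges; i++)
        if (value >= ranges[i].start && value < ranges[i].end)
            return &ranges[i];
    return NULL;
}

/* Scan 'count' stack words and record the index of each word that points
   into a known range. Returns the number of hits found; at most max_hits
   indices are stored in 'hits'. */
static size_t demo_scan_stack(const uint32_t *words, size_t count,
                              const demo_range *ranges, size_t nranges,
                              size_t *hits, size_t max_hits)
{
    size_t i, found = 0;

    assert(words != NULL && hits != NULL);
    for (i = 0; i < count; i++) {
        if (demo_find_range(ranges, nranges, words[i]) != NULL) {
            if (found < max_hits)
                hits[found] = i;
            found++;
        }
    }
    return found;
}
```

Given the four words FA207F80, 202CB478, 202CE120 and 202745BD from the SVC stack dump and a single range starting at 202CB294 (with an assumed end of 202D0000), the scan would flag the second and third words, the same two that carry Sherlock## annotations in the output. A scan like this cannot tell a genuine return address from stale data that happens to look like one, which is part of why the result is only a crude backtrace.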

Exception logging

Sherlock installs handlers on all of the processor exception vectors so that it can catch and log invalid memory accesses, attempts to execute Undefined Instructions etc. When an exception occurs, Sherlock writes the contents of the registers, stack and code disassembly/source into its internal log buffer, which may subsequently be streamed out to disk for a more permanent record of the failure.

A snippet of the log output produced by Sherlock in response to a Data Abort exception occurring within its own code is shown below. From the FAR (Fault Address Register), register context and disassembly we can immediately see that the problem is an invocation of strlen(NULL). In the stack dump that follows the disassembly, the return addresses indicate a call to printf(), which is invoking strlen(), and show that the printf() call occurs within the dump_bintree() function used by dump_table()/list_symbol_tables(). This is indeed what happened; the crash occurred when dumping -V(erbose) information on a loaded symbol table.

Data Abort at &20225848

Register dump (stored at &C94DEFB4) is:
R0  = E59FF464 ->
R1  = FA207EFC ->
R2  = E59FF464 ->
R3  = FA207EFC ->
R4  = 00000000 ->
R5  = 00000000 ->
R6  = FA207EEC ->
R7  = 00000008 ->
R8  = 2022FE45 -> Sherlock##dump_bintree+0x99
R9  = FFFFFFFF ->
R10 = FFFFFFFF ->
R11 = 202307B8 -> Sherlock##out_chars
R12 = 00000008 ->
R13 = FA207EBC ->
R14 = 2022BC30 -> Sherlock##rts_intern_printf+0x73
R15 = 20225848 -> Sherlock##strlen+0x8
Mode SVC32 flags set: NzCvqjggggeAift
PSR = A0000113
FAR = E59FF464 FSR = 00000005

    ...
    2022582C : .`.. : 85C36001 : Sherlock##memcpy+0x85C           : STRHIB  R6,[R3,#1]
    20225830 : .`.. : E49D6004 : Sherlock##memcpy+0x860           : LDR     R6,[R13],#4
    20225834 : .P.. : E49D5004 : Sherlock##memcpy+0x864           : LDR     R5,[R13],#4
    20225838 : .@.. : E49D4004 : Sherlock##memcpy+0x868           : LDR     R4,[R13],#4
    2022583C : .... : E49DF004 : Sherlock##memcpy+0x86C           : LDR     PC,[R13],#4
--> 20225840 : .... : E590C000 : Sherlock##strlen                 : LDR     R12,[R0,#0]
    20225844 : .0.. : E3A03001 : Sherlock##strlen+0x4             : MOV     R3,#1
    20225848 : .... : E1A01000 : Sherlock##strlen+0x8             : MOV     R1,R0
    2022584C : .4.. : E1833403 : Sherlock##strlen+0xC             : ORR     R3,R3,R3,LSL #8
    20225850 : . .. : E2102003 : Sherlock##strlen+0x10            : ANDS    R2,R0,#3
    20225854 : .8.. : E1833803 : Sherlock##strlen+0x14            : ORR     R3,R3,R3,LSL #16
    2022585C : . \. : E05C2003 : Sherlock##strlen+0x1C            : SUBS    R2,R12,R3
    ...

    ...
    FA207E94 : .F.. : FC1B46EC -> Debugger##Code+0x514
    FA207E98 : .\#  : 20235CB0 -> Sherlock##mem_blocks+0x300C
    FA207E9C : .... : FC020610 ->
    FA207EA0 : ...` : 60000193 ->
    FA207EA4 : .... : 00060380 ->
    FA207EA8 : .N"  : 20224E98 -> Sherlock##xdebugger_disassemble_
    FA207EAC : |.#  : 20231D7C -> Sherlock##barmenu_defn+0xC
    FA207EB0 :  ... : 00000020 ->
    FA207EB4 :  .M. : C94DEF20 ->
    FA207EB8 : 0."  : 2022BC30 -> Sherlock##rts_intern_printf+0x73
--> FA207EBC : d... : E59FF464 ->
    FA207EC0 : .... : 00000000 ->
    FA207EC4 : .... : 00000001 ->
    FA207EC8 : .... : 00000000 ->
    FA207ECC : .... : 00000000 ->
    FA207ED0 : .Z"  : 20225A0C -> Sherlock##printf
    FA207ED4 : .... : 00000003 ->
    FA207ED8 : ..R. : C9520FA8 ->
    FA207EDC : ..R. : C9520FDC ->
    FA207EE0 : T... : 00000054 ->
    FA207EE4 : .... : 00000002 ->
    FA207EE8 : ..#  : 20230894 -> Sherlock##_printf+0x24
    FA207EEC : .. . : FA207F00 ->
    FA207EF0 : .."  : 2022FDD8 -> Sherlock##dump_bintree+0x2C
    FA207EF4 : <."  : 2022FE3C -> Sherlock##dump_bintree+0x90
    FA207EF8 : .... : E28FD0D8 ->
    FA207EFC : d... : E59FF464 ->
    FA207F00 : .."  : 2022FD96 -> Sherlock##out_spaces+0x46
    FA207F04 : .... : 00000000 ->
    FA207F08 : t2.  : 20003274 -> FPEmulator##Work
    FA207F0C : .Z"  : 20225A0C -> Sherlock##printf
    FA207F10 : .... : 00000000 ->
    FA207F14 : .."  : 2022FFF0 -> Sherlock##dump_table+0x188
    FA207F18 : ..R. : C9520FA8 ->
    FA207F1C : .Z"  : 20225A0C -> Sherlock##printf
    FA207F20 : .... : 00000001 ->
    FA207F24 : .m#  : 20236DAB -> Sherlock##emuda_handler+0x27
    FA207F28 : .... : 00000000 ->
    FA207F2C : .m#  : 20236DA1 -> Sherlock##emuda_handler+0x1D
    FA207F30 : .... : 00000001 ->
    FA207F34 : .... : 00000001 ->
    FA207F38 : .... : 00001000 ->
    FA207F3C : .... : 00000004 ->
    FA207F40 : .."  : 202294A8 -> Sherlock##direct_out
    FA207F44 : D."  : 20229D44 -> Sherlock##list_symbol_tables+0x8
    FA207F48 : .... : 00000001 ->
    FA207F4C : .... : 00000002 ->
    ...
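The 'flags set' line in the register dump appears to print one letter per status bit of the PSR, in upper case when the bit is set. The C sketch below reproduces that string using the standard ARM PSR bit positions. The letter order and the mode names are inferred from the sample log, not from Sherlock's source, and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Fill out[16] with 15 flag letters plus a terminator; a letter is
   upper case when the corresponding PSR bit is set. */
static void demo_psr_flags(uint32_t psr, char out[16])
{
    /* Standard ARM PSR bit positions: N Z C V Q J GE[3:0] E A I F T. */
    static const struct { unsigned bit; char letter; } flags[] = {
        { 31, 'n' }, { 30, 'z' }, { 29, 'c' }, { 28, 'v' }, { 27, 'q' },
        { 24, 'j' }, { 19, 'g' }, { 18, 'g' }, { 17, 'g' }, { 16, 'g' },
        {  9, 'e' }, {  8, 'a' }, {  7, 'i' }, {  6, 'f' }, {  5, 't' }
    };
    size_t i;

    assert(out != NULL);
    for (i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        int set = (int)((psr >> flags[i].bit) & 1u);
        out[i] = set ? (char)toupper((unsigned char)flags[i].letter)
                     : flags[i].letter;
    }
    out[i] = '\0';
}

/* Name the processor mode held in PSR bits 4..0 (naming style assumed). */
static const char *demo_psr_mode(uint32_t psr)
{
    switch (psr & 0x1Fu) {
    case 0x10: return "USR32";
    case 0x11: return "FIQ32";
    case 0x12: return "IRQ32";
    case 0x13: return "SVC32";
    case 0x17: return "ABT32";
    case 0x1B: return "UND32";
    case 0x1F: return "SYS32";
    default:   return "unknown";
    }
}
```

Applied to the logged value PSR = A0000113, this gives the mode SVC32 and the flag string NzCvqjggggeAift, agreeing with the line in the log above.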

ARM Symbolic Debugging Format

Although the original goal of the Sherlock module was to assist with diagnosis and debugging of released code that contains no debugging information, it also supports application/module images that have been built with ASD-format Debug information included. This permits viewing code at the level of C/assembler source, greatly easing the task of locating a failure within the source code.

Sherlock will automatically spot when an application/module containing ASD information is loaded and will retain a copy of that information in case of a subsequent failure. In subsequent exception logs, or in the output of *commands such as *SMemoryI, program source can be displayed alongside addresses for which ASD information is available, as illustrated below:

204e8aa8 : Sherlock##module_init+0x8   : SUB     R13,R13,#4           :
204e8aac : Sherlock##module_init+0xC   : MOV     R1,R13               :   err = mem_init(max_size, &mem_base);
204e8ab0 : Sherlock##module_init+0x10  : MOV     R0,#&02000000        :
204e8ab4 : Sherlock##module_init+0x14  : BL      mem_init             :
204e8ab8 : Sherlock##module_init+0x18  : MOVS    R6,R0                :   if (!err)
204e8abc : Sherlock##module_init+0x1C  : BNE     module_init+0x98     :   {
204e8ac0 : Sherlock##module_init+0x20  : LDR     R0,[R13,#0]          :
204e8ac4 : Sherlock##module_init+0x24  : MOV     R1,#&00040000        :      err = log_init(mem_base + MEM_LOG_OFFSET, LOG_MAX...
204e8ac8 : Sherlock##module_init+0x28  : ADD     R0,R0,#&6000         :
204e8acc : Sherlock##module_init+0x2C  : BL      log_init             :
204e8ad0 : Sherlock##module_init+0x30  : MOVS    R6,R0                :      if (!err)
204e8ad4 : Sherlock##module_init+0x34  : BNE     module_init+0x94     :      {
204e8ad8 : Sherlock##module_init+0x38  : BL      symtab_init          :         err = symtab_init();
204e8adc : Sherlock##module_init+0x3C  : MOVS    R6,R0                :         if (!err)
204e8ae0 : Sherlock##module_init+0x40  : BNE     module_init+0x90     :         {
204e8ae4 : Sherlock##module_init+0x44  : MOV     R0,R4                :           err = process_symbols_cmd(cmd_tail);
204e8ae8 : Sherlock##module_init+0x48  : BL      process_symbols_cmd  :
204e8aec : Sherlock##module_init+0x4C  : MOVS    R6,R0                :           if (err)
204e8af0 : Sherlock##module_init+0x50  : BEQ     module_init+0x5C     :
204e8af4 : Sherlock##module_init+0x54  : BL      symtab_fin           :             (void)symtab_fin();

Memory Dumps

The Sherlock module has the ability to save the memory/state of the system to a 'dump' file which may then be studied later by redirecting all of Sherlock's *commands to operate upon the file rather than the live system upon which Sherlock is running. This facility makes it possible to capture a (possibly intermittent) failure occurring on the user's machine and then study it on the remote machine of the software developer.

*SMemSave <filename> captures the current state of processor registers, memory contents, loaded symbol tables and source code etc. The result is a single 'memory dump' file which may then be loaded into Sherlock on any machine using the *SMemLoad <filename> command.

To unload a memory dump from Sherlock and return the *commands to operation upon the live system, simply issue the *SMemLoad command without a filename parameter.

Using Sherlock now

An in-progress development build - binary only for now, whilst I continue working on the source and tidying a few loose ends - may be downloaded here

For module code that fails but leaves the system sufficiently usable that *commands may still be entered, it should be simple to use the Sherlock module even in its current nascent state of development, since the code will necessarily already be in memory.

Investigating a fault induced within application code currently requires manual loading of the symbol table and application binary into memory, e.g. from a TaskWindow, issue *SymLoad <symbol file>, followed by *Load <executable image>, bearing in mind that the binary will not be executed, and must thus be a raw (not compressed/encrypted) copy of the in-memory executable at the point of failure. If you have a utility that will produce a copy of the application memory at the point of failure, or can grab the memory contents using your favourite source editor, then you may choose to load that instead using a similar *Load command.

Future Development

A possible future extension to Sherlock may be to introduce a SWI/direct interface to the routines which perform these operations. It could also be beneficial to introduce calls from the ZeroPain module into Sherlock or the underlying Symtab library, so that non-faulted accesses to zero page may be logged in a symbolic form whilst the application continues running.

Please get in touch if you have any suggestions for further development of the Sherlock module or the underlying Symtab library, to make it more useful as a diagnostic/development tool. In due course it is my intention to release all of the code as open source for the benefit of all developers, and so that the library may readily be incorporated into other tools.


Copyright © Adrian Lees 2015