[getdns-users] bindata string encoding? (was: Re: [PATCH] remove GETDNS_COMPILATION_COMMENT)

Thu Jul 9 18:42:59 UTC 2015

Robert Edmonds wrote:
> Looking at the version-related stuff in http://www.vpnc.org/getdns-api/,
> I'm a bit confused.  The 'version_string' field in the dict returned by
> getdns_context_get_api_information() is defined as:
> 
>     version_string (a bindata) represents the version string for this
>     version of the DNS API.

Hi,

I wrote a C program to retrieve the version_string bindata from a getdns
context object, and then I ported it to Python (see attached).  I was
expecting these programs to print equivalent output, since the "bindata"
type holds arbitrary binary data:

    bindata is a struct to hold binary data. It is defined as { size_t
        size; uint8_t *binary_stuff; }.

However, I got different output from the C and Python versions:

    $ ./getdns_version   
    version_string = (6 bytes) '0.2.0\x00'
    $ ./getdns_version.py
    version_string = (5 bytes) '0.2.0'
    $ 

It looks like the getdns library intentionally appends a NUL byte to the
bindata value when it converts a string:

    https://github.com/getdnsapi/getdns/blob/v0.2.0/src/util-internal.c#L86

Note the "strlen(value) + 1", which is one more byte than the length of
the string (excluding the terminating NUL byte).  So the bindata
includes a NUL byte at the end.

And it looks like Python getdns library binding then assumes this
bindata value is NUL-terminated and truncates the sequence at the first
NUL byte:

    https://github.com/getdnsapi/getdns-python-bindings/blob/v0.3.1/context.c#L630

Note the use of PyString_FromString rather than
PyString_FromStringAndSize.

I think this is backwards: if you have a byte sequence and an explicit
length, this allows for embedded NUL bytes, and you should use the
explicit length rather than assuming the byte sequence is a C-style
string and truncating it at the first NUL byte (or, worse, performing an
out-of-bounds read if it turned out this assumption was incorrect and
the sequence didn't contain a NUL byte).

So, I would recommend that the getdns C library should set the bindata
'size' field to strlen(value) rather than strlen(value)+1 when it
converts a C string to a getdns bindata, and the getdns Python C library
should use PyString_FromStringAndSize, with the len argument to that
function set to the bindata 'size' field.  (I couldn't find anything one
way or the other in the getdns API spec that describes how strings
should be packed into bindata's, though.)

This is slightly inconvenient for C code (you can't pretend the uint8_t*
is a char* and just print it), but most languages for which you would
want to write bindings for the C library don't use C-style strings and
would need to explicitly account for how to handle a trailing NUL byte,
if any.

Sometimes confusion as to how a string is encoded (explicit length vs
NUL-termination) results in security vulnerabilities, e.g. see "Null
Prefix Attacks Against SSL/TLS Certificates" for an infamous example:

    http://www.thoughtcrime.org/papers/null-prefix-attacks.pdf

-- 
Robert Edmonds
edmonds at debian.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: getdns_version.c
Type: text/x-csrc
Size: 1119 bytes
Desc: not available
URL: <http://lists.getdnsapi.net/pipermail/users/attachments/20150709/e4165225/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: getdns_version.py
Type: text/x-python
Size: 283 bytes
Desc: not available
URL: <http://lists.getdnsapi.net/pipermail/users/attachments/20150709/e4165225/attachment.py>