[getdns-users] bindata string encoding?

Thu Jul 9 20:15:32 UTC 2015

Melinda Shore wrote:
> On 7/9/15 10:42 AM, Robert Edmonds wrote:
> > I think this is backwards: if you have a byte sequence and an explicit
> > length, this allows for embedded NUL bytes, and you should use the
> > explicit length rather than assuming the byte sequence is a C-style
> > string and truncating it at the first NUL byte (or, worse, performing an
> > out-of-bounds read if it turned out this assumption was incorrect and
> > the sequence didn't contain a NUL byte).
> 
> I'm kind of "meh" on that - I'm not sure that it's reasonable to
> assume the possible presence of a 0 byte in the middle of something
> that's agreed to be a C-format string.

Hi, Melinda:

I agree: if a field is defined to be a C-style NUL-terminated string,
then by definition it ends at the first \0 byte and the string cannot
contain embedded NULs.  But the version of the spec I'm looking at
(https://getdnsapi.net/spec.html) only says that the 'version_string'
bindata field represents a "string", without specifying how the string
is encoded.  My confusion comes about because the string is passed
through an interface (the getdns_bindata type) that also passes an
explicit length.
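
A small C sketch of the mismatch: the same bytes yield two different lengths depending on whether a consumer trusts the explicit size or stops at the first NUL. The struct is a local stand-in that mirrors the getdns_bindata layout (a size plus a data pointer) so the example compiles without getdns installed. Both helper names are hypothetical. The C-string variant uses a bounded memchr() so it cannot read past size, unlike strlen().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local stand-in mirroring getdns_bindata:
 * an explicit byte count plus a pointer to the bytes. */
struct bindata {
    size_t size;
    uint8_t *data;
};

/* Length as the interface reports it: the explicit size field. */
static size_t bindata_len_explicit(const struct bindata *b)
{
    return b->size;
}

/* Length as a C-string consumer sees it: stop at the first NUL,
 * but never scan past b->size (memchr is bounded; strlen is not). */
static size_t bindata_len_cstring(const struct bindata *b)
{
    if (b->size == 0)
        return 0;  /* avoid passing a possibly NULL pointer to memchr */
    const uint8_t *nul = memchr(b->data, 0, b->size);
    return nul ? (size_t)(nul - b->data) : b->size;
}
```

For the bytes "ab\0c" with size 4, the explicit length is 4 while the C-string view is 2, so the two readings silently disagree.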

> There are definitely places we're punting the bounds-checking to the
> Python libraries and that may not be reasonable.

Yeah, AFAICT, the Python binding is just passing the bindata 'data'
field to PyString_FromString(), which then calls strlen() on it.  So
there's no bounds-checking at all, it's relying on the 'data' field to
be NUL-terminated.  That's why I recommend explicitly relying on the
bounds information that the bindata type provides :-)
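
A sketch of relying on the bounds information instead: build a NUL-terminated C string from the bindata using only its size field, never strlen() on the source. The struct is a local stand-in for getdns_bindata and bindata_to_cstring() is a hypothetical helper. Two policy choices are assumptions, not anything the spec states: embedded NULs are rejected rather than truncated, and a single trailing NUL counted in size is tolerated. In the Python 2 C API, the length-taking counterpart of PyString_FromString() is PyString_FromStringAndSize(), which a binding could call directly with the bindata's data and size.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical local stand-in for getdns_bindata (size + data pointer). */
struct bindata {
    size_t size;
    uint8_t *data;
};

/* Return a freshly allocated NUL-terminated copy of the bindata, bounded
 * by b->size. Returns NULL on allocation failure or if the bytes contain
 * an embedded NUL, which a C string cannot represent. Caller frees. */
static char *bindata_to_cstring(const struct bindata *b)
{
    size_t len = b->size;

    /* Assumption: tolerate a terminator the producer counted in size. */
    if (len > 0 && b->data[len - 1] == 0)
        len--;

    /* Refuse embedded NULs instead of silently truncating. */
    if (len > 0 && memchr(b->data, 0, len) != NULL)
        return NULL;

    if (len == SIZE_MAX)
        return NULL;  /* len + 1 would overflow */

    char *s = malloc(len + 1);
    if (s == NULL)
        return NULL;
    if (len > 0)
        memcpy(s, b->data, len);
    s[len] = '\0';  /* terminator supplied here, not assumed in the source */
    return s;
}
```

Whether to reject, truncate, or pass embedded NULs through depends on what the field is meant to hold; the point is that the decision is made from the explicit length rather than by scanning for a terminator that may not exist.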

-- 
Robert Edmonds
edmonds at debian.org