[getdns-api] async comments (0.268)

Tue Feb 5 18:59:19 MST 2013

On Feb 4, 2013, at 7:04 AM, Dan Winship <dan.winship at gmail.com> wrote:

> Some comments from the point of view of thinking about reimplementing
> GLib's getaddrinfo()-in-threads-based resolver
> (http://developer.gnome.org/gio/stable/GResolver.html) with one based on
> getdns()...
> 
> 
>> The callback function might be called at any time, even before
>> getdns() has returned.
> 
> Our experience in GNOME has been that this tends to lead to bugs; the
> code after the getdns() call has to deal with two possible states of
> the world (eg, the userarg data may or may not have been freed), and
> so code like:
> 
>  status = getdns (context, name, GETDNS_RRTYPE_A, NULL,
>                   myuserdata, &transaction_id, mycallback);
>  myuserdata->id = transaction_id;
> 
> would be wrong, but not obviously so (and it might actually work 100%
> of the time with one getdns implementation, but fail sporadically with
> others). It's not much harder for the getdns() implementation to just
> guarantee that it won't invoke the callback until after you return to
> the event loop, and then you protect the caller from that class of
> bugs.

Actually, it is a lot harder to make that guarantee: it forces the API implementation to hold off calling the callback until well after the "natural" time to call it. This means that the API implementation has to do a bit of pseudo-async of its own, which seems error-prone across different async libraries.

Your example of what a naive programmer might do is valid, but there are a lot of bad things that a bad programmer might do that we cannot prevent (well, short of making everything synchronous, which is a non-starter). I *think* the only problem might be the assumption that &transaction_id was filled in, yes? That is, getdns() doesn't change anything in userarg, so there is no issue if the application mucks with it between the time getdns() was called and the callback.
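To make the ordering issue concrete, here is a minimal sketch in C of one defensive pattern, assuming the callback runs on the caller's thread (either inline, before the request call returns, or later from the event loop). All names (`fake_getdns`, `fake_event_loop_once`, `struct lookup`, `txn_id_t`) are hypothetical stand-ins, not the draft's signatures. The per-request state starts with two references, one owned by the caller and one by the callback, so it stays alive until both are done with it, and the caller records the transaction ID only if the callback has not already run.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the draft's getdns() signatures. */
typedef uint64_t txn_id_t;
typedef void (*lookup_cb)(void *userarg, txn_id_t id);

/* Per-request application state passed as userarg. */
struct lookup {
    int refs;       /* one reference for the caller, one for the callback */
    int done;       /* set once the callback has run */
    txn_id_t id;    /* transaction ID, recorded only while still pending */
};

static int frees;   /* counts released requests (for the demo) */

static void lookup_unref(struct lookup *l) {
    assert(l->refs > 0);
    if (--l->refs == 0) {
        free(l);
        frees++;
    }
}

static void my_callback(void *userarg, txn_id_t id) {
    struct lookup *l = userarg;
    (void)id;
    l->done = 1;
    lookup_unref(l);            /* drop the callback's reference */
}

/* Toy library: answers inline (before returning) or defers the
 * callback until fake_event_loop_once() runs. */
static lookup_cb pending_cb;
static void *pending_arg;
static txn_id_t pending_id;
static txn_id_t next_id = 1;

static int fake_getdns(void *userarg, txn_id_t *txn, lookup_cb cb,
                       int answer_inline) {
    *txn = next_id++;
    if (answer_inline) {
        cb(userarg, *txn);      /* callback runs before fake_getdns() returns */
    } else {
        pending_cb = cb;
        pending_arg = userarg;
        pending_id = *txn;
    }
    return 0;
}

static void fake_event_loop_once(void) {
    lookup_cb cb = pending_cb;
    if (cb) {
        pending_cb = NULL;
        cb(pending_arg, pending_id);
    }
}

/* Application side: correct under either ordering. */
static int start_lookup(int answer_inline) {
    struct lookup *l = calloc(1, sizeof *l);
    if (l == NULL)
        return -1;
    l->refs = 2;                /* caller + callback */
    txn_id_t txn;
    int status = fake_getdns(l, &txn, my_callback, answer_inline);
    if (status != 0)
        lookup_unref(l);        /* no callback will come; drop its reference */
    else if (!l->done)
        l->id = txn;            /* safe: the caller's reference keeps l alive */
    lookup_unref(l);            /* drop the caller's reference */
    return status;
}
```

If callbacks could run on another thread, the reference count and `done` flag would need to be atomic or lock-protected; this sketch does not cover that case.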

>> getdns_cancel_callback() may return immediately, even before the
>> callback finishes its work and returns.
> 
> As above, the "may" makes things messy; it should either always call
> the callback itself before returning, or always just schedule the
> callback to be called upon returning to the event loop.

I agree with Matt: the user is not guaranteed anything until the callback is invoked. Asking the API implementation to "schedule the callback" is error-prone and doesn't buy even a sloppy programmer anything, does it? That is, if the application falsely assumes that the callback has finished, what would it do differently, given that the callback is cleaning up memory because the request was cancelled?
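A minimal sketch of the discipline described above, with hypothetical names (`fake_request`, `fake_cancel`, `CB_COMPLETE`, `CB_CANCEL`) that are not the draft's actual identifiers: the callback is the single place that frees per-request state, whether the request completed or was cancelled, so the application never needs to know whether the callback has already run by the time the cancel call returns.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical callback kinds; the draft's real constant names may differ. */
enum cb_kind { CB_COMPLETE, CB_CANCEL };
typedef void (*req_cb)(enum cb_kind kind, void *userarg);

struct req_state {
    char *scratch;            /* application memory tied to the request */
};

static int cleanups;          /* how many times request state was released */
static int answers;           /* how many real answers were processed */

static void my_callback(enum cb_kind kind, void *userarg) {
    struct req_state *st = userarg;
    if (kind == CB_COMPLETE)
        answers++;            /* use the answer only on success */
    free(st->scratch);        /* cleanup happens here on every path */
    free(st);
    cleanups++;
}

/* Toy single-request "library": one pending request at a time. */
static req_cb pending;
static void *pending_arg;
static int cancelled;

static void fake_request(void *userarg, req_cb cb) {
    pending = cb;
    pending_arg = userarg;
    cancelled = 0;
}

/* The cancel call only marks the request; the callback fires later. */
static void fake_cancel(void) {
    cancelled = 1;
}

static void fake_loop_once(void) {
    if (pending == NULL)
        return;
    req_cb cb = pending;
    pending = NULL;
    cb(cancelled ? CB_CANCEL : CB_COMPLETE, pending_arg);
}

static struct req_state *new_state(void) {
    struct req_state *st = calloc(1, sizeof *st);
    assert(st != NULL);
    st->scratch = malloc(64);
    return st;
}
```

In this toy, the cancel call only marks the request and the callback fires on the next loop turn; an implementation that invoked the callback from inside the cancel call would work with the same application code, because nothing touches the state after cancelling.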

>> Each implementation of the DNS API will specify an extension function
>> that tells the DNS context which event base is being used.
> 
> This seems inconvenient for everyone involved except the API
> specification author. :-)

Not at all: it is also inconvenient for me. Didn't you notice all the handwaving later about "depending on which async library is used"? That wasn't convenient at all. :-)

> getdns() implementation authors (which, in the long run really means
> "libc/libresolv maintainers")

Stop right there. There is *no* assumption that this will be part of libc or libresolv any time soon. This is an API proposal that might or might not become popular, just like the other modern DNS APIs.

> don't want to have to know about every
> possible event loop implementation. (And they can't anyway, and even
> if they did, they'd have no good way to integrate with non-C-based
> ones.)

That's exactly right: they don't want to, and can't. If there were only one async library that I had to worry about, you would be correct, but it is very clear that different people want to use different async libraries. This leaves four choices:
a) I pick one and ignore the users of all other async libraries
b) I make a generic hole for those libraries and try to shoehorn every possible library's calls into that hole
c) I don't do async
d) I leave it up to the implementer, who will certainly hear from the application developers about which libraries they want supported
I chose (d). 

> Event loop implementation authors don't want to have to worry about
> getting every getdns() implementation to support them, and don't want
> to have to write N different integration thingies for N different
> getdns() implementations.
> 
> getdns() users don't want to have to write:
> 
>  #if defined (HAVE_GETDNS_EXTENSION_SET_LIBEVENT_BASE)
>      getdns_extension_set_libevent_base (context);
>  #elif defined (HAVE_GETDNS_EXTENSION_SET_EVENTBASE_FOR_LIBEVENT)
>      getdns_extension_set_eventbase_for_libevent (context);
>  #else
>  #error Don't know how to set up getdns() on this platform
>  #endif

Fully agree: they don't want that. But they have it today anyhow.

> There needs to just be a standard part of the API that can be used to
> register any event loop with any getdns() implementation. (Or at
> least, there needs to be an API that any unix event loop
> implementation can use, and an API that any Windows event loop
> implementation can use, etc.)

It sounds like you want (b) that slouches into (a). That seems more limiting than the #if tangle. I chose "I know which event library I am using" over "I don't really use any event library, I rolled my own with polling".
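For comparison, option (b) might look something like the following sketch: a table of hooks through which the library asks the application's event loop to watch a socket and arm a timer, without naming any event library. Everything here (`struct event_hooks`, `lib_start_query`, the toy loop) is hypothetical and assumes a single-threaded, fd-based loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "generic hole" (option b): the library never names an
 * event library; it asks the application's loop through these hooks. */
typedef void (*ev_ready_fn)(void *arg);

struct event_hooks {
    void *loop;   /* opaque pointer to the application's event loop */
    int (*watch_fd)(void *loop, int fd, ev_ready_fn ready, void *arg);
    int (*set_timeout)(void *loop, int msec, ev_ready_fn fire, void *arg);
};

/* --- library side (hypothetical) --- */
struct query { int fd; int answered; int timed_out; };

static void on_readable(void *arg) { ((struct query *)arg)->answered = 1; }
static void on_timeout(void *arg)  { ((struct query *)arg)->timed_out = 1; }

static int lib_start_query(const struct event_hooks *h, struct query *q,
                           int sock_fd) {
    assert(h != NULL && q != NULL);
    q->fd = sock_fd;
    q->answered = q->timed_out = 0;
    if (h->watch_fd(h->loop, sock_fd, on_readable, q) != 0)
        return -1;
    return h->set_timeout(h->loop, 5000, on_timeout, q);
}

/* --- application side: a toy loop that remembers one watch and one timer --- */
struct toy_loop {
    int fd;   ev_ready_fn fd_cb;    void *fd_arg;
    int msec; ev_ready_fn timer_cb; void *timer_arg;
};

static int toy_watch_fd(void *loop, int fd, ev_ready_fn ready, void *arg) {
    struct toy_loop *l = loop;
    l->fd = fd; l->fd_cb = ready; l->fd_arg = arg;
    return 0;
}

static int toy_set_timeout(void *loop, int msec, ev_ready_fn fire, void *arg) {
    struct toy_loop *l = loop;
    l->msec = msec; l->timer_cb = fire; l->timer_arg = arg;
    return 0;
}
```

A usable version would also need hooks to remove watches and cancel timers, and each event library would supply its own filled-in table; the question in this thread is whether one such table can fit every library without shoehorning.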

> On unixy platforms, all getdns() implementations are going to be based
> on sockets and timeouts, and all event loops are going to be based on
> poll() or something equivalent.

Err, no. There are many more choices than that. In fact, today, I suspect that many more applications use { libevent | libev } instead of polling.
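The poll()-based model in the quoted paragraph could be sketched as a pull-style interface: the library reports the file descriptor it is waiting on and how long it is willing to wait, and the application's own loop calls poll(). This is POSIX-only, and `struct toy_resolver`, `resolver_get_pollfd`, and `resolver_process` are hypothetical; a pipe stands in for the resolver's socket.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical pull-style integration: the library reports which fd it is
 * waiting on and for how long; the application's own poll() loop waits. */
struct toy_resolver {
    int sock_fd;      /* stands in for the resolver's UDP socket */
    int timeout_ms;   /* time to wait before a (not shown) retransmit */
    int answered;
};

static void resolver_get_pollfd(const struct toy_resolver *r,
                                struct pollfd *pfd, int *timeout_ms) {
    assert(r->sock_fd >= 0);
    pfd->fd = r->sock_fd;
    pfd->events = POLLIN;
    pfd->revents = 0;
    *timeout_ms = r->timeout_ms;
}

static void resolver_process(struct toy_resolver *r, const struct pollfd *pfd) {
    char buf[512];
    if (pfd->revents & POLLIN) {
        if (read(r->sock_fd, buf, sizeof buf) > 0)
            r->answered = 1;  /* a real resolver would parse the reply */
    }
    /* revents == 0 means poll() timed out; a real resolver would retry */
}

/* One iteration of the application's loop; returns poll()'s result. */
static int app_loop_once(struct toy_resolver *r) {
    struct pollfd pfd;
    int timeout_ms;
    resolver_get_pollfd(r, &pfd, &timeout_ms);
    int n = poll(&pfd, 1, timeout_ms);
    if (n >= 0)
        resolver_process(r, &pfd);
    return n;
}
```

An application using libevent or libev would register the same descriptor and timeout with that library instead of calling poll() directly, which is the case the reply above is concerned with.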

> I don't know how people expect async stuff to work on Windows, so I'm
> not sure what the API would have to look like there. It might involve
> replacing all the fd and pollfd args with HANDLEs.

Or it might use a real event library. :-)

>> 1.5 Calling the API Synchronously (Without Events)
> 
> I think the getdns_sync_request() API probably makes sense the way it
> is, and would be useful to lots of people, but just for the record, the
> fact that it isn't cancellable means we wouldn't be able to use it to
> implement sync lookups in GLib. But making it cancellable would imply
> making it thread-safe too (since another thread would be the only
> place you could be cancelling from), so you probably don't want to go
> there.

Exactly right. Others might want to assume thread safety, but my reading of the past decade's worth of discussion for most OSs makes that seem unnecessarily risky.

> (And anyway, we can fake a cancellable synchronous lookup by doing an
> asynchronous lookup in a temporary event loop.)

Yes. I am definitely assuming that an application that has a general model for how it wants to do event-y things, if it is *not* using a real event library, will be able to fake this up in its own preferred way just fine.

--Paul Hoffman