[getdns-api] Error handling in getdns

Fri Jan 8 11:17:47 UTC 2016

On 7.1.2016 17:25, Gowri V wrote:
> I strongly agree with this proposed change and have added some comments
> inline below...
> 
> Also, I would like to propose something analogous to the errno or the
> Windows function GetLastError()
> where if the user desires, they can get a more detailed description, either
> as a string or a dict, in addition to the below.
> This would be a useful diagnostic for a developer if any of the lower-level
> socket API calls fail for any reason; currently some of these failures
> return a generic error.
> 
> Gowri
> 
> 
> On Thu, Jan 7, 2016 at 8:21 AM, Sara Dickinson <sara at sinodun.com> wrote:
> 
>> Hi All,
>>
>> Willem and I have been discussing the error handling in getdns and would
>> like to propose some changes based on our implementation experience.
>>
>> 1) Asynchronous error handling
>>
>> At the moment the async callbacks are specified as below:
>>
>> GETDNS_CALLBACK_COMPLETE
>> The response has the requested data in it
>> GETDNS_CALLBACK_CANCEL
>> The calling program cancelled the callback; response is NULL
>> GETDNS_CALLBACK_TIMEOUT
>> The requested action timed out; response is filled in with empty structures
>> GETDNS_CALLBACK_ERROR
>> The requested action had an error; response is NULL
>>
>>
>> With this approach there is no mechanism to provide any more fine-grained
>> information to the user when an ERROR is returned because the response in the
>> callback is NULL.
>> When considering a mainly UDP based approach this is probably sufficient,
>> but to cater for TCP and TLS (with authentication) we would like to change
>> the above so that a variety of errors that can occur that are not timeouts
>> can be communicated to the caller.
>>
>> We would like to propose that the ERROR case is changed to have the same
>> response as the TIMEOUT case i.e:
>>
>> GETDNS_CALLBACK_ERROR
>> The requested action had an error; response is filled in with empty
>> structures
>>
>> And so the response structure would look similar to this example for a
>> TIMEOUT:
>>
>> {
>>   "answer_type": GETDNS_NAMETYPE_DNS,
>>   "replies_full": [],
>>   "replies_tree": [],
>>   "status": GETDNS_RESPSTATUS_ALL_TIMEOUT
>> }
>>
>> 2) New GETDNS_RESPSTATUS codes
>>
>> The new GETDNS_RESPSTATUS_ error cases we would like to add at this time
>> are:
>>
>> TRANSPORT_SETUP_FAILED - for the case where no connection could be made
>> over any specified transport to any upstream (for example, only TLS is
>> specified but none of the available upstreams support it).
>>
>> TLS_FEATURE_NOT_SUPPORTED  - for the cases where getdns can’t support the
>> configured transport/authentication options at runtime because the
>> available TLS library doesn’t have the required functionality (for example
>> support for TLS 1.2 or hostname verification methods).
>>
> 
> 
> I suggest even more detailed reporting for the above, e.g.
> TLS_FEATURE_NOT_SUPPORTED_TLS12_NOT_AVAILABLE, and other errors where possible.
> 
> 
> 
>>
>> TLS_AUTH_FAILED  - for the case when using TLS only and authentication is
>> required but fails. This is strictly a sub-case of TRANSPORT_SETUP_FAILED
>> but seems worthy of a separate status code.
>>
>> 3) Synchronous timeouts
>>
>> When calling the API synchronously, the return type of the functions is
>> getdns_return_t. There is currently no value for GETDNS_RETURN_TIMEOUT and
>> the behaviour for the sync calls is not clearly specified for a timeout in
>> the spec. So our implementation currently uses GETDNS_RETURN_GOOD (and
>> returns the response dict as in 1 above); the best alternative error code
>> would be GETDNS_RETURN_GENERIC_ERROR. So we would like to propose adding
>> the value GETDNS_RETURN_TIMEOUT to the getdns_return_t type.
>>
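
To make points 1 and 2 concrete, here is a minimal sketch of how a caller
might branch on the callback type and then on the response status once the
ERROR case carries a filled-in response. Every `sketch_` / `SKETCH_` name below
is a hypothetical stand-in that only mirrors the names in this thread; the
numeric values, the struct layout and the helper function are assumptions and
not the getdns API.

```c
#include <stddef.h>

/* Hypothetical stand-ins mirroring the names in this thread; the real
 * getdns headers define their own types and numeric values. */
typedef enum {
    SKETCH_CALLBACK_COMPLETE,
    SKETCH_CALLBACK_CANCEL,
    SKETCH_CALLBACK_TIMEOUT,
    SKETCH_CALLBACK_ERROR
} sketch_callback_type;

typedef enum {
    SKETCH_RESPSTATUS_GOOD,
    SKETCH_RESPSTATUS_ALL_TIMEOUT,
    SKETCH_RESPSTATUS_TRANSPORT_SETUP_FAILED,    /* proposed in this thread */
    SKETCH_RESPSTATUS_TLS_FEATURE_NOT_SUPPORTED, /* proposed in this thread */
    SKETCH_RESPSTATUS_TLS_AUTH_FAILED            /* proposed in this thread */
} sketch_respstatus;

/* Simplified response: only the "status" entry of the dict is modelled. */
typedef struct {
    sketch_respstatus status;
} sketch_response;

/* Return a short description of the outcome. A NULL response on ERROR
 * models the behaviour as currently specified; a non-NULL response models
 * the proposed behaviour, where the status explains the failure. */
const char *sketch_describe(sketch_callback_type type,
                            const sketch_response *response)
{
    switch (type) {
    case SKETCH_CALLBACK_COMPLETE:
        return "complete";
    case SKETCH_CALLBACK_CANCEL:
        return "cancelled";
    case SKETCH_CALLBACK_TIMEOUT:
        return "timed out";
    case SKETCH_CALLBACK_ERROR:
        if (response == NULL)
            return "error (no detail available)";
        switch (response->status) {
        case SKETCH_RESPSTATUS_TRANSPORT_SETUP_FAILED:
            return "error: transport setup failed";
        case SKETCH_RESPSTATUS_TLS_FEATURE_NOT_SUPPORTED:
            return "error: TLS feature not supported";
        case SKETCH_RESPSTATUS_TLS_AUTH_FAILED:
            return "error: TLS authentication failed";
        default:
            return "error (unspecified status)";
        }
    }
    return "unknown callback type";
}
```

With the current specification only the NULL branch of the ERROR case is
reachable, which is the limitation the proposal aims to remove.
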
>> As a future activity we note that the above mechanisms can only relay a
>> single error code. Since completing an API call can involve
>> - performing multiple DNS queries
>> - using multiple upstreams
>> - using multiple transports
>> - TLS authentication that can fail for various reasons
>> - DNSSEC validations
>> - TSIG validation
>> we are considering adding an ‘error log trail’ utility that would be
>> recorded during execution and could be returned in the response dict.
>> Feedback on this is welcomed.
>>
> 
> 
> The error log trail utility is a good design pattern to implement for
> getdns, perhaps add an extension that would return the error log trail in
> the response dict similar to the return_call_reporting?

I support this idea. The more detailed the errors, the better :-)
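
As a rough illustration of what an 'error log trail' could look like
internally, here is a minimal sketch of a bounded list of (stage, code)
entries recorded during a call. All names, the fixed capacity and the entry
layout are assumptions made for illustration; none of this exists in getdns,
and how such a trail would be exposed in the response dict is left open.

```c
#include <stddef.h>

#define SKETCH_TRAIL_MAX 16  /* arbitrary capacity chosen for the sketch */

/* One recorded event: which stage failed (e.g. "tls-auth") and a code. */
typedef struct {
    const char *stage;  /* caller-owned string, not copied */
    int code;
} sketch_trail_entry;

typedef struct {
    sketch_trail_entry entries[SKETCH_TRAIL_MAX];
    size_t count;    /* number of entries stored */
    size_t dropped;  /* entries discarded because the trail was full */
} sketch_trail;

/* Reset the trail to empty. */
void sketch_trail_init(sketch_trail *trail)
{
    trail->count = 0;
    trail->dropped = 0;
}

/* Append an entry. Returns 1 on success, 0 if the trail is full, in
 * which case the entry is counted as dropped rather than stored. */
int sketch_trail_add(sketch_trail *trail, const char *stage, int code)
{
    if (trail->count == SKETCH_TRAIL_MAX) {
        trail->dropped++;
        return 0;
    }
    trail->entries[trail->count].stage = stage;
    trail->entries[trail->count].code = code;
    trail->count++;
    return 1;
}
```

Keeping the trail bounded and counting dropped entries means that a call
touching many upstreams and transports cannot grow the response without
limit, while the caller can still see that detail was lost.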

-- 
Petr Spacek  @  Red Hat