[getdns-api] UDP failover improvements
willem at nlnetlabs.nl
Wed Feb 28 11:20:53 UTC 2018
On 28-02-18 at 11:56, Robert Groenenberg wrote:
> Hi Willem, Sara,
> To improve (in our view) getdns with respect to the failover/retry
> behaviour towards UDP upstreams, we've made 1 fix and 2 enhancements:
> 1) restrict the back_off value of an upstream to a configurable maximum.
> This prevents the back_off value (doubled at each timeout for an
> upstream) from growing until it rolls over. We didn't want the
> interval for retrying an upstream to grow to values like 2^16 or bigger
> when that upstream had an outage. Note that the retry interval is still
> counted in 'query attempts'; perhaps we want to make that time-based at
> some point.
Yes, the quick fix would be to limit that number. For the longer term,
time-based backoffs are probably the way to go. That would also be more
consistent with how stateful transports are currently handled.
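To make the cap from point 1 concrete, here is a minimal Python sketch. It is
not getdns code (getdns itself is written in C). The names `Upstream`,
`on_timeout` and `max_back_off` are hypothetical, and the starting value of 1
is an assumption; only the doubling on each timeout and the configurable
maximum come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Upstream:
    """Hypothetical stand-in for a UDP upstream; not the getdns struct."""
    name: str
    back_off: int = 1         # assumed start value, counted in query attempts
    max_back_off: int = 1024  # assumed configurable maximum


def on_timeout(up: Upstream) -> int:
    """Double back_off on a timeout, but never beyond the configured cap."""
    up.back_off = min(up.back_off * 2, up.max_back_off)
    return up.back_off


# Six consecutive timeouts against one upstream with a cap of 16.
up = Upstream("192.0.2.1", max_back_off=16)
values = [on_timeout(up) for _ in range(6)]
print(values)
```

With the cap in place the value levels off at the maximum instead of growing
until it rolls over.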
> 2) when an upstream has been unavailable and is found to be Ok at some
> point, its back_off value is not reset. So on a subsequent timeout the
> back_off continues with the value from the previous failure. We consider
> this a bug.
> 3) when all configured upstreams of a context are unavailable, in our
> view it makes more sense to retry these in a round-robin fashion instead
> of sticking to the back_off values (especially when one becomes
> unavailable earlier than another). The original backoff mechanism may
> cause one unavailable upstream to be tried hundreds or thousands of
> times before another one is given a try, even though the latter may be
> available again. Switching to round-robin when all have been unavailable
> for a number of attempts will lead to faster recovery.
Yes that sounds good too.
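Points 2 and 3 could be sketched roughly as follows, again in Python and again
not taken from getdns. `Selector`, `pick`, `on_success`, `to_retry` and
`rr_index` are hypothetical names, and the sketch makes two simplifying
assumptions: an upstream counts as unavailable while its `to_retry` counter is
non-zero, and round-robin starts as soon as all upstreams are unavailable
rather than after a number of attempts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Upstream:
    """Hypothetical stand-in for a UDP upstream; not the getdns struct."""
    name: str
    back_off: int = 1  # assumed start value, counted in query attempts
    to_retry: int = 0  # assumed: 0 means usable, non-zero means unavailable


class Selector:
    def __init__(self, upstreams: List[Upstream], max_back_off: int = 16):
        self.upstreams = upstreams
        self.max_back_off = max_back_off
        self.rr_index = 0  # round-robin position used when all are down

    def on_timeout(self, up: Upstream) -> None:
        # Point 1: double the back_off, capped at the configured maximum.
        up.back_off = min(up.back_off * 2, self.max_back_off)
        up.to_retry = up.back_off

    def on_success(self, up: Upstream) -> None:
        # Point 2: a working upstream starts from scratch on its next failure.
        up.back_off = 1
        up.to_retry = 0

    def pick(self) -> Upstream:
        # Prefer the first upstream that is not backed off.
        for up in self.upstreams:
            if up.to_retry == 0:
                return up
        # Point 3: all are unavailable, so rotate instead of favouring one.
        up = self.upstreams[self.rr_index % len(self.upstreams)]
        self.rr_index += 1
        return up


a, b = Upstream("a"), Upstream("b")
sel = Selector([a, b])
sel.on_timeout(a)
sel.on_timeout(a)                # a.back_off is now 4
first = sel.pick().name          # b is still usable
sel.on_timeout(b)                # now both are unavailable
order = [sel.pick().name for _ in range(4)]
sel.on_success(a)                # a recovers, back_off is reset
sel.on_timeout(a)                # next failure starts from the low value
print(first, order, a.back_off)
```

Without the reset in `on_success`, the last timeout would have continued from
the earlier value of 4 instead of starting over.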
> I have these changes available on top of the latest 'develop' branch.
> Shall I create pull-requests for them?
> (Credits also go to my colleague Shikha Sharma)
I am currently in the process of reorganizing upstream management, so
perhaps your changes will not remain as provided, but they will be a good
starting point for re-evaluating backoff handling for stateless upstreams.
Thanks and cheers!