<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<font face="DejaVu Sans">Hi Willem, Sara,<br>
<br>
To improve (in our view) the failover/retry behaviour of getdns
towards UDP upstreams, we've made one fix and two
enhancements:<br>
<br>
1) restrict the back_off value of an upstream to a configurable
maximum. This prevents the back_off value (which is doubled at each
timeout for an upstream) from growing until it overflows and wraps
around. We didn't want the retry interval of an upstream to grow to
values like 2<sup>16</sup> or more after that upstream had an outage.
Note that the retry interval is still measured in 'query attempts';
perhaps we want to make it time-based at some point.<br>
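<br>
As an illustration of item 1, here is a minimal C sketch of a capped
doubling back-off. The upstream_t record and the max_back_off
parameter are hypothetical simplifications, not the actual getdns
structures or option names.<br>

```c
#include <assert.h>

/* Hypothetical, simplified upstream record (not the real getdns struct). */
typedef struct {
    unsigned int back_off;  /* retry interval, counted in query attempts */
} upstream_t;

/* Called on a timeout: double the back-off, but never beyond max_back_off.
 * Comparing against max_back_off / 2 before doubling guarantees that
 * back_off * 2 <= max_back_off, so the unsigned value can never wrap. */
static void upstream_on_timeout(upstream_t *u, unsigned int max_back_off)
{
    assert(max_back_off >= 1);
    if (u->back_off <= max_back_off / 2)
        u->back_off *= 2;
    else
        u->back_off = max_back_off;
}
```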
<br>
2) when an upstream has been unavailable and is later found to be OK
again, its back_off value is currently not reset. So on a subsequent
timeout, the back_off continues from the value reached during the
previous failure. We consider this a bug.<br>
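<br>
For item 2, a minimal C sketch of resetting the back-off on success.
INITIAL_BACK_OFF (assumed to be 1) and MAX_BACK_OFF are hypothetical
constants chosen for illustration, and upstream_t is a simplified
stand-in for the real getdns upstream record.<br>

```c
#include <assert.h>

#define INITIAL_BACK_OFF 1u  /* assumption: value a fresh upstream starts with */
#define MAX_BACK_OFF 1000u   /* hypothetical cap on the back-off */

/* Hypothetical, simplified upstream record (not the real getdns struct). */
typedef struct {
    unsigned int back_off;  /* retry interval, counted in query attempts */
} upstream_t;

/* On a timeout: double the back-off, capped at MAX_BACK_OFF. */
static void upstream_on_timeout(upstream_t *u)
{
    assert(u->back_off >= INITIAL_BACK_OFF);
    u->back_off = (u->back_off <= MAX_BACK_OFF / 2) ? u->back_off * 2
                                                    : MAX_BACK_OFF;
}

/* On a successful answer: forget the previous outage. Without this reset,
 * the next timeout would keep doubling from the old, large value. */
static void upstream_on_success(upstream_t *u)
{
    u->back_off = INITIAL_BACK_OFF;
}
```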
<br>
3) when all configured upstreams of a context are unavailable, in
our view it makes more sense to retry them in a round-robin
fashion instead of sticking to the back_off values (especially
when one became unavailable earlier than another). The original
back-off mechanism can cause one unavailable upstream to be tried
hundreds or thousands of times before another one is given a try,
even though the latter may be available again. Switching to
round-robin once all upstreams have been unavailable for a number
of attempts will lead to faster recovery.<br>
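<br>
For item 3, a minimal C sketch of the round-robin fallback. It assumes
a hypothetical upstream_list_t holding a per-upstream to_retry
countdown (0 meaning usable now) and a round-robin cursor; these are
not the actual getdns structures. The threshold of 'a number of
attempts' before switching, and the per-attempt countdown of to_retry,
are left out for brevity.<br>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified upstream record (not the real getdns struct). */
typedef struct {
    unsigned int to_retry;  /* attempts left before a retry; 0 = usable now */
} upstream_t;

/* Hypothetical list of a context's upstreams. */
typedef struct {
    upstream_t *up;
    size_t count;
    size_t rr_next;  /* round-robin cursor, used only when all are backing off */
} upstream_list_t;

/* Returns the index of the upstream to use for the next query attempt. */
static size_t select_upstream(upstream_list_t *l)
{
    assert(l->count > 0);

    /* Normal case: take the first upstream that is not backing off. */
    for (size_t i = 0; i < l->count; i++) {
        if (l->up[i].to_retry == 0)
            return i;
    }

    /* All upstreams are backing off: ignore their countdowns and rotate,
     * so a recovered upstream with a large back-off is not stuck behind
     * one that keeps timing out. */
    size_t chosen = l->rr_next;
    l->rr_next = (l->rr_next + 1) % l->count;
    return chosen;
}
```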
<br>
I have these changes ready on top of the latest 'develop'
branch. Shall I create pull requests for them?<br>
(Credit also goes to my colleague Shikha Sharma.)<br>
<br>
Cheers,<br>
Robert<br>
</font>
</body>
</html>