<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <font face="DejaVu Sans">Hi Willem, Sara,<br>

      <br>

      To improve (in our view) getdns with respect to the failover/retry

      behaviour towards UDP upstreams, we've made 1 fix and 2

      enhancements:<br>

      <br>

      1) restrict the back_off value of an upstream to a configurable

      maximum. This prevents the back_off value (doubled at each

      timeout for an upstream) from growing until it rolls over.

      We didn't want the interval for retrying an upstream to grow to

      values like 2^16 or bigger after that upstream had an outage. Note

      that the retry interval is still counted in 'query attempts'; perhaps

      we want to make that time-based at some point.<br>

      <br>

      2) when an upstream has been unavailable and is later found to be OK,

      its back_off value is currently not reset, so on a subsequent

      timeout the back_off continues from the value reached during the

      previous failure. We consider this a bug and now reset the value

      once the upstream responds again.<br>

      <br>

      3) when all configured upstreams of a context are unavailable, in

      our view it makes more sense to retry these in a round-robin

      fashion instead of sticking to the back_off values (especially

      when one becomes unavailable earlier than another). The original

      backoff mechanism may lead to one unavailable upstream being tried

      hundreds or thousands of times before another one is given a try,

      even though the latter may be available again. Switching to

      round-robin once all have been unavailable for a number of attempts

      will lead to faster recovery.<br>

      <br>
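      To illustrate, here is a minimal sketch in C of how the three changes
      above could fit together. It is not the actual getdns code: the types
      upstream_t and upstream_set_t, the fields max_back_off and rr_next, and
      the functions on_timeout, on_success and select_upstream are all
      hypothetical names made up for this mail. As a simplification, the
      sketch switches to round-robin as soon as all upstreams are
      unavailable, rather than after a number of attempts.<br>
<pre>
```c
/* Quoted include form: falls back to the system header, and keeps
 * angle brackets out of this HTML mail. */
#include "assert.h"

/* Hypothetical stand-in for per-upstream state (not the getdns struct). */
typedef struct {
    unsigned back_off;   /* retry interval, counted in query attempts */
    int      available;  /* 1 if the upstream last answered, else 0 */
} upstream_t;

/* Hypothetical stand-in for the upstreams of one context. */
typedef struct {
    upstream_t *upstreams;
    unsigned    count;
    unsigned    max_back_off;  /* assumed configurable cap (change 1) */
    unsigned    rr_next;       /* round-robin cursor (change 3) */
} upstream_set_t;

/* Change 1: double back_off on a timeout, but never beyond the cap.
 * Comparing against half the cap avoids overflowing the multiplication. */
static void on_timeout(const upstream_set_t *set, upstream_t *u)
{
    assert(set->max_back_off >= 1);
    if (u->back_off > set->max_back_off / 2)
        u->back_off = set->max_back_off;
    else
        u->back_off *= 2;
    u->available = 0;
}

/* Change 2: a working upstream starts again from the initial back_off. */
static void on_success(upstream_t *u)
{
    u->back_off = 1;
    u->available = 1;
}

/* Change 3: prefer the first available upstream; when none is available,
 * hand out the upstreams in turn instead of following back_off. */
static unsigned select_upstream(upstream_set_t *set)
{
    unsigned i, pick;

    assert(set->count > 0);
    for (i = 0; i != set->count; i++)
        if (set->upstreams[i].available)
            return i;

    pick = set->rr_next;
    set->rr_next = (set->rr_next + 1) % set->count;
    return pick;
}
```
</pre>
      In this sketch, with a cap of 16 an upstream that keeps timing out
      stays at a back_off of 16, and two unavailable upstreams are tried
      alternately until one of them answers.<br>
      <br>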

      I have these changes available on top of the latest 'develop'

      branch. Shall I create pull requests for them?<br>

      (Credits also go to my colleague Shikha Sharma)<br>

      <br>

      Cheers,<br>

      Robert<br>

    </font>

  </body>

</html>