Currently, if one section (e.g. s2) becomes slow or overloaded, appservers in all sections pile up waiting for a response from s2, even though 80-90% of requests have nothing to do with s2. This turns a local incident (s2 being unavailable) into a general "everything is now down" outage.
Circuit breakers are designed to handle exactly such scenarios. If MediaWiki immediately fails with a fatal error on any attempt to connect to an overloaded section, the appservers are saved from being exhausted. Based on the numbers I collected, a section is overloaded when all of its replicas have more than 400 connections.
Thanks to T314020 ("LoadMonitor connection weighting reimagined"), implementing this is actually quite easy now.
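
To make the proposed rule concrete, here is a minimal sketch in Python rather than MediaWiki's PHP. The names `check_circuit`, `SectionOverloadedError` and the `replica_conn_counts` mapping are hypothetical and do not correspond to any existing MediaWiki or LoadMonitor API. The sketch assumes per-replica connection counts are already available from somewhere, and that an empty mapping (no data) should not trip the breaker.

```python
# Illustrative only: hypothetical names, not MediaWiki code.

# Threshold taken from the numbers quoted above: a replica with more than
# 400 connections is treated as overloaded.
OVERLOAD_THRESHOLD = 400


class SectionOverloadedError(RuntimeError):
    """Raised instead of waiting on a section whose replicas are all overloaded."""


def check_circuit(section, replica_conn_counts, threshold=OVERLOAD_THRESHOLD):
    """Fail fast if every replica of `section` exceeds `threshold` connections.

    `replica_conn_counts` maps a replica name to its current connection count.
    An empty mapping is treated as "no data" and does not trip the breaker
    (an assumption of this sketch).
    """
    if replica_conn_counts and all(
        count > threshold for count in replica_conn_counts.values()
    ):
        raise SectionOverloadedError(
            f"section {section}: all replicas exceed {threshold} connections"
        )


# One replica still has headroom, so the request may proceed.
check_circuit("s2", {"db-a": 450, "db-b": 120})

# Every replica is over the threshold, so the request fails immediately
# instead of tying up an appserver worker.
try:
    check_circuit("s2", {"db-a": 450, "db-b": 410})
except SectionOverloadedError as err:
    print(err)
```

The check trips only when all replicas are over the threshold, matching the rule stated above. A single busy replica is left to the normal load balancing.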