In engineering systems it is often the case that jointly considering two different domains of choice (such as the scheduling and the routing of packets in the Internet) extends the system's capabilities and unlocks performance potential. Yet in many systems this is deliberately avoided, and the reason could be the principle of “time-scale separation”.
In a nutshell, time-scale separation describes a situation where two variables change value at widely different time-scales. The principle has been used in chemistry since 1913 and the seminal work of Michaelis and Menten; the idea is that two processes that attain stationary behavior at different time-scales can be studied separately. The figure above is taken from Parsons and Rogers. The same idea is also used in the ingenious algorithm of Zhang and Walrand, which achieves asymptotically optimal throughput in random access: the process of throughput accumulation is taken to be faster than the randomization over schedules of different transmitters (hence the scheduling randomization acts on average throughputs instead of instantaneous ones).
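To make "studied separately" concrete, here is a minimal sketch of a toy fast/slow system. It is not taken from any of the cited works: the equations, the parameter `eps` (the ratio between the two time-scales) and the function names are all assumptions chosen for illustration. The fast variable `x` relaxes quickly towards the slow variable `y`, so the slow dynamics can be approximated by pretending that `x` has already settled.

```python
import math

def simulate_full(eps, t_end=2.0, dt=1e-3, x0=0.0, y0=1.0):
    """Forward-Euler integration of a hypothetical coupled fast/slow system.

    dx/dt = (y - x) / eps   (fast: x relaxes towards y on a time-scale eps)
    dy/dt = -x              (slow: y drifts on a time-scale of order 1)
    Returns y at time t_end. The step dt must be well below eps for stability.
    """
    x, y = x0, y0
    for _ in range(int(round(t_end / dt))):
        dx = (y - x) / eps
        dy = -x
        x, y = x + dt * dx, y + dt * dy
    return y

def simulate_reduced(t_end=2.0, y0=1.0):
    """Slow variable studied alone, assuming the fast one has settled (x = y).

    Then dy/dt = -y, which has the closed form y0 * exp(-t).
    """
    return y0 * math.exp(-t_end)

def separation_error(eps):
    """Gap between the coupled system and the separated approximation."""
    return abs(simulate_full(eps) - simulate_reduced())

if __name__ == "__main__":
    for eps in (0.1, 0.01):
        print(f"eps={eps}: error of the separated model = {separation_error(eps):.4f}")
```

In this toy model, shrinking `eps` shrinks the gap between the coupled simulation and the reduced one, which is the sense in which widely separated time-scales can be studied one at a time.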
But time-scale separation is important in system optimization as well. For example, sometimes we do not want to jointly configure two optimization variables, but rather separate the two scales and optimize each individually. To take an extreme example, consider building a football stadium, and also preparing for a musical event in the stadium. In the latter, the positioning of extra seats has to be decided. It is tempting to consider the joint optimization of both: stadium designs that facilitate the positioning of extra seats could create a fantastic musical event. However, we know that the two designs happen at different time-scales. The stadium is designed once for the next 30 years, while the musical event is arranged each time shortly before it takes place. As such, we prefer to consider two different optimizations: the stadium is designed with general events in mind, and the musical event is organized with the stadium design as a constraint.
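The stadium argument can be sketched as a tiny two-stage optimization. Everything in the snippet is hypothetical: the two designs, the event frequencies and the payoff numbers are made up purely to illustrate the structure, in which the slow decision is taken once with all events in mind and the fast decision is taken per event with the design as a constraint.

```python
# Hypothetical data: how often each event type occurs over the stadium's life.
EVENT_FREQ = {"football": 0.9, "concert": 0.1}

# Hypothetical payoffs VALUE[design][event][arrangement]; all numbers are made up.
VALUE = {
    "general_purpose": {
        "football": {"standard": 10, "extra_seats": 6},
        "concert": {"standard": 5, "extra_seats": 8},
    },
    "concert_tuned": {
        "football": {"standard": 7, "extra_seats": 5},
        "concert": {"standard": 6, "extra_seats": 12},
    },
}

def best_arrangement(design, event):
    """Fast time-scale: arrange one event, with the design held fixed."""
    options = VALUE[design][event]
    return max(options, key=options.get)

def long_run_value(design):
    """Average payoff of a design when every event is arranged as well as possible."""
    return sum(
        freq * VALUE[design][event][best_arrangement(design, event)]
        for event, freq in EVENT_FREQ.items()
    )

def design_for_all_events():
    """Slow time-scale: choose the design once, with general events in mind."""
    return max(VALUE, key=long_run_value)

def design_for_single_event(event):
    """Joint optimization of design and arrangement for one event only."""
    return max(VALUE, key=lambda design: max(VALUE[design][event].values()))

if __name__ == "__main__":
    print("design chosen for the long run:", design_for_all_events())
    print("design chosen jointly with one concert:", design_for_single_event("concert"))
```

With these made-up numbers, optimizing the design jointly with a single concert picks the concert-tuned stadium, while the long-run choice is the general-purpose one. The per-event arrangement is still optimized in both cases, only later and with the design treated as given.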
Internet technologies are no different. At the shortest time-scale, a code decides how to transmit information bits to achieve the capacity of the medium (wireless, optical fibre, cable, etc.). The code is then fixed and taken for granted when deciding how to schedule packets from different users at an Internet switch, or jobs at a VM hypervisor. All of these are in turn fixed and taken for granted when deciding how to route the packets, and the routing is itself fixed when deciding how much bandwidth to request via TCP. Even that is fixed and taken for granted when optimizing which connection to use at the application layer. Would the Internet be faster if we could optimize all of these at once?
I have seen numerous research works aspiring to perform cross-layer control in the Internet. The argument for performance improvement is compelling most of the time, but the approaches are not adopted in practice. The Internet remains a pretty much layered architecture. Ask anyone who is deep into research, and they will tell you that cross-layer optimization is the obvious way. Ask anyone who is deep into practice, and they will tell you that cross-layer will never happen. Who is right, though?
Is it that cross-layer control is complex? Not necessarily. A good example is SDN, where the main idea is to centralize the control of a fairly big network of 500 switches in one place. This replaces the distributed optimization of the classical Internet design. Is SDN complex and slow? To my surprise, SDN is faster than any distributed system. In fact, distributed control may have its own appeal, but when it comes to performance it is certainly worse than centralized approaches. So, setting aside signaling and stability, systems should work better with centralized (and cross-layer) optimization. What is it, then, that prevents the application of joint optimization in the Internet?
My personal opinion is that the main problem with big systems is so-called ossification. Think of the Internet as an old skeleton, its parts having taken the shape of bones that no one can break. Making a paradigm shift in the Internet is as complex as deciding one day to build all homes out of wood. Sure, wood has its own advantages, but how would one go about changing the millions of houses out there? Would you be able to build skyscrapers? Would you be able to survive a fire? This is the complexity that engineers have to deal with when making new designs for the modern Internet. Proposing a new idea is fun, but showing that it would actually work at Internet scale IS NOT. In this respect, I think that the biggest revolution in the 50 years of Internet life is really SDN!
L. Michaelis and M. Menten. Die Kinetik der Invertinwirkung. Biochem. Z. 49:333–369, 1913.
T. L. Parsons and T. Rogers. Dimension reduction for stochastic dynamical systems forced onto a manifold by large drift: a constructive approach with examples from theoretical biology. arXiv:1510.07031.
L. Zhang and J. Walrand. A Distributed CSMA Algorithm for Throughput and Utility Maximization in Wireless Networks. IEEE/ACM Transactions on Networking 18(3):960–972, 2010.