Definition
Queueing theory is the branch of probability dealing with systems in which entities arrive, wait for service, are served, and depart. A queueing model specifies the arrival process (often Poisson), the service-time distribution (often exponential), the number of servers, the queue discipline (first-in first-out, priority, etc.), and the system capacity.
The field began in 1909 with A. K. Erlang's analysis of telephone exchanges and now underlies the design of call centres, computer networks, traffic systems, hospital operations, and cloud-computing infrastructure.
Why it matters
How it works
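The simplest queue, M/M/1, has a single server, Poisson arrivals at rate lambda, and exponential service times with rate mu, giving utilisation rho = lambda / mu. The queue is stable only when rho < 1. In steady state the average number in the system is rho / (1 - rho), and the average time in the system (waiting plus service) is 1 / (mu - lambda); the average wait in the queue alone is rho / (mu - lambda). As rho approaches 1, all of these grow without bound, the famous non-linearity that defeats intuition.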
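The M/M/1 formulas can be checked numerically with a short sketch. The function name `mm1_metrics` and the dictionary keys are illustrative choices, not part of any standard library; Lq and Wq (queue-only averages) are the usual textbook companions to the system-wide figures and follow from Little's law.

```python
def mm1_metrics(lam, mu):
    """Steady-state averages for an M/M/1 queue.

    lam: arrival rate, mu: service rate (same time unit).
    Names and return layout are illustrative assumptions.
    """
    if lam <= 0 or mu <= 0:
        raise ValueError("rates must be positive")
    rho = lam / mu  # utilisation
    if rho >= 1:
        raise ValueError("unstable queue: need lam < mu")
    return {
        "rho": rho,
        "L": rho / (1 - rho),        # mean number in system
        "W": 1 / (mu - lam),         # mean time in system (wait + service)
        "Lq": rho ** 2 / (1 - rho),  # mean number waiting in queue
        "Wq": rho / (mu - lam),      # mean wait before service starts
    }


if __name__ == "__main__":
    # Sweep utilisation towards 1 at a fixed service rate.
    for lam in (5.0, 8.0, 9.0, 9.9):
        m = mm1_metrics(lam, mu=10.0)
        print(f"rho={m['rho']:.2f}  L={m['L']:.1f}  W={m['W']:.2f}")
```

With mu fixed at 10, raising lambda from 8 to 9 moves utilisation from 0.8 to 0.9 but more than doubles the average number in the system (from 4 to 9), which is the non-linearity described above.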
Real systems extend this skeleton with multiple servers (M/M/c), finite capacity, priority disciplines, general service-time distributions (M/G/1), or networks of queues. Each layer of generality preserves the same insight: utilisation, arrival variability, and service-time variability together determine waiting time.
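To illustrate how service-time variability enters, the M/G/1 case has a closed-form mean queueing delay given by the Pollaczek-Khinchine formula, Wq = lambda * E[S^2] / (2 * (1 - rho)). The sketch below rewrites E[S^2] in terms of the squared coefficient of variation of the service time; the function name `mg1_mean_wait` and its parameter names are assumptions made for this example.

```python
def mg1_mean_wait(lam, mean_service, scv):
    """Mean wait in queue for M/G/1 via the Pollaczek-Khinchine formula.

    lam:          Poisson arrival rate
    mean_service: mean service time E[S]
    scv:          squared coefficient of variation, Var[S] / E[S]^2
    (Illustrative signature, not a library API.)
    """
    if lam <= 0 or mean_service <= 0 or scv < 0:
        raise ValueError("lam and mean_service must be > 0, scv >= 0")
    rho = lam * mean_service  # utilisation
    if rho >= 1:
        raise ValueError("unstable queue: need lam * mean_service < 1")
    # E[S^2] = (1 + scv) * E[S]^2, substituted into lam * E[S^2] / (2 * (1 - rho))
    return rho * mean_service * (1 + scv) / (2 * (1 - rho))


if __name__ == "__main__":
    lam, mean_service = 8.0, 0.1  # utilisation 0.8
    for label, scv in (("deterministic", 0.0), ("exponential", 1.0), ("bursty", 4.0)):
        print(label, mg1_mean_wait(lam, mean_service, scv))
```

Setting scv to 1 recovers the M/M/1 queueing delay rho / (mu - lambda), while deterministic service (scv of 0) halves it at the same utilisation.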