Transports for WebRTC

Google
harald@alvestrand.no

This document describes the data transport protocols used by Web
Real-Time Communication (WebRTC),
including the protocols used for interaction with intermediate boxes
such as firewalls, relays, and NAT boxes.

Introduction

WebRTC is a protocol suite aimed at real-time multimedia exchange
between browsers, and between browsers and other entities.

WebRTC is described in the WebRTC overview document, which also
defines terminology used
in this document, including the terms "WebRTC endpoint" and "WebRTC
browser".

Terminology for RTP sources is taken from .

This document focuses on the data transport protocols that are used
by conforming implementations, including the protocols used for
interaction with intermediate boxes such as firewalls, relays, and NAT
boxes.

This protocol suite is intended to satisfy the security considerations
described in the WebRTC security documents, and .

This document describes requirements that apply to all WebRTC
endpoints. When there are requirements that apply only to WebRTC
browsers, this is called out explicitly.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be interpreted as
described in BCP 14
when, and only when, they appear in all capitals, as shown here.
Transport and Middlebox Specification

System-Provided Interfaces

The protocol specifications used here assume that the following
protocols are available to the implementations of the WebRTC
protocols:
UDP:
This is the protocol assumed by
most protocol elements described.
TCP:
This is used for HTTP/WebSockets,
as well as TURN/TLS and
ICE-TCP.

For both protocols, IPv4 and IPv6 support is assumed.

For UDP, this specification assumes the ability to set the
Differentiated Services Code Point (DSCP) of the sockets opened on a
per-packet basis, in order to
achieve the prioritizations described in (see of this document) when
multiple media types are multiplexed. It does not assume that the DSCPs
will be honored and does assume that they may be zeroed or
changed, since this is a local configuration issue.

Platforms that do not give access to these interfaces will not be
able to support a conforming WebRTC endpoint.

This specification does not assume that the implementation will
have access to ICMP or raw IP.

The following protocols may be used, but they can be implemented by a
WebRTC endpoint and are therefore not defined as "system-provided
interfaces":
TURN:
Traversal Using Relays Around NAT
STUN:
Session Traversal Utilities for NAT
ICE:
Interactive Connectivity Establishment
TLS:
Transport Layer Security
DTLS:
Datagram Transport Layer Security
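
As an illustration of the DSCP-setting ability assumed above for UDP,
the following Python sketch marks a UDP socket using the standard
IP_TOS and IPV6_TCLASS socket options. It is a simplification: it sets
the value per socket rather than per packet, and platforms differ in
whether these options exist or are honored. The helper names
dscp_to_tos and mark_udp_socket are hypothetical and chosen only for
this example.

```python
import socket


def dscp_to_tos(dscp: int) -> int:
    """Return the IPv4 TOS / IPv6 Traffic Class byte for a 6-bit DSCP."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    # The DSCP occupies the upper six bits; the lower two bits carry ECN.
    return dscp << 2


def mark_udp_socket(sock: socket.socket, dscp: int) -> bool:
    """Try to set the DSCP on an IPv4 or IPv6 socket (hypothetical helper).

    Returns False when the platform refuses or lacks the option, so the
    caller can carry on unmarked. Even on success, the network may zero
    or change the value.
    """
    tos = dscp_to_tos(dscp)
    try:
        if sock.family == socket.AF_INET6:
            sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_TCLASS, tos)
        else:
            sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
        return True
    except (OSError, AttributeError):
        return False
```

A caller would treat a False return as "send unmarked", which matches
the assumption that DSCPs may not be honored.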
Ability to Use IPv4 and IPv6

Web applications running in a WebRTC browser MUST be able to
utilize both IPv4 and IPv6 where available -- that is, when two peers
have only IPv4 connectivity to each other, or they have only IPv6
connectivity to each other, applications running in the WebRTC browser
MUST be able to communicate.

When TURN is used, and the TURN server has IPv4 or IPv6
connectivity to the peer or the peer's TURN server, candidates of the
appropriate types MUST be supported. The "Happy Eyeballs"
specification for ICE SHOULD be
supported.

Usage of Temporary IPv6 Addresses

The IPv6 default address selection specification specifies that
temporary addresses
are to be preferred over
permanent addresses. This
is a change from the rules specified by . For
applications that select a single address, this is usually done by the
IPV6_PREFER_SRC_TMP preference flag specified in . However, this rule,
which is intended to ensure
that privacy-enhanced addresses are used in preference to static
addresses, doesn't have the right effect in ICE, where all addresses
are gathered and therefore revealed to the application. Therefore, the
following rule is applied instead:

When a WebRTC endpoint gathers all IPv6 addresses on its host, and
both nondeprecated temporary addresses and permanent addresses of the
same scope are present, the WebRTC endpoint SHOULD discard the
permanent addresses before exposing addresses to the application or
using them in ICE. This is consistent with the default policy
described in .

If some, but not all, of the temporary IPv6 addresses are marked
deprecated, the WebRTC endpoint SHOULD discard the deprecated
addresses, unless they are used by an ongoing connection. In an ICE
restart, deprecated addresses that are currently in use MAY be
retained.

Middlebox-Related Functions

The primary mechanism for dealing with middleboxes is ICE, which is an
appropriate way to deal with NAT boxes and firewalls that accept
traffic from the inside, but only from the outside if it is in
response to inside traffic (simple stateful firewalls).

ICE MUST be supported. The
implementation MUST be a full ICE implementation, not ICE-Lite. A full
ICE implementation allows interworking with both ICE and ICE-Lite
implementations when they are deployed appropriately.

In order to deal with situations where both parties are behind NATs
of the type that perform endpoint-dependent mapping (as defined in
), TURN MUST be supported.

WebRTC browsers MUST support configuration of STUN and TURN
servers, from both browser configuration and an application.

Note that other work exists around STUN and TURN server discovery
and management, including for server discovery,
as well as .

In order to deal with firewalls that block all UDP traffic, the
mode of TURN that uses TCP between the WebRTC endpoint and the TURN
server MUST be supported, and the mode of TURN that uses TLS over TCP
between the WebRTC endpoint and the TURN server MUST be supported. See
, for details.

In order to deal with situations where one party is on an IPv4
network and the other party is on an IPv6 network, TURN extensions for
IPv6 MUST be supported.

TURN TCP candidates, where the connection from the WebRTC
endpoint's TURN server to the peer is a TCP connection, MAY be supported.

However, such candidates are not seen as providing any significant
benefit, for the following reasons.

First, use of TURN TCP candidates would only be relevant in cases
where both peers are required to use TCP to establish a
connection.

Second, that use case is supported in a different way by both sides
establishing UDP relay candidates using TURN over TCP to connect to
their respective relay servers.

Third, using TCP between the WebRTC endpoint's TURN server and the
peer may result in more performance problems than using UDP, e.g., due
to head of line blocking.

ICE-TCP candidates MUST be supported; this
may allow applications to communicate to peers with public IP
addresses across UDP-blocking firewalls without using a TURN
server.

If TCP connections are used, RTP framing according to MUST be used
for all packets. This includes the RTP
packets, DTLS packets used to carry data channels, and STUN
connectivity check packets.

The ALTERNATE-SERVER mechanism specified in (300 Try Alternate) MUST be
supported.

The WebRTC endpoint MAY support accessing the Internet through an
HTTP proxy. If it does so, it MUST include the "ALPN" header as
specified in , and proxy authentication as
described in and MUST also be supported.

Transport Protocols Implemented

For transport of media, secure RTP is used. The details of the
RTP profile used are described in "Media Transport and Use of RTP in
WebRTC", which mandates the use of a
circuit breaker
and congestion control (see for further guidance).

Key exchange MUST be done using DTLS-SRTP, as described in .

For data transport over the WebRTC data channel, WebRTC endpoints
MUST support
SCTP over DTLS over ICE. This encapsulation is specified in .
Negotiation of this
transport in the Session Description Protocol (SDP) is defined in .
The SCTP extension for I-DATA
MUST be supported.

The setup protocol for WebRTC data channels described in MUST be
supported.

WebRTC endpoints MUST support multiplexing of DTLS and RTP over the
same port pair, as described in the DTLS-SRTP specification, with
clarifications in . All application-layer
protocol payloads over this DTLS connection are SCTP packets.

Protocol identification MUST be supplied as part of the DTLS
handshake, as specified in .

Media Prioritization

In the WebRTC prioritization model, the application tells the
WebRTC endpoint about the priority of media and data that is controlled
from the API.

In this context, a "flow" is used for the units that are given a
specific priority through the WebRTC API.

For media, a "media flow", which can be an "audio flow" or a "video
flow", is what calls a "media source", which
results in a "source RTP stream" and one or more "redundancy RTP
streams". This specification does not describe prioritization between
the RTP streams that come from a single media source.

All media flows in WebRTC are assumed to be interactive, as defined
in ; there is no browser API support for
indicating whether media is interactive or noninteractive.

A "data flow" is the outgoing data on a single WebRTC data
channel.

The priority associated with a media flow or data flow is classified
as "very-low", "low", "medium", or "high". There are only four priority
levels in the API.

The priority settings affect two pieces of behavior: packet send
sequence decisions and packet markings. Each is described in its own
section below.

Local Prioritization

Local prioritization is applied at the local node, before the
packet is sent. This means that the prioritization has full access to
the data about the individual packets and can choose differing
treatment based on the stream a packet belongs to.

When a WebRTC endpoint has packets to send on multiple streams
that are congestion controlled under the same congestion control
regime, the WebRTC endpoint SHOULD cause data to be emitted in such a
way that each stream at each level of priority is being given
approximately twice the transmission capacity (measured in payload
bytes) of the level below.

Thus, when congestion occurs, a high-priority flow will have the
ability to send 8 times as much data as a very-low-priority flow if
both have data to send. This prioritization is independent of the
media type. The details of which packet to send first are
implementation defined.

For example, if there is a high-priority audio flow sending
100-byte packets and a low-priority video flow sending 1000-byte
packets, and outgoing capacity exists for sending > 5000 payload bytes, it
would be appropriate to send 4000 bytes (40 packets) of audio and 1000
bytes (one packet) of video as the result of a single pass of sending
decisions.

Conversely, if the audio flow is marked low priority and the video
flow is marked high priority, the scheduler may decide to send 2 video
packets (2000 bytes) and 5 audio packets (500 bytes) when outgoing
capacity exists for sending > 2500 payload bytes.

If there are two high-priority audio flows, each will be able to
send 4000 bytes in the same period where a low-priority video flow is
able to send 1000 bytes.

Two example implementation strategies are:
When the available bandwidth is known from the congestion
control algorithm, configure each codec and each data channel with
a target send rate that is appropriate to its share of the
available bandwidth.
When congestion control indicates that a specified number of
packets can be sent, send packets that are available to send using
a weighted round-robin scheme across the connections.
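
The capacity ratios described in this section (each priority level
getting roughly twice the level below) can be sketched as a split of a
per-pass byte budget. This is only an illustration under stated
assumptions: it uses a proportional split rather than a true weighted
round-robin, it assumes every flow has data to send, and it leaves any
remainder bytes for the next pass. The names WEIGHTS, Flow, and
plan_pass are hypothetical.

```python
from dataclasses import dataclass

# Each priority level gets twice the capacity of the level below it.
WEIGHTS = {"very-low": 1, "low": 2, "medium": 4, "high": 8}


@dataclass
class Flow:
    name: str
    priority: str      # one of the keys of WEIGHTS
    packet_size: int   # payload bytes per packet


def plan_pass(flows, budget_bytes):
    """Return {flow name: packets to send} for one pass of sending decisions.

    The byte budget is divided in proportion to the priority weights,
    then rounded down to whole packets for each flow.
    """
    total_weight = sum(WEIGHTS[f.priority] for f in flows)
    plan = {}
    for f in flows:
        share_bytes = budget_bytes * WEIGHTS[f.priority] // total_weight
        plan[f.name] = share_bytes // f.packet_size
    return plan
```

With a 5000-byte budget, a high-priority audio flow of 100-byte packets
and a low-priority video flow of 1000-byte packets, plan_pass returns 40
audio packets and 1 video packet, matching the first example in this
section.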
Any combination of these, or other schemes that have the same
effect, is valid, as long as the distribution of transmission capacity
is approximately correct.

For media, it is usually inappropriate to use deep queues for
sending; it is more useful to, for instance, skip intermediate frames
that have no dependencies on them in order to achieve a lower bitrate.
For reliable data, queues are useful.

Note that this specification doesn't dictate when disparate streams
are to be "congestion controlled under the same congestion control
regime". The issue of coupling congestion controllers is explored
further in .

Usage of Quality of Service -- DSCP and Multiplexing

When the packet is sent, the network will make decisions about
queueing and/or discarding the packet that can affect the quality of
the communication. The sender can attempt to set the DSCP field of the
packet to influence these decisions.

Implementations SHOULD attempt to set QoS on the packets sent,
according to the guidelines in . It is appropriate to depart from
this recommendation when running on platforms where QoS marking is not
implemented.

The implementation MAY turn off use of DSCP markings if it detects
symptoms of unexpected behavior such as priority inversion or blocking
of packets with certain DSCP markings. Some examples of such behaviors
are described in . The detection of these
conditions is implementation dependent.

A particularly hard problem is when one media transport uses
multiple DSCPs, where one may be blocked and another may be
allowed. This is allowed even within a single media flow for video in
. Implementations need to
diagnose this scenario; one possible implementation is to send initial
ICE probes with DSCP 0, and send ICE probes on all the DSCPs
that are intended to be used once a candidate pair has been
selected. If one or more of the DSCP-marked probes fail, the sender
will switch the media type to using DSCP 0. This can be carried out
simultaneously with the initial media traffic; on failure, the initial
data may need to be resent. This switch will, of course, invalidate any
congestion information gathered up to that point.

Failures can also start happening during the lifetime of the call;
this case is expected to be rarer and can be handled by the normal
mechanisms for transport failure, which may involve an ICE
restart.

Note that when a DSCP causes nondelivery, one has to
switch the whole media flow to DSCP 0, since all traffic for a single
media flow needs to be on the same queue for congestion control
purposes. Other flows on the same transport, using different DSCPs,
don't need to change.

All packets carrying data from the SCTP association supporting the
data channels MUST use a single DSCP. The code point used
SHOULD be that recommended by for the highest-priority data
channel carried. Note that this means that all data packets, no matter
what their relative priority is, will be treated the same by the
network.

All packets on one TCP connection, no matter what it carries, MUST
use a single DSCP.

More advice on the use of DSCPs with RTP, as well as the
relationship between DSCP and congestion control, is given in .

There exist a number of schemes for achieving quality of service
that do not depend solely on DSCPs. Some of these schemes
depend on classifying the traffic into flows based on 5-tuple (source
address, source port, protocol, destination address, destination port)
or 6-tuple (5-tuple + DSCP). Under differing conditions, it
may therefore make sense for a sending application to choose any of
the following configurations:
Each media stream carried on its own 5-tuple
Media streams grouped by media type into 5-tuples (such as
carrying all audio on one 5-tuple)
All media sent over a single 5-tuple, with or without
differentiation into 6-tuples based on DSCPs
In each of the configurations mentioned, data channels may be
carried in their own 5-tuple or multiplexed together with one of the
media flows.

More complex configurations, such as sending a high-priority video
stream on one 5-tuple and sending all other video streams multiplexed
together over another 5-tuple, can also be envisioned. More
information on mapping media flows to 5-tuples can be found in .

A sending implementation MUST be able to support the following
configurations:
Multiplex all media and data on a single 5-tuple (fully
bundled)
Send each media stream on its own 5-tuple and data on its own
5-tuple (fully unbundled)
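
The two required sending configurations, together with the optional
grouping by media type, can be modeled as a mapping from streams to
transports. The sketch below is illustrative only: the mode names, the
transport labels, and the function assign_transports are hypothetical,
and each returned key simply stands for one 5-tuple. All data channels
are kept on a single 5-tuple in every mode.

```python
from collections import defaultdict


def assign_transports(streams, mode):
    """Group streams onto transports (one key per 5-tuple).

    streams: list of (stream_id, kind), kind in {"audio", "video", "data"}.
    mode: "bundled", "unbundled", or "by-type" (hypothetical labels).
    """
    groups = defaultdict(list)
    for stream_id, kind in streams:
        if mode == "bundled":
            key = "shared"              # everything on a single 5-tuple
        elif mode == "unbundled":
            # One 5-tuple per media stream; data shares one 5-tuple.
            key = "data" if kind == "data" else stream_id
        elif mode == "by-type":
            key = kind                  # one 5-tuple per media type
        else:
            raise ValueError(f"unknown mode: {mode}")
        groups[key].append(stream_id)
    return dict(groups)
```

For two audio streams, one video stream, and two data channels, the
"bundled" mode yields one transport and the "unbundled" mode yields
four: one per media stream plus one for data.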
The sending implementation MAY choose to support other
configurations, such as
bundling each media type (audio, video, or data) into its own 5-tuple
(bundling by media type).

Sending data channel data over multiple 5-tuples is not
supported.

A receiving implementation MUST be able to receive media and data
in all these configurations.

IANA Considerations

This document has no IANA actions.

Security Considerations

WebRTC security considerations are enumerated in .

Security considerations pertaining to the use of DSCP are enumerated
in .

References

Normative References

Overview: Real-Time Protocols for Browser-Based Applications
Congestion Control Requirements for Interactive Real-Time Media
Security Considerations for WebRTC
Media Transport and Use of RTP in WebRTC
WebRTC Data Channels
WebRTC Data Channel Establishment Protocol
WebRTC Security Architecture
Differentiated Services Code Point (DSCP) Packet Markings for
   WebRTC QoS
Session Description Protocol (SDP) Offer/Answer Procedures for
   Stream Control Transmission Protocol (SCTP) over Datagram Transport
   Layer Security (DTLS) Transport
Application-Layer Protocol Negotiation (ALPN) for WebRTC
Session Description Protocol (SDP) Offer/Answer Considerations for
   Datagram Transport Layer Security (DTLS) and Transport Layer
   Security (TLS)

Informative References

How to say that you're special: Can we use bits in the IPv4
   header? ANRW '16: Proceedings of the 2016 Applied Networking
   Research Workshop, pages 68-70

Acknowledgements

This document is based on earlier draft versions embedded in , which
were the result of contributions from many RTCWEB Working Group
members.

Special thanks for reviews of earlier draft versions of this document go to
, , , and ; the
contributions from also deserve special mention.