Internet-Draft virtual-orchestra-research-challenges July 2021
Makhijani & Dong Expires 13 January 2022
Independent Submission
Intended Status:
K. Makhijani
L. Dong

Virtual Orchestra Usecase and Research Challenges


This document describes open research challenges for emerging media-oriented ensemble applications. One driving scenario is the network delivery of a virtual orchestra, which imposes multi-disciplinary challenges. Of specific interest are the group communication patterns in production, delivery, and consumption, which represent different dimensions of the problem for communication networks.
This document brings forth current research and engineering challenges in immersive media ensembles. The network domain problems come down to specifying how received content is coordinated under dependency constraints. The challenges cover both real-time and quasi-real-time behavior. Because a number of endpoint actors are involved in delivering the ensemble, the research challenges also describe the expectations placed on the endpoints.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 13 January 2022.

1. Introduction and Scope

The multimedia segment has seen tremendous advancements in immersive multimedia technologies. One of the ongoing research questions is how to deliver a completely immersive experience of digital media. Such media is produced by an ensemble of different actors or multiple sources that must coordinate as they perform together in a real environment. This translates into a very high volume of generated data streams.

This memo presents research and engineering challenges in multi-user digital ensembles that need to be addressed in order to achieve these goals, spanning from pure research to the engineering/standards space. The network-related challenges are generalized as coordinated communications and explained as group communications with explicit dependencies. The objective of this memo is to document the technical challenges and the corresponding current approaches, and to expose requirements that should be addressed by future research and standards work.

2. Terminology

3. Scenario Description

In an orchestra ensemble, the multimedia streams of musicians, each in a different place in the world, come together to perform live on a stage that may itself be at a different location.

Performing an ensemble with multiple participants separated by large and varying distances (from less than a mile to 1000 miles) is quite difficult for applications because path and latency characteristics vary.

The network needs to support the coordination of directions from the conductor to all of the musicians and of the audio/visuals from the musicians to the stage. In particular, in a large-scale ensemble in which many instruments are involved, preserving the integrity of the performance may require dropping the sound and hologram streams of a musician that cannot arrive at the same time as the others, and providing mechanisms for subsequent fast synchronization.

A virtual orchestra is a complex multi-disciplinary use case, and recreating the real orchestral experience requires in-depth knowledge of every field involved.
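
The dropping behavior described above can be illustrated with a minimal sketch. It assumes that the stage applies one common playout deadline per beat and that each chunk carries a measured arrival time; the names StreamChunk, arrival_ms, and split_by_deadline are hypothetical and are not defined by this memo.

```python
from dataclasses import dataclass


@dataclass
class StreamChunk:
    musician: str       # hypothetical identifier of the source
    arrival_ms: float   # arrival time at the stage, relative to the beat


def split_by_deadline(chunks, deadline_ms):
    """Return (kept, dropped): chunks that meet the common deadline and those that miss it."""
    kept = [c for c in chunks if c.arrival_ms <= deadline_ms]
    dropped = [c for c in chunks if c.arrival_ms > deadline_ms]
    return kept, dropped


chunks = [
    StreamChunk("violin", 18.0),
    StreamChunk("cello", 22.5),
    StreamChunk("flute", 61.0),
]
kept, dropped = split_by_deadline(chunks, deadline_ms=30.0)
print([c.musician for c in kept])     # sources played out for this beat
print([c.musician for c in dropped])  # sources left out until they resynchronize
```

How a dropped source rejoins the ensemble (the fast synchronization step) is left open here, as it is in the text.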

             |  conductor  |
                  |         one to many
   |              |              |
   v              v              v
+-----+       +-------+      +-------+
| t0" |       |   t0  |      |  t0'  |
+-(A)-+       +--(B)--+      +--(C)--+
   |              |              |
   +-----------+  |  +-----------+
               |  |  |        many to one
               v  v  v
         | coordinator node|
              | Stage  |
Figure 1: Virtual Orchestra Delivery over Network

3.1. Multiple Streams and actors

A virtual orchestra is a coordination of multiple flows, as shown in Figure 1. In current network terminology, this is equivalent to a multicast group of endpoints that must cooperate on how to send and receive information. From an application point of view, it is membership in a publish/subscribe topic. In the above example, the endpoint actors are the conductor, the musicians, and the stage. The characteristics of the traffic are predictable, and the following steps take place:

3.1.1. Conductor to Musicians

  • The conductor is the initiator of the orchestral stream. Synchronized reception of the conductor's gestures is critical to the performance.
  • Musicians perform on cues or gestures received over the network. All the musicians must receive those cues in order to start the performance.
  • The performance follows the tempo and beats from the conductor, which must be delivered in a consistent, jitter-free manner.
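
One possible way to obtain synchronized reception of a cue is delay equalization: every musician holds the cue until the slowest path would have delivered it. This is an assumption made for illustration, not a mechanism specified by this memo, and the function name and the latency values below are hypothetical.

```python
def equalizing_holds(latency_ms):
    """Map each musician to the extra hold time that aligns cue presentation.

    latency_ms: one-way conductor-to-musician latency per musician (assumed known).
    """
    slowest = max(latency_ms.values())
    return {musician: slowest - latency for musician, latency in latency_ms.items()}


holds = equalizing_holds({"A": 10, "B": 25, "C": 40})
print(holds)  # the musician on the slowest path holds for 0 ms
```

The sketch makes the cost of this approach visible: every cue is presented as late as the slowest path, which is the co-dependency discussed in Section 3.2.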

3.1.2. Musicians to Stage

At least one output stream per source will be generated to create the ensemble performance, and these sources may have variable latencies. The streams should be aggregated and delivered to the stage as a unified stream.

The two scenarios are the one-to-many (Section 3.1.1) and many-to-one (Section 3.1.2) types of group communication. The coordination constraints involve several dependencies, such as synchronization at the start of play, maintenance of the same tempo along the time scale throughout the streaming part, and a description of distance for spatial sound quality.

3.2. Virtual Orchestra Scenario Challenges

This section describes the scenarios that make it difficult to deliver a virtual orchestra over the network.

Note that the virtual orchestra application itself may be delivered in different ways. Non-real-time scenarios are not relevant here: they amount to non-interactive content delivery, in which the content does not require aggregation from multiple sources. In that case, an application and the corresponding network can use buffering, low-latency techniques, and existing transport protocols to meet the expectations of an end user.

Specific to the real-time streaming of a virtual orchestra, the performance is pseudo-real-time. That is, the synchronization of content originating from different sources is only as fast as its slowest path. In other words, one source-destination path of the co-flow will slow the pace of the group stream, even though the other, shorter-latency paths may deliver content sooner. This is a major co-dependency challenge, since the slowest path should not have any impact on the tempo and the beats. Thus, the three dependency considerations for the network are:

  • Feasibility Dependency: Assess whether the co-flow is feasible at all, given the slowest member flow of the group.
  • Membership Dependency (spatial): Mechanisms are needed to establish and determine membership and the relationships among members. In terms of a publisher (conductor and stage) and subscriber (performers) group communication model, not all subscribers need to know about each other.
  • Start Time Dependency (temporal): Each performer depends on the trigger to start; from then on, the time scale, tempo, and beats of the performance must be preserved.
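
The feasibility dependency can be sketched as a simple admission check. The sketch assumes that a per-member path latency estimate and a single latency budget for the group are available; both inputs, and the function name check_feasibility, are hypothetical and only illustrate the idea.

```python
def check_feasibility(path_latency_ms, budget_ms):
    """Return (feasible, slowest_member) for a co-flow.

    path_latency_ms: estimated latency of each member flow (assumed input).
    budget_ms: largest latency the ensemble can tolerate (assumed input).
    """
    slowest_member = max(path_latency_ms, key=path_latency_ms.get)
    feasible = path_latency_ms[slowest_member] <= budget_ms
    return feasible, slowest_member


result = check_feasibility({"A": 12.0, "B": 48.0, "C": 20.0}, budget_ms=30.0)
print(result)  # member "B" exceeds the budget, so the group is not feasible as is
```

When the check fails, the group can either be refused or proceed without the slowest member, which connects this check to the dropping of streams described in Section 3.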

From a logical architectural point of view, the coordination node is a function that synchronizes all the incoming streams; it may then deliver them either as separate streams or as a single stream.
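
A minimal sketch of such a coordination function follows. It assumes that every chunk is tagged with a beat index shared by all members, and it releases a beat only once every member has contributed to it. The class name, the beat tag, and the payload format are hypothetical.

```python
from collections import defaultdict


class CoordinationNode:
    """Buffers chunks per beat and releases a beat once all members have arrived."""

    def __init__(self, members):
        self.members = set(members)
        self.pending = defaultdict(dict)  # beat index -> {member: payload}

    def receive(self, member, beat, payload):
        """Store one chunk; return the aligned group for this beat, or None if incomplete."""
        if member not in self.members:
            raise ValueError(f"unknown member: {member}")
        self.pending[beat][member] = payload
        if set(self.pending[beat]) == self.members:
            return self.pending.pop(beat)
        return None


node = CoordinationNode(["A", "B"])
first = node.receive("A", 0, b"a0")   # beat 0 is still missing member B
second = node.receive("B", 0, b"b0")  # beat 0 is now complete
print(first, second)
```

The sketch waits indefinitely for a missing member; a practical coordination function would combine it with a deadline so that one late stream cannot stall the group.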

4. Generalized Coordinated Service Concept

There are several examples of multi-party immersive applications (TBD - add section) in which remote entities are required to recreate the behavior of being present in the same scene or environment. They are therefore co-dependent on each other's spatio-temporal behavior changes. For example, in an orchestra, the tempo, beats, and gestures must remain the same for all performers, and the position of each musician is computed to create spatial audio.

A generalized in-network capability is introduced that consumes group communication membership and constraints and delivers service within the specified constraints.

Staying within the network context, the important terms and components of a coordinated service are introduced below:

                              .---.           +-----+
                             (     )----------|Co-EP|
                member        `---'           +-----+
 +-----+      |  flow         Co-SN
 |Co-EP|----+ |                 ^^
 +-----+    | |                .||.
            | |               ( ||)        .-.
            | v                `||'   ----(--)---->
          .---.  -------->    .---.  -----`-'----->    +-----+
         (     )-------------(     )-------------------|Co-EP|
          `---'   ------->    `---'                    +-----+
+-----+     |                 Co-SN
+-----+ ------>   Co-EP: coordinated service end point
                  Co-SN: coordinated service node
                  Co-Flow: coordinated flow
                  Member flow: member of a co-flow

Coordinated services are a form of group communication with clearly expressed dependencies. Possible approaches will need to define mechanisms to manage those dependencies.
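
To make the terms concrete, a co-flow can be modeled as a named group of member flows together with its dependency constraints. The sketch below is only one possible data model: the single max_skew_ms constraint and all field names are assumptions, not definitions from this memo.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MemberFlow:
    source: str       # Co-EP that sends the flow
    destination: str  # Co-EP that receives the flow


@dataclass
class CoFlow:
    name: str
    max_skew_ms: float  # hypothetical constraint: allowed arrival skew among members
    members: list = field(default_factory=list)

    def add_member(self, source, destination):
        self.members.append(MemberFlow(source, destination))

    def endpoints(self):
        """Return every Co-EP that takes part in this co-flow, sorted by name."""
        names = {m.source for m in self.members} | {m.destination for m in self.members}
        return sorted(names)


to_stage = CoFlow(name="musicians-to-stage", max_skew_ms=5.0)
for musician in ("A", "B", "C"):
    to_stage.add_member(musician, "stage")
print(to_stage.endpoints())
```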

5. Virtual Orchestra Coordination Challenges

The internet is a spatial-temporal heterogeneous environment, yielding different content delivery behaviours in time and space. No two paths (or even different flows on the same path) can be assumed to have identical properties in terms of latency, jitter, and bandwidth.

Currently, supporting a virtual orchestra in the network is not feasible. Managing flow dependencies entirely in the applications on the endpoints cannot always guarantee absolute time constraints, because network conditions change unpredictably. This necessitates some kind of coordination with the network.

5.1. Out-of-band Coordination

Out-of-band coordination may be used to achieve distribution of co-flows in the network. The membership of co-dependent flows is conveyed from the endpoints, potentially when the flows are set up, so that the coarse-grained service (and service level objectives) can be enabled in the network. A distribution graph of co-flows and the associated dependency constraints may be constructed, and the nodes in that graph can then enhance their scheduling and forwarding by factoring in the timing information in the metadata of packets.
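
As an illustration of the distribution graph, the sketch below builds a directed graph from the path of each member flow and lists the nodes where more than one member flow meets, which are natural candidates for coordination points. It assumes that the path of every member flow is known at setup time; the node names and both function names are hypothetical.

```python
from collections import defaultdict


def build_distribution_graph(member_paths):
    """Return {node: set of next hops} from the hop-by-hop path of each member flow."""
    graph = defaultdict(set)
    for path in member_paths.values():
        for upstream, downstream in zip(path, path[1:]):
            graph[upstream].add(downstream)
    return dict(graph)


def coordination_candidates(member_paths):
    """Return the nodes traversed by more than one member flow, sorted by name."""
    flows_at = defaultdict(set)
    for flow, path in member_paths.items():
        for node in path[1:]:  # skip the sending endpoint itself
            flows_at[node].add(flow)
    return sorted(node for node, flows in flows_at.items() if len(flows) > 1)


paths = {
    "A": ["A", "n1", "n3", "stage"],
    "B": ["B", "n2", "n3", "stage"],
}
graph = build_distribution_graph(paths)
print(coordination_candidates(paths))
```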

5.2. In-band Coordination

The timestamps of transmission at the sender and of expected delivery at the receiver may be conveyed as metadata in the packets transmitted from the senders. This type of in-band signaling informs intermediate coordination points about the dependencies and interrelationships among flows. These mechanisms still need to be formalized so that they can be carried in the data path.
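
Since no encoding is defined for this metadata, the following is purely a hypothetical layout that shows what in-band coordination information could look like on the wire: a co-flow identifier, a member identifier, a sync marker, a send timestamp, and a delivery deadline, packed in network byte order. Every field name and width is an assumption.

```python
import struct

# Hypothetical layout: coflow_id (32 bits), member_id (16 bits), sync_marker (16 bits),
# sent_us (64 bits), deliver_by_us (64 bits); "!" selects network byte order, no padding.
HEADER = struct.Struct("!IHHQQ")
FIELDS = ("coflow_id", "member_id", "sync_marker", "sent_us", "deliver_by_us")


def pack_metadata(coflow_id, member_id, sync_marker, sent_us, deliver_by_us):
    """Encode the coordination metadata that a sender would prepend to a packet."""
    return HEADER.pack(coflow_id, member_id, sync_marker, sent_us, deliver_by_us)


def unpack_metadata(data):
    """Decode the metadata from the first HEADER.size bytes of a packet."""
    return dict(zip(FIELDS, HEADER.unpack(data[:HEADER.size])))


wire = pack_metadata(7, 2, 41, 1_000_000, 1_030_000) + b"payload"
meta = unpack_metadata(wire)
print(HEADER.size, meta["deliver_by_us"] - meta["sent_us"])
```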

5.3. In-node Coordinated-forwarding

The actual coordination effort is performed at the coordination points. The scheduling and forwarding engine should allow packets within sync markers to be sent according to the remaining δt. It needs to compare the remaining coordination time and schedule or pace packet forwarding accordingly.
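
One way to read the remaining δt is as the slack left after subtracting the expected downstream latency from the time remaining until the delivery deadline. The sketch below paces packets on that basis. The deadline, the downstream latency estimate, and the function names are assumptions made for illustration.

```python
import heapq


def pacing_delay_us(deliver_by_us, now_us, downstream_us):
    """Return how long a node may hold a packet: remaining time minus downstream latency."""
    remaining = deliver_by_us - now_us
    return max(remaining - downstream_us, 0)  # a late packet is forwarded immediately


def release_order(packets, now_us):
    """Return (release_time_us, name) pairs, earliest release first.

    packets: iterable of (name, deliver_by_us, downstream_us) tuples (hypothetical format).
    """
    heap = []
    for seq, (name, deliver_by_us, downstream_us) in enumerate(packets):
        release = now_us + pacing_delay_us(deliver_by_us, now_us, downstream_us)
        heapq.heappush(heap, (release, seq, name))  # seq breaks ties in arrival order
    ordered = []
    while heap:
        release, _, name = heapq.heappop(heap)
        ordered.append((release, name))
    return ordered


packets = [("A", 1000, 250), ("B", 1000, 500), ("C", 1000, 700)]
schedule = release_order(packets, now_us=400)
print(schedule)
```

In this example, packet C is already behind its deadline and is released at once, while A, which has the shortest downstream path, is held the longest so that all three can arrive together.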

6. Existing Research Work

7. IANA Considerations

This document requires no actions from IANA.

8. Security Considerations

This document introduces no new security issues.

Authors' Addresses

Kiran Makhijani
Lijun Dong
Central Expy
Santa Clara, CA 95050
United States of America