BITNET Network Working Group                                P. Olenick
Request for Comments: 0002                        Princeton University
                                                            April 1989

                 Definition of the BITNETII Protocol
                    A Technical Overview of VMNET

Status of this Memo

This RFC specifies the protocol used for the transmission of the NJE protocols over TCP (Transmission Control Protocol). Applications wishing to participate with BITNETII are expected to adopt and implement this specification. Distribution of this memo is unlimited.

I. Introduction

This document describes the technical aspects of the implementation of the IBM Remote Spooling Communications Subsystem (RSCS) over TCP/IP protocol. This document will provide an overview of the internal structure of the VM service machine known as VMNET. The external data formats and protocol conventions used by the service machine are detailed. The goal of this document is to provide sufficient information to allow developers to build systems which can communicate with the VMNET service machine.

The reader of this document should have a working knowledge of RSCS, the IBM implementation for VM of TCP/IP (5798-FAL), and some experience with IBM's VM system. The reader should also have access to the IBM manual 'Network Job Entry Formats and Protocols for System/370 Program Products', GG22-9373-02.

Olenick                VMNET Technical Overview                April 1989

BRFC 0002                      BITNETII                           Page 2

II. Functional Overview

VMNET is the name applied to the collection of programs which execute in a VM service machine and provide the encapsulation of RSCS virtual channel-to-channel (CTC) data into TCP buffers. The data blocks constructed by VMNET are passed to the TCP/IP VM service machine via standard interface calls. The TCP/IP service machine will transport the data via the IP network to the recipient VMNET. The recipient VMNET will receive the data buffers from the TCP/IP service machine on that system.
The recipient VMNET will write the data to the virtual channel-to-channel which connects the VMNET service machine with the RSCS service machine. In this way, RSCS can make use of a TCP/IP network.

TCP/IP networks can be constructed using a variety of transport media, including Ethernet and serial data lines. The configuration of the IP network and its associated hardware is known (in part) to the TCP/IP service machine. VMNET uses calls to the TCP/IP service machine and thus is media independent.

VMNET is a class G, disconnected service machine, running under CMS. No modifications are required to VM, CMS, RSCS, or the TCP/IP service machine to run VMNET. VMNET interfaces to RSCS's NJE line driver via virtual channel-to-channel adapters, one for each VMNET-connected link. VMNET will operate with RSCS Version 1 and RSCS Version 2. VMNET will operate in the currently supported VM/CMS systems (including VM/XA) and can be used with all versions of the IBM TCP/IP system (5798-FAL).

The VMNET machine consists of a multi-tasking CMS supervisor and an application running under that supervisor. The multi-tasking supervisor is based on the public domain code authored by Cheyenne Wills. Extensions and modifications were made to this supervisor to provide specific functions required for the VMNET project.

The multi-tasker, known as IUC, allows multiple 'tasks' to run in the same CMS machine concurrently. IUC is a cooperative multi-tasker which depends on each task giving control to the supervisor, which in turn passes control to another task which is eligible to execute. The method used to relinquish control is the CMS WAITECB routine. When IUC is loaded, a CMS nucleus extension is installed which receives control when a CMS WAITECB is executed.

One VMNET service machine needs to support a number of concurrent RSCS connections. Having one service machine per connection would have caused a number of problems.
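
The cooperative style of multi-tasking described for IUC can be illustrated with a small sketch. This is not IUC code and makes several simplifying assumptions: it is written in Python, each `yield` stands in for a task calling WAITECB, and tasks are resumed in simple round-robin order instead of being dispatched when an ECB is posted. The names `link_task` and `run_round_robin` are hypothetical.

```python
from collections import deque


def link_task(name, steps, log):
    """Hypothetical task: do one unit of work, then give up control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # stands in for a WAITECB call that returns control to the supervisor


def run_round_robin(tasks):
    """Minimal cooperative supervisor: resume each eligible task in turn."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)        # run the task until it relinquishes control
        except StopIteration:
            continue          # the task has ended, so it is not requeued
        ready.append(task)    # the task is still alive and eligible to run again


log = []
run_round_robin([link_task("A", 2, log), link_task("B", 3, log)])
print(log)
```

Because no task is ever preempted, a task that never yields would stall every other task in the machine. That is the property a cooperative multi-tasker trades for its simplicity.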
The function of VMNET is to transport data via CTC I/O operations and TCP interface calls. A large amount of wait time can be expected on a given link.

The application, known as DPU, provides the function of taking data from the RSCS CTC and building data blocks which are passed to the TCP/IP service machine. DPU must establish a virtual circuit (TCP connection) between the two VMNET systems which are going to transport RSCS data via the IP network.

DPU connects to the local RSCS via a CTC. The local RSCS assumes it is communicating directly with another RSCS via the CTC. The 'link driver' component of DPU takes the data read from the CTC and builds data blocks which are passed to the TCP/IP service machine via user callable interface routines. DPU will receive data blocks from the TCP/IP service machine and the link driver will write this data to the CTC. RSCS will process this data as though it had come from another RSCS. The encapsulation and transport of the data is transparent to the RSCS machines. Each RSCS simply appears to be connected to the other via the CTC.

The way in which VMNET accomplishes the function of moving the data in a transparent manner is what the remainder of this document will address.

III. RSCS Protocol Review

RSCS uses several different layers of protocol to transport data. The highest layer is the NJE protocol. The use of the RSCS NJE line driver was dictated by the fact that the NJE protocol is the only supported way of connecting an RSCS V1 machine to an RSCS V2 (or RSCS V2 to RSCS V2). The NJE protocol is used to describe the data. The NJE protocols were developed by IBM to allow data to be transported between different operating systems. VMNET has no knowledge of the NJE protocols; it transports the NJE protocol as if it were data.

The next layer of protocol is the logical file protocol of RSCS.
This protocol is used to insure the integrity of the file being sent from one system to another. RSCS uses the term 'stream' to mean the collection of NJE protocol records plus the data of a file to be transported from one RSCS to another. The protocol uses control records for such functions as stream open and stream complete.

An example of how this protocol is used can be seen by looking at the steps needed to send a file from one RSCS to another. The sending RSCS will send a control record to the receiving RSCS that is 'request to open a stream', which is a request for permission to send data. The receiving RSCS will respond with a control record which is either 'permission to send granted' or 'request rejected'. If the response is 'permission to send granted' then the sending RSCS will start the transfer of data. When all data has been transferred by the sending RSCS, it will send an end-of-file (EOF) control record. Once the receiving RSCS has received the EOF and has completed its processing of the file, the receiving RSCS will send a 'stream complete' control record to the sending RSCS. Only when the sending RSCS has received the 'stream complete' will the sending RSCS purge the file from its queue.

Part of this logical protocol is a count field in each control or data record transmitted. This field, referred to as the BCB, is a modulo 16 number and is used to sequence the records. This enables the receiving RSCS to determine if any records were lost in transmission.

VMNET must monitor the logical protocol closely. VMNET will under some circumstances alter the BCB or sequence of control records, but VMNET uses the RSCS logical file protocol to insure file integrity.

The next layer of protocol is the block acknowledgment. This is used to indicate if a block of data being transmitted was received correctly.
This protocol is needed when blocks are being transmitted via a medium which might be susceptible to data corruption, such as serial teleprocessing lines. The detection of the error is done by the hardware checking the CRC for the block.

This same protocol is used when transmitting data via CTC. The sending RSCS writes a block of data via the CTC and expects to read a block of data or an ACK record from the receiving RSCS, via the CTC. If the sending RSCS receives a NAK or some other unexpected response, the last block would be assumed to be in error and error recovery would be needed. It should be noted that RSCS considers all errors of this type to be fatal errors when the transport medium is a CTC.

The receiving RSCS can send one of its queued data buffers as the positive response to the sending RSCS. In this way, data can be sent in each direction. A good data record read indicates the data written was received correctly by the other RSCS. The receiving RSCS would send an ACK (x'1070') if it has no data to send. RSCS V1 uses the ACK. RSCS V2 uses a null record in place of the ACK but will accept an ACK because of the need to allow RSCS V1 to connect to RSCS V2.

VMNET makes use of this layer of protocol to force RSCS to send additional data. VMNET emulates the sequence of commands used to drive the CTC by RSCS. When RSCS sends data, VMNET will respond either with data it has received via TCP or an ACK. The data received by VMNET from RSCS is placed in a buffer which holds the VMNET data block which will be sent via TCP.

The following is a diagram of the logical and physical protocol flow in a standard RSCS to RSCS connection. This chart shows the flow of sending a file from one RSCS to another.

RSCS sending                      RSCS receiving
------------------------------------------------
request to send--->
                                  <---ACK
ACK--->
                                  <---ACK
ACK--->
                                  <---permission to send
data--->
                                  <---ACK
data--->
                                  <---ACK
EOF--->
                                  <---file complete

Later in this document, once the flow of VMNET has been explained, a similar diagram will show how VMNET alters this flow.

IV. Definition of the BITNET II Protocol

A. Establishing the TCP Connection

Before data can be sent from one RSCS to another via TCP (using VMNET), a virtual circuit must be established between the two VMNETs. A virtual circuit is a path between two applications over which TCP packets may be sent.

An IP address is assigned to a system. Each VM TCP/IP service machine will have an IP address. Many applications on a system may use the TCP/IP service machine. To enable the TCP/IP service machine to separate incoming packets, the applications use port numbers to indicate which packets correspond to which applications.

TCP allows an application to 'open' a virtual circuit in either passive mode (waiting for incoming requests to open, also known as 'server mode') or active mode (sending requests to open, also known as 'client mode'). In general, one TCP application (the server) will issue a passive open for a port number (known as the 'well known port') and the other TCP application (the client) will issue an active open for the well known port on the system (IP address) where the first (server) application is located. The TCP connection or virtual circuit path between the two applications will be completed and data may be exchanged over the path.

VMNET has a passive open outstanding to receive incoming requests. The well known port number used is 175. When VMNET on machine A wishes to establish a virtual circuit with the VMNET on machine B, an active open is issued by the VMNET on machine A for the well known VMNET port on machine B.
The IP address of machine B is known to machine A by using information supplied in the VMNET configuration file (VMNET DIRECT).

Once the open is completed, the VMNET which issued the active open (assume machine A) must do a TCP send of a VMNET control record, type OPEN. All VMNET TCP sends set the TCP PUSH flag. The TCP PUSH flag is set to force the data to be sent by the TCP/IP service machine.

The VMNET which completed the passive open will only accept a VMNET control record type OPEN. Any other record received on this virtual circuit by the VMNET port handler will be considered an error and will cause the TCP connection to be aborted. The format of the VMNET control record can be found in the data format section of this document.

The VMNET at machine A must set fields in the VMNET control record before the record can be sent to the port handler at machine B. The VMNET at machine A must set the type field in the control record to OPEN. RHost is the node name used to identify the RSCS node name on machine A which is associated with this link. OHost is the node name used to identify the RSCS node name on machine B. RIP and OIP are the hexadecimal form of the IP addresses for machine A and machine B.

The VMNET on machine B will use OHost to verify that the connection has been made to the correct site and will use RHost to search for a definition of a link by that name. Several checks are made by the VMNET on machine B, such as: is the link attempting an active open, is a link by that name defined, and is the link currently connected. If the state of the link on machine B will permit a connection, a VMNET control record is sent to the VMNET on machine A. This control record type is 'ACK' with RHost and RIP set to machine B and OHost and OIP set to machine A.
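
The roles of the fields in the OPEN and ACK control records can be made concrete with a short sketch. It is illustrative only and rests on stated assumptions: `ControlRecord` is a hypothetical in-memory stand-in and not the wire layout (which is defined in the data format section), IP addresses are shown as dotted strings for readability, and the node names and addresses are invented. The other checks made by the port handler (link defined, link state) are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlRecord:
    """Hypothetical stand-in for a VMNET control record (not the wire format)."""
    type: str    # 'OPEN', 'ACK' or 'NAK'
    rhost: str   # RSCS node name at the end sending this record
    rip: str     # IP address of the end sending this record
    ohost: str   # RSCS node name at the other end
    oip: str     # IP address of the other end


def build_open(local_node, local_ip, remote_node, remote_ip):
    """Record sent by the VMNET that completed the active open."""
    return ControlRecord("OPEN", local_node, local_ip, remote_node, remote_ip)


def build_ack(open_rec, my_node):
    """Record returned by the passive side: the host and IP pairs are swapped."""
    if open_rec.type != "OPEN":
        raise ValueError("only a type OPEN record is accepted on a new circuit")
    if open_rec.ohost != my_node:
        raise ValueError("OPEN record was addressed to a different node")
    return ControlRecord("ACK", open_rec.ohost, open_rec.oip,
                         open_rec.rhost, open_rec.rip)


# Invented example values: machine A opens a link to machine B.
open_rec = build_open("NODEA", "192.0.2.1", "NODEB", "192.0.2.2")
ack_rec = build_ack(open_rec, my_node="NODEB")
print(ack_rec)
```

In the ACK, RHost and RIP describe machine B and OHost and OIP describe machine A, which mirrors the OPEN record sent by machine A.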
Once the ACK control record is sent, the well known port handler will pass the information about the open virtual circuit to the link driver. The well known port handler will issue another TCP passive open for the well known port number, which will enable VMNET to accept a request to open another connection.

The module DPUWPORT is the port handler for VMNET and will issue the passive open and receive the VMNET type OPEN control record. DPUWPORT will process the open request and will respond with either an ACK or NAK VMNET control record. In general, each VMNET service machine has only one port handler task running. The processing of open requests by the port handler is serial. The port handler either accepts the open request and passes the information about the open TCP connection to the associated link driver task, or the port handler returns a VMNET control record type NAK and closes the TCP connection. In either case, the port handler will issue another passive open.

The module DPUWACOP is called by the link driver, DPUWLNK2, to perform the active open and process the control record returned by the port handler. One link driver task is created for each link activated. Each link driver task may call DPUWACOP to do an active TCP open. A VMNET service machine may have a number of TCP active opens in progress at any one time.

If the state of the link on machine B will not permit the connection, a control record with type set to 'NAK' is sent to machine A. If a NAK control record is sent, the TCP virtual circuit is closed by the port handler after the record is sent. After the close is complete, the port handler will issue another passive open.

A NAK control record has a 'reason code' field, which indicates the reason the open request was rejected. Reason codes currently used are X'01', X'02', X'03'.

Code X'01' indicates that the port handler could not locate a link defined with the name supplied in RHost.
A code of X'01' may indicate a configuration problem if the error persists for more than a few open attempts. A link may go into an undefined state when it is going through a restart cycle.

Code X'02' indicates that the port handler found the requested link in the connected state. The port handler will force a restart of a link found in the connected state after returning a reason code X'02'. The assumption is that VMNET did not detect that the TCP connection had been broken. During a restart of the link, the VMNET attempting an active open may receive a NAK reason code X'01' while the restart is in progress. The reason a link will be unable to be located is that all link control blocks are released and rebuilt as part of the restart.

Code X'03' indicates that the port handler found the requested link to be in the process of doing an active open. The VMNET receiving a reason code X'03' selects a random (short) period of time. VMNET currently uses a value between 1 and 10 seconds for the short wait period. VMNET uses the low-order bits of the TOD clock as a random number generator. After the wait period has expired, a check is made to see if, while in this waiting state, a passive open has been completed with this link. If no passive open has completed, another attempt at an active open is tried.

The VMNET issuing the active open will count the number of successive active open attempts which fail. If a preset limit (currently set at 10) is reached, a long wait value (currently set at 1 minute) is used in place of the random wait. This prevents needless attempts to open a TCP connection when the passive end is not reachable, for example, when the network or host system is down.

Another technique used by VMNET is to have the port handler increment a counter for each attempted open of a given link for which a reason code x'03' is returned. This same counter is reset to zero by each attempt of this link to do an active open.
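
The wait policy used by the side issuing the active open can be summarized in a few lines. This is a sketch under assumptions, not VMNET code: the function name `next_wait` is hypothetical, Python's `random` module stands in for the low-order bits of the TOD clock, and the short wait is taken to be a whole number of seconds. The numeric values are the ones quoted in the text.

```python
import random

SHORT_WAIT_RANGE = (1, 10)  # seconds; random short wait quoted in the text
FAILURE_LIMIT = 10          # successive failed active opens before the long wait
LONG_WAIT = 60              # seconds; the 1 minute long wait quoted in the text


def next_wait(failures, rng=random):
    """Return the delay in seconds before the next active open attempt.

    `failures` is the count of successive failed active opens so far.
    Whole-second granularity is an assumption made for this sketch.
    """
    if failures >= FAILURE_LIMIT:
        return LONG_WAIT                    # peer is probably unreachable
    return rng.randint(*SHORT_WAIT_RANGE)   # randomized to avoid repeated collisions


# Example: delays chosen for the first twelve successive failures.
rng = random.Random(1989)
print([next_wait(n, rng) for n in range(1, 13)])
```

The random short wait matters when both ends attempt an active open at the same moment: if each side waited a fixed interval, the two attempts would keep colliding with reason code X'03'.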
If for some reason an active open fails to complete within a reasonable number of attempts (the current VMNET value is 5), the port handler will force a restart of the link.

VMNET, especially in the port handler, uses a number of 'deadman' timers to prevent tasks from becoming permanently hung. These timers are typically set to 2 minutes.

Once the open has completed and the VMNET control records have been successfully exchanged and properly acknowledged, the RSCS systems may now exchange their signon sequences. Until the RSCS signon sequences (the exchange of type 'I' and 'J' signon records) are complete, the VMNET link driver task will be in 'lockstep' mode. In lockstep mode, one VMNET link driver will 'wait' for data to be TCP received from the other VMNET link driver. Once the TCP data is received, a CTC I/O will be done to RSCS and the data read from the CTC will be sent to the other VMNET, which is 'waiting' for data from a TCP receive.

Lockstep mode is needed because RSCS has its own master/slave relationship and expects to exchange the signon sequence in a fixed way, allowing for no other data except the specific signon sequence. To guarantee that the signon sequence can be exchanged between RSCSs in the required format, lockstep mode is used. Once the signon records have been exchanged, lockstep mode is no longer used by the link driver task and full duplex operation can begin.

RSCS believes that its connection is via CTC to another RSCS. In a true RSCS to RSCS via CTC, if both RSCSs attempted to write a signon record, one I/O operation would succeed and the other would receive a busy. The way VMNET deals with the problem of getting the RSCSs started is to have the VMNET which completed the active TCP open do CTC I/O first. The standard RSCS CTC I/O operation consists of a sense CCW, a write CCW, a control CCW, and a read CCW, chained together.
The data written in this first operation is a single byte of x'00'. Once the CTC I/O operation is complete, the data read is sent via TCP to the other VMNET, which is waiting for a TCP buffer to be received. The data received is written over the CTC to RSCS and data is read from the CTC. This data is sent via TCP to the first VMNET and so on. This lockstep way of processing is used until the signon sequence for both RSCSs is complete.

VMNET examines each record written to or read from the CTC. For type 'I' signon records, which are exchanged during the startup phase of the RSCS connection, the PREPARE flag is reset. This will cause the RSCSs to use the old style handshaking protocol. The RSCS systems will send an ACK or null buffer and then wait 2 seconds when the link is idle.

The PREPARE protocol used by RSCS V2 handles an idle link differently. When the PREPARE protocol is agreed to by both RSCSs, and the link goes idle, no I/O is left outstanding on the CTC. The first RSCS that wants to write will do I/O and the other RSCS will be notified by an attention interrupt on the CTC. PREPARE protocol is a much better way of handling the idle link than setting a timer, which is what RSCS V1 uses. To implement PREPARE protocol for VMNET will require some additional logic in the CTC handler used in VMNET. VMNET does not yet support PREPARE protocol but will do so in the near future. The lack of support for PREPARE protocol is the reason the flag in the signon record is reset. Turning off the PREPARE flag indicates to RSCS that PREPARE protocol is not supported by the other system.

B. Summary of VMNET Connection Startup

VMNET A                               VMNET B
(active open)                         (passive open)

Active open--->                       Passive open outstanding
          <---Open complete--->
Send control-Type OPEN--->            TCP receive outstanding
                                      Process open request
TCP receive outstanding               <---Send control-Type ACK
          <---Post link driver open complete--->
Do CTC I/O to RSCS                    Wait for TCP receive
TCP send data--->                     Do CTC I/O with Recv'd. Data
Wait for TCP receive                  <---TCP send data
          <---Continue lockstep until signon complete--->

C. RSCS to VMNET Data Flow

VMNET uses the same CCWs as RSCS to move data via the CTC. The string of chained CCWs consists of the following commands chained together: sense, write, control, and read. RSCS uses the same string of commands and the result is that the sense and write command of the VMNET CCW string mates to the control and read of the RSCS CCW string.

The read CCW of VMNET will read data from RSCS directly into a data block which will be sent via TCP. The write CCW of VMNET will write data directly from the data block received by VMNET. The reason for handling the data this way is to avoid having to move the data within VMNET. Fields in the record header have been left empty in case data compression is added to VMNET in the future. If some form of data compression was added to VMNET, the data might have to be moved as part of the decompression.

VMNET uses three different buffer sizes. VMNET extracts from the RSCS type 'J' signon record the maximum size of the block RSCS will write over the CTC. This size will become the length used in the CTC read command. VMNET has a buffer which is used to build and receive data blocks. The TCP/IP service machine has a maximum size for a TCP send. For FAL, the size is 8K bytes, and 8K is the default size of the VMNET data block buffer. VMNET has parameters which allow the VMNET and TCP buffer sizes to be altered. VMNET will segment data to be sent if the VMNET buffer is greater than the TCP size.
For example, if the VMNET data block buffer size is 10K and the TCP size is set at 1K, VMNET would construct a 10K data block of data read from RSCS and issue ten 1K TCP sends to transport the data to the receiving VMNET.

The buffer size values apply on a link-by-link basis. Different links in the same VMNET may have different buffer sizes. The VMNET data block size for a connection must be the same for each end of the connection. The reason for this restriction is that if one VMNET builds a 10K data block and TCP sends it to another VMNET (in segments), the receiving VMNET needs enough buffer space to re-assemble the entire 10K buffer.

The other rule about buffer sizes is that the VMNET data block buffer size must be larger than the RSCS buffer size. VMNET will need to add a block header, record header and end-of-file to the data block before it is sent via TCP.

RSCS sends a data file to another RSCS by opening a 'stream'. RSCS V1 can open one stream in each direction, thus one file can be sent and one received at the same time. RSCS commands and messages are sent in a separate (and somewhat special) stream, which does not require an 'open'. One RSCS (CTC) buffer can contain data for only one stream. RSCS V1 allows for data buffers and message buffers to be intermixed over the same connection. RSCS V2 allows for up to seven data streams to be open in each direction. Messages and commands are treated as a separate and special stream, which is not one of the seven. It should be noted that when RSCS V1 is connected to RSCS V2 only one data stream will be allowed in each direction.

D. Building the VMNET Buffer to be Sent via TCP

VMNET sets the length of its CTC read CCW to the length extracted from the signon (type J) record. The type J signon record contains the RSCS-negotiated RSCS buffer size. The RSCS buffer size is the maximum length that RSCS will use in a CTC write command.
The address into which the data will be read is within the VMNET data block buffer. The VMNET data block starts with a block header (TTB). The block header contains a field which is the length of the data block including the length of the header. Each block read from the CTC will start with a record header (TTR), which is built by VMNET. The record header contains a field which is the length of the data read from the CTC. The length does not include the length of the header. The last record of a VMNET data block is a record header with data length of zero. This is the end-of-block marker. The length of the data block sent via TCP will be variable.

When the CTC read is complete, the actual size of the data read is computed. The VMNET record header is built in the space reserved for the header in the data block. Header space is reserved by computing the next available data location in the data block and adding the length of the record header to the address used in the VMNET CTC read CCW. A calculation is performed to see if the space remaining in the VMNET data block can hold a maximum length RSCS CTC data block plus the VMNET end-of-block (EOB). If the next CTC read data will fit, the read CCW is updated to accept the next CTC read data. If insufficient space remains for the next RSCS CTC read block, the VMNET EOB (space is reserved in the calculation to insure that the EOB can always be inserted into the VMNET data block) is created and the data block is queued to the TCP send routine. The TCP PUSH flag is always set whenever VMNET issues a TCP send request.

The integrity of the RSCS data is maintained, even when the BCB count is reset, by the TCP guarantee of reliable transmission of the VMNET data blocks. VMNET, unlike RSCS, can place different data streams in the same data block, including the command and message stream.
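
The block-building rules above can be sketched as follows. This is an illustration under assumptions, not the DPUWLNK2 implementation: the header layouts (an 8-byte block header and a 4-byte record header, each with a halfword length at offset 2) are assumed here for the example, and the authoritative formats are those in the data format section. The names `BlockBuilder` and `deblock` are hypothetical, and the sketch copies data where VMNET reads directly into the block.

```python
import struct

TTB = struct.Struct(">BBHI")  # assumed layout: flags, unused, total length, reserved
TTR = struct.Struct(">BBH")   # assumed layout: flags, unused, record data length
EOB = TTR.pack(0, 0, 0)       # end-of-block: a record header with data length zero


class BlockBuilder:
    """Accumulates CTC reads into one variable-length VMNET data block."""

    def __init__(self, block_size, rscs_buf_size):
        self.block_size = block_size        # VMNET data block buffer size
        self.rscs_buf_size = rscs_buf_size  # maximum CTC read, from the type J record
        self.records = bytearray()

    def has_room(self):
        """True if a maximum-length CTC read, its header and the EOB still fit."""
        used = TTB.size + len(self.records)
        need = TTR.size + self.rscs_buf_size + len(EOB)
        return used + need <= self.block_size

    def add(self, ctc_data):
        """Append one CTC read, preceded by its record header."""
        if len(ctc_data) > self.rscs_buf_size or not self.has_room():
            raise ValueError("CTC data does not fit in the current data block")
        self.records += TTR.pack(0, 0, len(ctc_data)) + ctc_data

    def close(self):
        """Append the EOB, prefix the block header and return the finished block."""
        body = bytes(self.records) + EOB
        self.records.clear()
        return TTB.pack(0, 0, TTB.size + len(body), 0) + body


def deblock(block):
    """Split a finished block back into the original CTC records."""
    _, _, total, _ = TTB.unpack_from(block, 0)
    pos, out = TTB.size, []
    while pos < total:
        _, _, length = TTR.unpack_from(block, pos)
        pos += TTR.size
        if length == 0:   # end-of-block marker
            break
        out.append(block[pos:pos + length])
        pos += length
    return out


builder = BlockBuilder(block_size=8192, rscs_buf_size=3976)
builder.add(b"first CTC read")
builder.add(b"second CTC read")
block = builder.close()
print(len(block), deblock(block))
```

Checking for room before each read, with the EOB already counted, is what guarantees that the end-of-block marker can always be appended.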
If the data read from RSCS over the CTC is an ACK or null record and data has previously been read from RSCS and is buffered in the current VMNET data block, the data block is closed (an EOB is created) and is queued to the TCP send routine. The assumption is that if RSCS has sent an ACK or null record, it has no data to write. VMNET should send data it has buffered at this time. VMNET will not place ACKs or null records into the VMNET data block. These cause problems at the receiving RSCS because they may signal the receiver to go into the idle state, thereby causing long delays. Only 'real' data is placed in the VMNET data block. If VMNET reads an ACK or null buffer from RSCS over the CTC and the current TCP data block has no previously read CTC data, the ACK or null record is ignored and discarded.

VMNET has to handle the sequence numbers (BCBs) contained within null records which are discarded. RSCS uses a modulo 16 count to insure that buffers are not lost or duplicated during transmission. VMNET will set the 'reset sequence number' flag (the flag is part of the BCB) to resync the BCB count values. Using the BCB reset flag will cause the receiving RSCS to accept the BCB count as a new starting count. In this way, VMNET can discard sequenced records which should not be transmitted and keep use of the BCB for the majority of RSCS-to-RSCS communications.

In summary, the data blocks built by VMNET are variable length, beginning with a fixed length block header which contains the total length of the data block. Each CTC block read by VMNET from RSCS is read into the VMNET data block buffer. The VMNET module DPUWLNK2 creates a record header which precedes the RSCS CTC data. When the data block is full or when an RSCS idle condition is detected, an EOB is created following the last data record. The VMNET data block is queued to the TCP send routine for transmission.

E. Processing Data Received by VMNET from TCP

TCP treats the data being sent or received as a continuous stream of bytes. RSCS reads and writes data as blocks. VMNET may combine a number of RSCS CTC blocks into one data block to be sent via TCP. VMNET uses the block header length to receive the data block as built by the sending VMNET. VMNET uses the record header count to build the CTC write CCW, which allows RSCS to CTC read the same record as written by the sending RSCS.

To insure the complete block is received from TCP, VMNET issues a TCP receive for the block header, which is of fixed length. Once the block header is received, the length of the remaining data can be computed. A TCP receive is then issued for the length of the data block minus the length of the block header.

Care must be taken to insure all the data sent has been received. TCP may deliver segments of the data block which are smaller than the count given to TCP receive. This may be a function of the TCP implementation used on the IBM systems. If the length of the segment received is less than the length requested, a new receive length is computed, based on the total length of the data block from the header minus the size of the header and any segments received. Another TCP receive is issued to complete the data buffer. VMNET is prepared to receive as many segments as are required to receive the complete data block, based on the header count.

Once the complete data block has been received, it can be deblocked into the individual records and written to RSCS over the CTC.

F. Summary of RSCS to VMNET Data Flow

The following chart shows the flow of data between one RSCS sending a file, VMNET, and the TCP/IP service machine and the corresponding TCP/IP, VMNET, and RSCS on the receiving system.

RSCS              VMNET      TCP/IP     TCP/IP    VMNET       RSCS
(send)                                                        (recv)
       |-CTC-|      |-VMCF-|      |-IP-|   |-VMCF-|   |-CTC-|
                                via Network
--------------------------------------------------------------------
Request to open->
                  <-ACK
ACK->             TCP send-->
                             IP send--->
                                        IP recv
                  <-ACK                           TCP receive
ACK->                                             write->     Request to open
                  <-ACK                           read        <-Permission
ACK->                                                           granted
                                                  <--TCP send
                                        <---IP send
                             IP recv
                  TCP recv
Perm.             <-write
granted
data->
                  <-ACK
data->
                  <-ACK
data->
                  <-ACK
                  TCP send-->
                             IP send--->
                                        IP recv
data->                                            TCP receive
                  <-ACK                           write->     data
data(EOF)->                                       read        ACK
                  <-ACK                           write->     data
ACK->                                             read        ACK
                  <-ACK                           write->     data
                  TCP send-->                     read        ACK
                             IP send--->
ACK->                                   IP recv
                  <-ACK                           TCP receive
(ACK delayed                                      write->     data
for idle link)                                    read        ACK
ACK->                                             write->     EOF
                                                  read        ACK
                  <-ACK                           write->     ACK
                                                  read        file complete
ACK->                                             <--TCP send
                                        <---IP send
                             IP recv
                  TCP recv
file              <-write
complete
(purge file being sent)
ACK->
                  <-ACK
(idle link timer wait)

G. RSCS Buffer Size Considerations

The current version of VMNET defines two buffers (data blocks) for TCP receives and one for TCP sends. These values for the number of receive and send buffers are assembled in and can be changed if needed. One send buffer appears to be enough. The send buffer is transmitted, via interface calls which use VM VMCF, to the TCP/IP service machine. Thus the send buffers are buffered within the TCP/IP service machine. Should this service machine be unable to accept the data, little would be gained by building a second output buffer.

The RSCS buffer size used for writing and reading the CTC is important. The smallest RSCS buffer size (400 bytes) would provide for the fullest data blocks. The reason is that a data block is considered full when the next CTC read block will not fit in the current data block. Using such a small buffer would increase the number of CTC I/O's done by both RSCS and VMNET to fill the data block.
However, using a large RSCS CTC buffer size (8000 bytes) would reduce the number of CTC I/O's but might not pack the data block as fully. The data block would be packed less fully because of the large number of messages and commands processed by RSCS. If a message was placed in a data block, with RSCS using a large CTC buffer size, the next CTC buffer would not fit in the space remaining and the data block (containing only the single message) would be considered full.

A mid-sized RSCS CTC buffer size allows for a mix of data streams and messages. The sizes chosen for RSCS V1 using only one stream, and for RSCS V2 using several streams, may also be different. With a VMNET buffer size of 8K bytes, an RSCS V1 buffer size of 3976 bytes is being used. For RSCS V2 using seven streams, a buffer size of 1024 bytes is being used.

H. VMNET to TCP Data Flow

VMNET expects that a TCP receive operation will complete or will return a fatal error. On a TCP fatal error, VMNET will attempt to close the current TCP connection and halt the RSCS connection. VMNET will then attempt to re-establish the TCP connection.

On a TCP send, VMNET expects one of three outcomes: the data is accepted, a fatal error is returned, or an indication is returned that the buffer could not be accepted for transmission at this time because the TCP/IP service machine is out of buffer space. A fatal error will cause a restart of the TCP and RSCS connections.

VMNET is careful in its handling of the 'wait for buffer space' condition. VMNET continues to do CTC I/O and TCP receives while waiting for TCP send buffer space to become available. If VMNET does not continue to accept incoming TCP data and write this data to the CTC, a 'deadly embrace' condition is likely to occur. The deadly embrace occurs when the VMNET being sent to has filled its TCP buffers with data to be sent over the same TCP connection on which it is receiving.
If the sending VMNET will not accept incoming TCP data, the receiving VMNET (which is also sending) will not have space to accept the data being sent. Thus, each VMNET will wait for the other to take some action which will never occur.

I. VMNET/RSCS Data Flow Control

The flow of data to and from RSCS is controlled by the use of the stream mask bits, the FCS flags, which are part of the RSCS block header. These flags indicate whether or not RSCS will accept data on a given stream or any stream. VMNET appears to RSCS to be another RSCS connected via the CTC. VMNET will use the FCS flags to control data flow from RSCS and must honor RSCS's use of the FCS flags when writing data to the CTC. VMNET sets the FCS for data being written over the CTC to the local RSCS, ignoring the FCS which came from the remote RSCS, and monitors the FCS coming from the local RSCS.

VMNET will set the FCS to stop RSCS from sending additional data when no data block is available in VMNET to hold data being read from the CTC. VMNET will use a small local buffer assembled in the link driver to hold the ACK or null record RSCS must use to acknowledge data written over the CTC. If VMNET reads anything other than an ACK or null buffer from RSCS in this state, VMNET considers it a fatal error and restarts the connection. By setting the FCS, VMNET is able to continue receiving TCP data and writing that data over the CTC, while preventing RSCS from writing data over the same CTC. When the data block in VMNET becomes available, the FCS is altered to allow RSCS to write data.

This is the method RSCS uses to stop another RSCS from over-running it with data. The FCS can be set to flow control one or more streams, or it may be set to prevent all streams including messages. VMNET sets the FCS only to allow all streams or prevent all streams.

VMNET must monitor the FCS coming from RSCS.
If RSCS has altered the FCS to prevent any or all streams, VMNET stops sending any data to RSCS. VMNET continues to do CTC I/O, monitoring the FCS from RSCS, but sending only ACKs or null buffers. Null buffers are written if VMNET needs to alter the FCS being sent to RSCS while RSCS is not accepting any data. When the FCS returns to a normal state, allowing all streams, VMNET will resume writing data received from TCP to RSCS over the CTC.

VMNET treats the FCS as a switch which enables the flow of data to and/or from RSCS. No attempt is made by VMNET to deal with individual streams by use of the FCS. The FCS from the remote RSCS has no meaning, as it was used to control its connection with its VMNET. VMNET sets the FCS of each block written to RSCS.

VMNET was designed to function with RSCS V2 using seven data streams and one additional stream for messages and commands. By using an RSCS V2 user exit, a mix of data (short files and long files) can be sent in parallel, thus making use of the bandwidth in the IP networks. This exit allows the overlap of the RSCS stream opens for the short files with the data transfer of the long file(s). The use of this RSCS V2 'transmission algorithm', which allows for the mix of short and long files, is not required to use VMNET. The exit enhances the operation of RSCS V2 when used with VMNET. The exit attempts to prevent all streams from being occupied exclusively by short files or exclusively by long files.

VMNET will function with RSCS V1 (or RSCS V1 connected to RSCS V2), which uses only a single data stream, plus messages and commands, in each direction. Some loss of throughput can be seen in this mode because the local RSCS requires an acknowledgment from the remote RSCS for stream open and close.
To help when RSCS V1 is used (or RSCS V1 is connected to RSCS V2), an option known as FASTOPEN was added to VMNET. With FASTOPEN, VMNET answers part of the RSCS open protocol locally, which reduces overhead while still maintaining file integrity.

J. Data Flow with the FASTOPEN Option

VMNET would normally treat all of the RSCS records except for the ACKs and null records as data and place them in the outbound TCP buffer to be sent. When the FASTOPEN option is used, VMNET will respond to the RSCS 'request to open' with 'permission granted'. This will cause RSCS to begin sending data without waiting for the receiving RSCS to send the permission granted. The receiving VMNET will get an indication of the FASTOPEN as a flag in the record header and will wait for the receiving RSCS to issue 'permission granted' in response to the 'request to open'. The 'permission granted' record from the receiving RSCS will be discarded. If the response is 'permission rejected', a timer (30 seconds) is set and the 'request to open' is retried.

If FASTOPEN is used with RSCS V1, about one third of the processing time for a small file can be saved. In the non-FASTOPEN case with RSCS V1, the 'request to open' is sent to the receiving RSCS and nothing can happen except for message and command traffic until the 'permission granted' is received by the sending RSCS.

Although FASTOPEN will function in an RSCS V2 to RSCS V2 connection using multiple streams, its use is not recommended because it can cause a lockout. The lockout can occur when an RSCS V2 is SHUTDOWN or the link using FASTOPEN is DRAINed. The RSCS link being drained will not accept additional stream opens once the DRAIN process has begun. VMNET will not write data to the CTC until the stream open is accepted. Data needed to drain the remaining streams will therefore not be written to the CTC. Thus the RSCS SHUTDOWN or DRAIN will never complete.
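The receiving-side FASTOPEN handling described above can be sketched as a small state model. This is a minimal illustration under stated assumptions, not VMNET's actual code: the class `FastopenReceiver`, its method names, and the string labels 'granted' and 'rejected' are hypothetical, the classification of an RSCS record as a grant or reject is taken as given, only a single stream is modelled, and discarding the 'permission rejected' record is an assumption (the text states only that the open is retried). The x'80' flag and the 30 second retry come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

FASTOPEN_FLAG = 0x80    # TTR flag value given in Appendix A
RETRY_SECONDS = 30.0    # retry interval stated in the text


@dataclass
class FastopenReceiver:
    """Hypothetical single-stream model of the receiving VMNET."""
    ctc_writes: List[bytes] = field(default_factory=list)  # written to local RSCS
    held: List[bytes] = field(default_factory=list)        # data awaiting the open
    pending_open: Optional[bytes] = None                   # unanswered open request
    retry_at: Optional[float] = None                       # time of the next retry

    def from_tcp(self, flags: int, record: bytes) -> None:
        """Handle one record deblocked from a TCP data block."""
        if flags & FASTOPEN_FLAG:
            # 'request to open' already answered by the sending VMNET
            self.pending_open = record
            self.ctc_writes.append(record)
        elif self.pending_open is not None:
            # no data is written to the CTC until the open is accepted
            self.held.append(record)
        else:
            self.ctc_writes.append(record)

    def on_local_response(self, kind: str, now: float) -> str:
        """Handle a reply from the local RSCS; say whether to forward it."""
        if self.pending_open is None or kind not in ("granted", "rejected"):
            return "forward"
        if kind == "granted":
            self.pending_open = None
            self.retry_at = None
            self.ctc_writes.extend(self.held)   # release the held data
            self.held.clear()
        else:
            self.retry_at = now + RETRY_SECONDS
        return "discard"   # the sender has already seen 'permission granted'

    def tick(self, now: float) -> None:
        """Retry the 'request to open' once the retry timer has expired."""
        if self.retry_at is not None and now >= self.retry_at:
            self.retry_at = None
            self.ctc_writes.append(self.pending_open)
```

In this model the lockout corresponds to the case where the local RSCS never answers 'granted': the records in `held` are never released to the CTC.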
FASTOPEN is not needed and should not be used in the RSCS V2 case, because multiple streams already allow the overlap of stream open and close. FASTOPEN can be used when RSCS V1 is connected to RSCS V2 because only one data stream can be used and no lockout will occur.

K. VMNET to RSCS Idle Link Considerations

Data is transferred between the two RSCSs over the TCP connection in a way which allows VMNET to use the full-duplex capabilities of TCP. The VMNET link driver always has an outstanding TCP receive as long as it has a buffer to receive into. The VMNET link driver sends data as soon as the outbound buffer is ready. Once data is received from TCP it is queued for deblocking when CTC I/O is being done. The VMNET link driver will do CTC I/O as fast as RSCS will accept it.

To prevent excessive CTC overhead, the VMNET link driver will note when it has both written and read only ACKs or null records from the CTC. In this case, the assumption is that the link is idle and VMNET should delay before doing the next CTC I/O. So as not to begin delaying too soon, VMNET waits for a number of consecutive idle I/O's (currently 10) before delaying for 100 ms. Each successive idle I/O after that increases the delay interval by 100 ms, to a maximum of one second. RSCS will also detect the idle state at the first idle I/O and wait for 2 seconds.

All of this idle waiting changes in an RSCS V2 to RSCS V2 environment which uses a PREPARE protocol in the idle state. In the PREPARE protocol no timers are used and neither side has active I/O on the CTC. The PREPARE protocol is not yet supported by VMNET.

L. RSCS/VMNET Restart Considerations

VMNET examines each record read from the CTC. If VMNET detects that RSCS is sending the link startup sequence, it will cause the TCP connection to be broken.
This will cause the VMNETs to attempt to re-establish the TCP connection between them. The RSCSs can then re-establish the RSCS-to-RSCS connection with the proper synchronization. This is the situation if one RSCS DRAINs the RSCS connection. Once the RSCS connection is started again from either RSCS, VMNET detects the startup condition and attempts to re-establish the TCP and RSCS paths.

V. VMNET Summary

VMNET is a VM service machine which establishes a TCP virtual circuit with a copy of itself running on another system. VMNET receives data from RSCS over a CTC connection and, through the use of TCP interface calls, encapsulates the RSCS data for transmission over the previously established virtual circuit. The VMNET receiving the TCP data will transform the received data into the form which is written to the CTC. Thus, VMNET allows two unmodified RSCS systems to use TCP as the transport medium.

Appendix A. Format of VMNET Data Areas

The following are the data area formats used by VMNET to build the data block header, TTB, and the data block record header, TTR.

VMNET data blocks are built by the sending VMNET link driver and contain data records read from the CTC connected to RSCS. Data blocks are passed to the TCP/IP service machine via interface calls. The receiving VMNET link driver receives the data blocks from the TCP/IP service machine and writes the data records to the CTC for RSCS to process.

The general format of the data block is:

   TTB TTRdata TTRdata TTRdata ... TTREOB

Data block header (TTB)

The TTB is a fixed length header which begins each data block created by VMNET.

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |F|U| LN| UNUSE |
   +-+-+-+-+-+-+-+-+

   F     - Flags, no current values defined.

   U     - Unused space, reserved for future expansion.

   LN    - Length of data block, binary 16 bit value.
           This value is the total length of the data block, including the length of the TTB and end-of-buffer TTR.

   UNUSE - Unused space, reserved for future use.

Data block record header (TTR)

The TTR is a fixed length header built by VMNET, which precedes each record read from RSCS over the CTC.

    0 1 2 3
   +-+-+-+-+
   |F|U| LN|
   +-+-+-+-+

   F     - Flags used to pass information about this record.

           x'80' - FASTOPEN flag indicates this CTC record is a 'request to open' which was processed by the sending VMNET as a FASTOPEN.

   U     - Unused space, reserved for future expansion.

   LN    - Length of data record, binary 16 bit value. This value is the length of the record read from the CTC. The length does NOT include the length of the TTR header. If the length in a TTR is zero, this is the end-of-block marker.

VMNET control record format

A VMNET control record is sent by VMNET's active open routine after the active open is completed. VMNET's port handler responds with a VMNET control record which indicates the status of the VMNET link. The exchange of the VMNET control records must take place as the first exchange of data on the TCP connection after the TCP open is complete.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     RHost     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  RIP  |     OHost     |  OIP  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R|
   +-+

   Type  - Type of request in EBCDIC characters, left justified, and padded with blanks. Acceptable values are OPEN, ACK, and NAK.

   RHost - Name of host sending the control record, the same value as RSCS LOCAL associated with this link. This field is EBCDIC characters, left justified, and padded with blanks.

   RIP   - Hex value of IP address sending control record. As an example, IP address 128.112.14.1 would have a value of x'80700E01'.

   OHost - Name of host expected to receive the control record. Same format as RHost.
   OIP   - Hex value of IP address expected to receive the control record. Same format as RIP.

   R     - Reason code in binary, used to return additional information if type is NAK. Valid values are:

           x'01' - no such link could be found.
           x'02' - link found in active state and will be reset.
           x'03' - link found attempting an active open.

Appendix B. Example of a VMNET Data Block

The following is an actual data block built by VMNET. Three streams are open and transmitting files.

bytes 0 1 2 3 4 5 6 7 8 9 A B C D E F
======== ======== ======== ========
00001D94 00000000--->VMNET data block header (TTB)
000003AD--->VMNET record header (TTR)
1002--->Start of RSCS header
89--->RSCS header (BCB)
8F--->RSCS header (FCS)
CF--->RSCS header (FCS)
A9--->RSCS header (RCB)
80--->RSCS header (SRCB)
FF--->RSCS header (SCB)
50F5F5F5 F5F5F5F5 F5F5F5F5--->data
F5F5F5F5 F5F5F5F5 F5F5F5F5 F5F5F5F5 .
F5F5F5F5 F5F5F5F5 F5F5F5F5 F5F5F5F5 .
F5F5F5F5 F5F5F5F5 F5F5F5F5 F5F5F5F5 .
F5F5F5 D1 F5F5F5F5 F5F5F5F5 F5F5F5F5--->SCB,data
F5F5F5F5 F5--->RSCS data
00--->RSCS end-of-record
A980 FF50F6F6 F6F6F6F6--->RCB,SRCB,SCB,data
F6F6F6F6 F6F6F6F6 F6F6F6F6 F6F6F6F6--->data
F6F6F6F6 F6F6F6F6 F6F6F6F6 F6F6F6F6 .
F6F6F6F6 F6F6F6F6 F6F6F6F6 F6F6F6F6 .
F6F6F6F6 F6F6F6F6 .
D1F6F6F6 F6F6F6F6--->SCB,data
F6F6F6F6 F6F6F6F6 F6F600--->data,EOR
A9 80FF50F7--->RCB,SRCB,SCB,data
F7F7F7F7 F7F7F7F7 F7F7F7F7 F7F7F7F7--->data
F7F7F7F7 F7F7F7F7 F7F7F7F7 F7F7F7F7 .
F7F7F7F7 F7F7F7F7 F7F7F7F7 F7F7F7F7 .
F7F7F7F7 F7F7F7F7 F7F7F7F7 F7D1F7F7--->data,SCB,data
F7F7F7F7 F7F7F7F7 F7F7F7F7 F7F7F700--->data,EOR
A980FF50 F8F8F8F8 F8F8F8F8 F8F8F8F8--->RCB,SRCB,SCB,data
F8F8F8F8 F8F8F8F8 F8F8F8F8 F8F8F8F8--->data
F8F8F8F8 F8F8F8F8 F8F8F8F8 F8F8F8F8 .
F8F8F8F8 F8F8F8F8 F8F8F8F8 F8F8F8F8 .
F8F8D1F8 F8F8F8F8 F8F8F8F8 F8F8F8F8--->data,SCB,data
F8F8F8F8 00A980FF 50F9F9F9 F9F9F9F9--->the
F9F9F9F9 F9F9F9F9 F9F9F9F9 F9F9F9F9 pattern
F9F9F9F9 F9F9F9F9 F9F9F9F9 F9F9F9F9 continues
F9F9F9F9 F9F9F9F9 F9F9F9F9 F9F9F9F9 .
F9F9F9F9 F9F9F9D1 F9F9F9F9 F9F9F9F9 .
F9F9F9F9 F9F9F9F9 F900A980 FF50F0F0 .
F0F0F0F0 F0F0F0F0 F0F0F0F0 F0F0F0F0 .
F0F0F0F0 F0F0F0F0 F0F0F0F0 F0F0F0F0 .
F0F0F0F0 F0F0F0F0 F0F0F0F0 F0F0F0F0 .
F0F0F0F0 F0F0F0F0 F0F0F0F0 D1F0F0F0 .
F0F0F0F0 F0F0F0F0 F0F0F0F0 F0F000A9 .
80FF50F1 F1F1F1F1 F1F1F1F1 F1F1F1F1 .
F1F1F1F1 F1F1F1F1 F1F1F1F1 F1F1F1F1 .
F1F1F1F1 F1F1F1F1 F1F1F1F1 F1F1F1F1 .
F1F1F1F1 F1F1F1F1 F1F1F1F1 F1F1F1F1 .
F1D1F1F1 F1F1F1F1 F1F1F1F1 F1F1F1F1 .
F1F1F100 A980FF50 F2F2F2F2 F2F2F2F2 .
F2F2F2F2 F2F2F2F2 F2F2F2F2 F2F2F2F2 .
F2F2F2F2 F2F2F2F2 F2F2F2F2 F2F2F2F2 .
F2F2F2F2 F2F2F2F2 F2F2F2F2 F2F2F2F2 .
F2F2F2F2 F2F2D1F2 F2F2F2F2 F2F2F2F2 .
F2F2F2F2 F2F2F2F2 00A980FF 50F3F3F3 .
F3F3F3F3 F3F3F3F3 F3F3F3F3 F3F3F3F3 .
F3F3F3F3 F3F3F3F3 F3F3F3F3 F3F3F3F3 .
F3F3F3F3 F3F3F3F3 F3F3F3F3 F3F3F3F3 .
F3F3F3F3 F3F3F3F3 F3F3F3D1 F3F3F3F3 .
F3F3F3F3 F3F3F3F3 F3F3F3F3 F300A980 .
FF50F4F4 F4F4F4F4 F4F4F4F4 F4F4F4F4 .
F4F4F4F4 F4F4F4F4 F4F4F4F4 F4F4F4F4 .
F4F4F4F4 F4F4F4F4 F4F4F4F4 F4F4F4F4 .
F4F4F4F4 F4F4F4F4 F4F4F4F4 F4F4F4F4 .
D1F4F4F4 F4F4F4F4 F4F4F4F4 F4F4F4F4 .
F4F400A9 80FF50F5 F5F5F5F5 F5F5F5F5 .
F5F5F5F5 F5F5F5F5 F5F5F5F5 F5F5F5F5 .
F5F5F5F5 F5F5F5F5 F5F5F5F5 F5F5F5F5 .
F5F5F5F5 F5F5F5F5 F5F5F5F5 F5F5F5F5 .
F5F5F5F5 F5D1F5F5 F5F5F5F5 F5F5F5F5 .
F5F5F5F5 F5F5F5 00 00--->RSCS end-of-buffer
000003 AD10028A--->VMNET (TTR),RSCS
8FCFB980 FF50F5F5 F5F5F5F5 F5F5F5F5 header,data
-------- RSCS data deleted --------
F5F5F5F5 F5F5F5F5 F5F5F5F5 F5F5F5F5--->data
0000--->RSCS eob
0000 03AD1002 8B8FCF99 80FF50F7--->TTR,RSCS header
F7F7F7F7 F7F7F7F7 F7F7F7F7 F7F7F7F7--->data
-------- RSCS data deleted --------
F70000--->data,RSCS eob
00 0003AD10 028C8FCF A980FF50--->TTR,RSCS header
F6F6F6F6 F6F6F6F6 F6F6F6F6 F6F6F6F6--->data
-------- RSCS data deleted --------
F6F6F6F6 F6F6F6F6 F6F60000--->data,EOR
000003AD--->TTR
10028D8F CFB980FF 50F6F6F6 F6F6F6F6--->RSCS header,data
-------- RSCS data deleted --------
F6F6F6F6 F6F6F6F6 F6F6F600 0000--->data,EOR
0003--->TTR
AD10028E 8FCF9980 FF50F8F8 F8F8F8F8--->TTR,RSCS header, data
-------- RSCS data deleted --------
F8F8D1F8 F8F8F8F8 F8F8F8F8 F8F8F8F8--->data
F8F8F8F8 0000--->data,EOR
0000 03AD1002 8F8FCFA9--->TTR,RSCS header
80FF50F7 F7F7F7F7 F7F7F7F7 F7F7F7F7--->RSCS header,data
-------- RSCS data deleted --------
F7F7F7F7 F70000--->data,EOR
00 0003AD10 02808FCF--->TTR,RSCS header
B980FF50 F7F7F7F7 F7F7F7F7 F7F7F7F7--->RSCS header,data
-------- RSCS data deleted --------
F7F7F7F7 F7F70000--->data,EOF
00000000--->TTR (VMNET end-of-block)

Address of Author:

   Peter A. Olenick
   Office of Computing and Information Technology
   Princeton University
   87 Prospect Avenue
   Princeton, NJ 08544
   USA

   BITNET: Q0239@PUCC
   Internet: q0239@pucc.princeton.edu
   Telephone: (609) 452-6024
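As an illustration of the TTB and TTR layouts in Appendix A and the receive procedure described earlier in this memo, the following is a minimal Python sketch of reading one complete data block from a byte stream and deblocking it into records. It is not VMNET code. The function names, the `recv` callable (any function that returns at most the requested number of bytes), and the helpers `build_block` and `chunked_reader` are assumptions made for the example. Big-endian byte order is inferred from the Appendix B dump, where the TTB bytes 00001D94 carry the length x'1D94'.

```python
import struct

TTB_LEN = 8          # F(1) U(1) LN(2) UNUSE(4), per Appendix A
TTR_LEN = 4          # F(1) U(1) LN(2), per Appendix A
TTR_FASTOPEN = 0x80  # FASTOPEN flag in the TTR flags byte


def recv_exact(recv, count):
    """Issue receives until exactly count bytes have arrived."""
    buf = bytearray()
    while len(buf) < count:
        chunk = recv(count - len(buf))   # may return fewer bytes than asked
        if not chunk:
            raise ConnectionError("connection closed inside a data block")
        buf += chunk
    return bytes(buf)


def read_block(recv):
    """Receive the fixed-length TTB, then the rest of the data block."""
    header = recv_exact(recv, TTB_LEN)
    _flags, _unused, length = struct.unpack(">BBH", header[:4])
    if length < TTB_LEN + TTR_LEN:
        raise ValueError("block length too small to hold the end-of-block TTR")
    return header + recv_exact(recv, length - TTB_LEN)


def deblock(block):
    """Split a data block into (flags, record) pairs, stopping at LN == 0."""
    records = []
    pos = TTB_LEN
    while True:
        if pos + TTR_LEN > len(block):
            raise ValueError("missing end-of-block TTR")
        flags, _unused, length = struct.unpack_from(">BBH", block, pos)
        pos += TTR_LEN
        if length == 0:                  # zero-length TTR ends the block
            return records
        if pos + length > len(block):
            raise ValueError("record extends past the end of the block")
        records.append((flags, block[pos:pos + length]))
        pos += length


def build_block(records):
    """Illustrative inverse of deblock: build a block from (flags, record) pairs."""
    body = b"".join(struct.pack(">BBH", f, 0, len(r)) + r for f, r in records)
    body += struct.pack(">BBH", 0, 0, 0)             # end-of-block TTR
    return struct.pack(">BBH4x", 0, 0, TTB_LEN + len(body)) + body


def chunked_reader(data, max_chunk):
    """Illustrative recv callable that delivers at most max_chunk bytes per call."""
    pos = 0

    def recv(n):
        nonlocal pos
        chunk = data[pos:pos + min(n, max_chunk)]
        pos += len(chunk)
        return chunk

    return recv
```

A caller would pass `read_block` a function wrapping its own TCP receive call; `chunked_reader` stands in for that here so that short deliveries can be exercised without a network.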