25-Nov-97 23:13:44-GMT,17151;000000000005
Return-Path:
Received: from CUVMB.CC.COLUMBIA.EDU (cuvmb.cc.columbia.edu [128.59.40.129]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with SMTP id SAA26165 for ; Tue, 25 Nov 1997 18:13:44 -0500 (EST)
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.CC.COLUMBIA.EDU (IBM VM SMTP V2R1) with BSMTP id 8461; Tue, 25 Nov 97 18:14:36 EST
Date: Tue, 25 Nov 1997 18:13 EST
From: "John F. Chandler"
To: Frank da Cruz
Subject: Re: New stuff draft 2
In-reply-to: fdc@watsun.cc.columbia.edu message of Sun, 2 Nov 97 20:11:17 EST
Message-id:

Frank,

Here are some reactions, prefixed with ">>>".

John
---------

SOME MINOR ADDITIONS TO THE KERMIT PROTOCOL

D R A F T  # 2

Sun Nov 2 13:49:46 1997

1. DIRECTORY OPERATIONS

The aim of these changes is to allow the exchange of directory trees or file systems. It is assumed that all file systems are either tree-structured or flat. Hardly any protocol changes are needed, mainly just agreements on data formats. Most of the features are implemented outside the protocol: recursive SEND commands, automatic directory creation during RECEIVE commands, etc.

1.0. Directory Name Format Selection

(This is simplified considerably in Draft 2 after I implemented it in C-K...)

  SET FILE NAMES { CONVERTED, LITERAL }

Now applies to pathnames too. For pathnames, CONVERTED means that the native directory notation is converted to standard format when sending, and the standard format is assumed when receiving. The related command:

  SET { SEND, RECEIVE } PATHNAMES { OFF, ABSOLUTE, RELATIVE }

then applies as usual. PATHNAMES are OFF by default, in which case nothing is different. When SEND PATHNAMES is ABSOLUTE or RELATIVE, the FILE NAMES setting is applied to the pathnames just as it is to the rest of the filename.

When receiving files, a Kermit program should be expected to understand its own native format and the standard one; it cannot be expected to understand a foreign directory notation.
Thus SET FILE NAMES CONVERTED should be used between unlike systems.

Note: There is no reason why there can't be separate SET FILE NAMES commands and settings for each direction.

Note 2: Nothing said so far affects the protocol; that comes in the next section.

1.1. Kermit Protocol Directory Name Representation

UNIX notation shall be used for directories when FILE NAMES are CONVERTED. Forward slash (/) is the directory separator. If a / appears as a literal character in a directory name, it should be written as //. A file or directory specification beginning with / is absolute; otherwise it is relative. This is more or less the same scheme used by Info-ZIP, so it is widely proven in the real world.

Note: I have this working now in VMS as well as UNIX, but so far just in the sender; I hope to do the receiver tomorrow. Example from today's actual logs:

  FILENAMES  SEND PATHNAMES  UNIX Result                 VMS Result
  CONVERTED  OFF             OOFA.TXT                    OOFA.TXT
  CONVERTED  RELATIVE        BLAH/OOFA.TXT               BLAH/OOFA.TXT
  CONVERTED  ABSOLUTE        /W/FDC/TMP/BLAH/OOFA.TXT    /FDC/BLAH/OOFA.TXT
  LITERAL    OFF             oofa.txt                    OOFA.TXT
  LITERAL    RELATIVE        blah/oofa.txt               [.BLAH]OOFA.TXT
  LITERAL    ABSOLUTE        /w/fdc/tmp/blah/oofa.txt    [FDC.BLAH]OOFA.TXT

1.2. Client/Server Directory Operations

  REMOTE MKDIR
    G packet function code "m" (yes, lowercase). Creates the specified directory. Names are as in 1.1 (absolute or relative).

  REMOTE RMDIR
    G packet function code "r". Removes the specified directory. The name can be wild.

  REMOTE RMDIR /RECURSIVE
    G packet function code "s". Removes the specified directory tree and all its contents, like rm -Rf in UNIX. The name can be wild.

1.3. GET /RECURSIVE

New packet types:

  V  for GET /RECURSIVE. Tells the server to send all files that match the given specification in the current or given directory tree. Otherwise just like R, the packet used for GET.

  W  for GET /DELETE /RECURSIVE. Like V, but the server should delete each file after it is sent successfully.

That should do it.

2. 32-BIT CRC

We might as well, why not.
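Since the draft says the 32-bit check will follow Gary Brown's widely used code, here is a minimal Python sketch of that reflected CRC-32 (the X^32+...+X^0 polynomial given in this section, bit-reversed to the constant EDB88320 hex) together with the two printable encodings under discussion. It is a sketch under stated assumptions, not a normative implementation: the register preset of FFFFFFFF hex and final complement (as zlib and ZMODEM use) and the most-significant-digit-first order are assumptions the draft does not settle, and all function names are hypothetical.

```python
# Sketch only. Assumptions not in the draft: register preset to 0xFFFFFFFF
# with a final complement, and digits sent most significant first.
# All names here are hypothetical.

CRC32_POLY_REFLECTED = 0xEDB88320  # X^32+...+X^0 with the X^31 term in the LSB


def _make_crc32_table():
    """Precompute the 256-entry lookup table, one entry per byte value."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ CRC32_POLY_REFLECTED if c & 1 else c >> 1
        table.append(c)
    return table


_CRC32_TABLE = _make_crc32_table()


def crc32(data: bytes, crc: int = 0) -> int:
    """Reflected, table-driven CRC-32. Pass a previous result to continue."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = _CRC32_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def encode_base94(crc: int) -> str:
    """Five base-94 digits; digit 0 is '!', so a blank can never appear."""
    digits = []
    for _ in range(5):  # 94**4 < 2**32 < 94**5, so five digits are needed
        crc, d = divmod(crc, 94)
        digits.append(chr(d + 33))
    return "".join(reversed(digits))


def decode_base94(text: str) -> int:
    """Inverse of encode_base94."""
    value = 0
    for ch in text:
        value = value * 94 + (ord(ch) - 33)
    return value


def encode_sixbit_excess33(crc: int) -> str:
    """Six 6-bit groups (the top one holds only 2 bits), each offset by 33."""
    return "".join(chr(((crc >> shift) & 0x3F) + 33)
                   for shift in (30, 24, 18, 12, 6, 0))
```

Because crc32() accepts the running value, a sender can feed it each buffer of a file as it goes, which is all the per-file check in section 3 would need. The six-byte form uses only shifts, like the existing type 3 check; the base-94 form saves one byte at the cost of a division per digit.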
The code for the CHKT field in the init string is "4". 32-bit CRC must not be implemented in the absence of 16-bit CRC. A special rule applies here: if one Kermit says "4" and the other says "3", fall back to "3" instead of "1".

>>> Can't do that because it violates existing protocol. If the other
>>> Kermit says "3" and doesn't know about "4", it falls back to "1", so
>>> we must do the same.

The generating polynomial is:

  X^32+X^26+X^23+X^22+X^16+X^12+X^11+X^10+X^8+X^7+X^5+X^4+X^2+X^1+X^0

taken "backwards", with the highest-order term in the lowest-order bit. The X^32 term is "implied"; the LSB is the X^31 term, etc. The X^0 term (usually shown as "+1") results in the MSB being 1. In this bit order the polynomial is the 32-bit constant EDB88320 (hex). Code will be based on the well-known and open Gary Brown code that everybody else uses.

Unlike the type 1, 2, and 3 block checks, the 32-bit one should be encoded so that it never contains a blank. We can either use the same encoding as for the 16-bit CRC but excess-33 instead of excess-32 (resulting in 6 bytes), or we can write it more compactly as a base-94 number whose lowest digit is "!". (How many bytes is that?)

>>> five -- could use "5" as the code instead of "4"??

(Joe notes that there might not be much value here, but we have learned that filling blackboards full of math to persuade the masses why we don't have some feature that the others (read "Zmodem") have never works; better to just go along... Anyway, this is just for the protocol definition, not necessarily to be implemented anywhere, and certainly not *required* anywhere.)

3. EX-POST-FACTO PER-FILE CRC CHECKING

MS-DOS Kermit and C-Kermit can accumulate a 16-bit CRC of an entire transaction, and they include a rather cumbersome process for comparing the CRCs afterward, which works only in a client/server setting and is script-based:

  if fail
  remote query kermit crc16
  if not = \v(query) \v(crc16)

Obviously this can be expected to succeed only for binary-mode transfers, so scripts that use this technique will break in text mode. A more general mechanism can be added to the protocol itself as follows:

 a. Add a new S/I packet parameter, after the last one that is defined, whatever that is (don't worry, I'll look it up). This parameter is a single byte that takes the same values as the Block Check parameter, except that only "3" or "4" should be allowed.

 b. Add SET commands to turn the feature ON and OFF. It should be OFF by default, to avoid the extra overhead.

 c. When ON, it should be operative only for binary-mode transfers.

>>> Why can't this be applied to the "canonical" text form? I.e., after
>>> line delimiters have been converted to CRLF (if necessary) and the
>>> text has been translated to the transfer character set, but before
>>> control-quoting and repeat-count compression, etc.

 d. At end of file, the file sender puts the following in the Z-packet data field: the letter C and then the decimal character representation of the file's CRC, computed with the negotiated CRC type.

>>> Didn't you say the MSB will always be on? If so, then the decimal
>>> representation will depend on whether you understand the value to be
>>> signed 32-bit or unsigned 32-bit. Why not just encode the value in
>>> the same way as for packet checksums?

 e. If the CRC from (d) does not agree with the receiver's CRC, the receiver ACKs the Z packet with a data field of N, optionally followed by its own CRC; otherwise it ACKs with either an empty data field or the letter C followed by the CRC (exactly as in the Z packet).
    It is up to the receiver how to dispose of the file when the CRCs don't match.

 f. When the sender receives a CRC mismatch indication, the SEND command must fail. But what does this mean when a file group is being sent? Should it stop and send an error packet, or go on to the next file? This must be a user choice, so there will need to be some SET commands... In any case, if it is a SEND /DELETE (aka MOVE) operation, the source file must not be deleted. Appropriate notations must be made in the transaction log, if any, etc.

The per-file CRC mechanism operates independently of the \v(crc16) variable, which accumulates a CRC over the entire transfer and could obviously become bollixed if a mixture of text and binary files were transferred in the same transaction, as can occur with VMS C-Kermit.

4. The Capabilities Mask

We're out of bits, except for the "continued" bit. But if we use the continuation mechanism, we'll no doubt break every non-Kermit-Project Kermit implementation on earth, and probably also many of the old ones in our own collection. So to add more capability bits, we'll need to leave the "continued" bit clear and add the second capabilities mask at the end of the init string.

5. Info Exchange

The idea is for the two Kermits to exchange information with each other that applies to the transaction as a whole, but is beyond the scope of (too voluminous for) the S/Y or I/Y exchange.

 a. Add a new capability bit for this.
 b. The file sender sets this bit in its S packet.
 c. The file receiver agrees by setting the same bit in its ACK(S).

At this point, if the two Kermits have agreed, the sender may (but need not) send an "L" packet, which contains an unencoded parameter-length-value (PLV) sequence (just like an "A" packet) of information applying to the connection and the entire transfer. Parameters (all are optional):

  F = (Sender only) Number of files, expressed as a decimal string.
  L = (Sender only) Total length, decimal string.
      Obviously iffy for text-mode transfers, but we've always had that problem.
  E = Encoding: Kermit transfer character-set designation for text used in any of these fields that can contain arbitrary text. Default = ASCII. Syntax: exactly as in the A packet.
  H = Hostname (e.g. so the local Kermit can show the remote host's name on the file transfer display).
  D = Current directory, syntax according to SET FILE NAMES.
  O = Organization name. Arbitrary text, encoding specified in E.
  C = Country code (ISO 3166).
  T = Connection type, a number (to allow automatic choices of various things based on whether the connection is known to be reliable, e.g. TCP/IP at *both* ends): 0 = unknown (usually the case when in remote mode); 1 = serial port; 2 = ISDN; 3 = TCP; 4 = UDP; 5 = CTERM; 6 = LAT; etc.
  A = Address, interpreted according to connection type. This can be the IP hostname, IP address, or other address specific to the network type, or a telephone number in +1(212)7654321 format, for display on the other Kermit's screen, logging, callback, or any other purpose. All sorts of uses for this one can be imagined.
  Z = Timezone. There is some standard for this. Can be used to adjust A-packet date/times, which are always in local time. Applies only to terrestrial transfers.

>>> Can't use the current time zone to adjust the date/time of files
>>> previously created -- the daylight/standard switching mechanism is
>>> not universal. Also, the time stamp on a file may or may not be
>>> set in local time. Some installations choose to run on UT! Further,
>>> the time stamp on a file doesn't reflect the time zone that was
>>> in effect when the file was created. If the system manager decides
>>> it was stupid to run on UT, and switches over to local standard time,
>>> there's no "paper trail".

  X = Encryption identifier (this needs spelling out).
  K = Public key for X, when applicable.
  N = (Receiver only) No. Refuses the transaction.
      Optionally, one or more parameter letters are given as data, to indicate the reason for refusal.
  etc., etc.

The order doesn't matter, except that if E is given, it must precede any arbitrary-text fields. We can have up to 94 parameters, one for each 7-bit graphic character. One must be reserved as an escape for when we run out.

NOTE: "L" was our last unused uppercase letter for packet types. Additional packet types will be lowercase letters or other graphic characters. At least one must be reserved as an escape for when we run out.

6. Extended Sequence Numbers and Window Size

A window size of 32 just isn't big enough, e.g. for interplanetary transfers, not to mention the Internet some days. But we can't increase it beyond 32, because the window size is limited to half the sequence-number range. Thus for larger windows we must increase the sequence-number space. But we can't do this in the regular sequence-number field, at least not significantly, because it is restricted to a 64-character code set (in theory maybe 94, but that too would require a change in the protocol, and as long as we're changing it, let's shoot higher).

6.1. Negotiation

 a. Add a new capability bit for this.
 b. The file sender sets this bit in its S packet.
 c. The file receiver agrees by setting the same bit in its ACK(S).
 d. Add another 2-byte field to the init string, XWINDO. This works exactly like long-packet negotiation. If the bit is set, we fetch the actual window size from the two XWINDO bytes, which are in excess-32 base-95 notation, just like the extended packet length. A receiver that doesn't understand this option, of course, fetches the window size from the regular WINDO field.

The maximum extended window size is (95^2 - 1) / 2 = 9024 / 2 = 4512.

6.2. Packet Format

When an extended window size is negotiated, the packet sequence number is given as ` (backquote, ASCII 96) to indicate that the full 2-byte base-95 packet number is included in the extended header.
For long packets, this goes between the length and the header checksum. For short packets, it forms the extended header by itself (plus a checksum).

>>> This applies only to D packets, right?

The maximum extended sequence number is thus 95^2 - 1 = 9024, and the maximum window size is half that, or 4512. A 4512-packet window of 9024-byte packets (the theoretical maximum) would require about 40MB of packet buffers (4512 x 9024 = 40,716,288 bytes). Obviously a smaller actual maximum can be imposed by the implementation.

6.3. Improved Packet Framing

This is changed from yesterday; now it's simply folded in with the new packet format. There is nothing in a basic Kermit packet to indicate where the data ends and the block check begins. But we have the opportunity in extended-sequence packets to use a better format. In these packets, the packet length indicates the beginning of a PLV-format block check: the parameter is the block-check code (1, 2, 3, B, 4), the length is the number of bytes in the block check, and then comes the block check itself. In addition to preventing foulups, this allows the block-check type to be varied dynamically throughout the transaction.

>>> Why would we want to do that???

It also allows a graphic character to be placed after the block check in case it ends with a blank.

>>> In practice, we can do that already.

7. Supervisory Packets

These can be used for "out of band" functions. Supervisory packets must be numbered, just like regular ones, because otherwise there is no way for the receiver to indicate whether or not one was received.

>>> Is this going to play havoc with sliding windows? E.g., changing
>>> the packet size upon request of the receiver would logically
>>> demand that the packets already in transit be "flushed" somehow.
>>> Ouch!

Let's call this a "u" packet. It can be sent only by the file sender, and it can be sent at any time during a transaction if negotiated:

 a. Add a new capability bit for this.
 b. The file sender sets this bit in its S packet.
 c. The file receiver agrees by setting the same bit in its ACK(S).

Contents are, again, the familiar PLV sequences. Some possible parameters:

  M = Message. To be logged or shown in the display.
  W = Change window size
  P = Change packet length
  R = Reset to defaults
  S = Sync
  D = Drain
  B = Buffer credit

(I'm not really sure yet whether any of these make sense, or what they would do, or how they would work, or what else we can do here, so this is mainly just a placeholder.)

The receiver ACKs a "u" packet with the normal indications (Y or N, length, list of tags).

If the file receiver wants to send a supervisory message, it can be placed into the data field of any D-packet ACK: the letter "u" followed by PLV sequences. (We can't put these in *any* ACK because some are already allowed to contain arbitrary string data, e.g. ACK(F), tsk tsk.) The file sender "acknowledges" by sending a "u" packet, which must then be ACK'd by the receiver with an empty ACK.
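Sections 5 and 7 both carry their contents as PLV sequences "just like an A packet". The following is a minimal Python sketch of packing and unpacking such a sequence, assuming the length byte is encoded with Kermit's usual tochar() (value + 32), as for A-packet attributes, which limits a single value to 94 characters. The function names and the example value formats (such as "64" for W) are hypothetical; the draft does not define value formats for these parameters.

```python
# Sketch only. Assumption: the length byte is tochar(len), as in A-packet
# attributes. Function names and example values are hypothetical.

def tochar(n: int) -> str:
    return chr(n + 32)


def unchar(c: str) -> int:
    return ord(c) - 32


def encode_plv(params: list[tuple[str, str]]) -> str:
    """Concatenate tag, tochar(len(value)), value for each parameter."""
    out = []
    for tag, value in params:
        if len(tag) != 1 or not "!" <= tag <= "~":
            raise ValueError(f"tag must be one graphic character: {tag!r}")
        if len(value) > 94:
            raise ValueError("value longer than one length byte can express")
        out.append(tag + tochar(len(value)) + value)
    return "".join(out)


def decode_plv(data: str) -> list[tuple[str, str]]:
    """Split a PLV string back into (tag, value) pairs."""
    params = []
    i = 0
    while i < len(data):
        if i + 2 > len(data):
            raise ValueError("truncated PLV header")
        tag, length = data[i], unchar(data[i + 1])
        value = data[i + 2 : i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated PLV value")
        params.append((tag, value))
        i += 2 + length
    return params


def supervisory_ack_data(params: list[tuple[str, str]]) -> str:
    """Data field of a D-packet ACK carrying a supervisory message."""
    return "u" + encode_plv(params)
```

With these helpers, the receiver's refusal from section 5 could be encode_plv([("N", "XK")]), naming the X and K parameters as the reasons, and a zero-length value (as for R = Reset) is simply a tag followed by a blank length byte.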