1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 TRANSITIONAL//EN" "http://www.w3.org/TR/html4/loose.dtd">
5 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
6 <title>swift: the multiparty transport protocol</title>
7 <link rel="stylesheet" href="style.css" type="text/css">
8 <script src="http://code.jquery.com/jquery-latest.js"></script>
10 $(document).ready(function(){
11 $("div.fold>p, div.fold>ul, div.fold>div").hide("fast");
12 $("div.fold>h2, div.fold>h3, div.fold>h4").attr("bullet","closed");
13 $("div.fold>h4, div.fold>h3, div.fold>h2").click(function () {
14 $("div.fold>p, div.fold>ul, div.fold>div").hide("fast");
15 $("div.fold>h2, div.fold>h3, div.fold>h4").attr("bullet","closed");
16 $(this).attr("bullet","open");
17 $(this).siblings().each(
19 $(this).show("normal");
29 <img id=logo src="apusapus.png"/>
30 <h1><i>swift:</i> the multiparty transport protocol</h1>
32 <div id='motto'>Turn the Net into a single data Cloud!</div>
36 Current Internet protocols are geared for 1:1 client/server
37 communication. We expanded the TCP/IP protocol suit with swarming.
38 Our protocol is designed to be capable of integration
39 into browsers / operating systems and able to serve 95% of current Internet
42 <b>swift</b> is a multiparty transport protocol; its mission is to
43 disseminate content among a swarm of peers. It is a sort of
44 <a href="http://bittorrent.org">BitTorrent</a>
45 at the transport layer. <!--We generalised the ideas of
46 <a href="http://bittorrent.org">BitTorrent</a> to make it
47 suitable for new usage areas such as distributed filesystems and
48 Web page delivery.--> The TCP+Bittorrent stack consists of
49 60<a href='http://en.wikipedia.org/wiki/Source_lines_of_code'>KSLoC</a>
50 + 90KSLoC respectively
51 (by <a href='http://www.dwheeler.com/sloccount/'>SLOCCount</a>).
52 With novel datastructures and refactoring we managed to implement
53 <a href="http://github.com/gritzko/swift/raw/master/doc/swift-protocol.txt">swift</a>
54 in a mere 4KSLoC of cross-platform C++ code.
55 <a href="http://github.com/gritzko/swift">libswift</a> library
56 is licensed under LGPL; it runs on misc Unices, Mac OS X and Windows;
57 it uses UDP with <a href="http://tools.ietf.org/wg/ledbat/">LEDBAT</a> congestion control.
58 Currently maximum throughput is 400Mbps, but we are
59 <a href='http://mughal.tribler.org:8080'>working on that</a>;
60 <!--going to 600 Mbps with further optimizations, and final--> our next target is 1 Gbps.
61 The library is delivered as a part of <a href="http://p2p-next.org">P2P-Next</a>,
62 funded by <a href="http://cordis.europa.eu/fp7/dc/index.cfm">EU FP7</a>.
66 <p>As <a href="http://en.wikipedia.org/wiki/Content-centric_networking">
67 wise people say</a>, the Internet was initially built for
68 remotely connecting scientists to expensive supercomputers (whose computing
69 power was comparable to modern cell phones). Thus, they supported the abstraction
70 of <i>conversation</i>. Currently, the Internet is mostly used for <i>disseminating</i>
71 content to the masses, which mismatch definitely creates some problems.</p>
73 <p>The <i>swift</i> protocol is a content-centric multiparty transport
74 protocol. Basically, it answers one and only one question: <i>"Here is a
75 <a href="http://en.wikipedia.org/wiki/Hash_function">hash</a>!
76 Give me data for it!"</i> A bit oversimplifying, it might be understood as
77 <a href="http://en.wikipedia.org/wiki/BitTorrent">BitTorrent</a>
78 at the <a href="http://en.wikipedia.org/wiki/Transport_Layer">
79 transport layer</a>. Ultimately, it aims at the abstraction of the Internet
80 as a single big data <a href="http://en.wikipedia.org/wiki/Cloud_computing">
81 Cloud</a>. Such entities as storage, servers and connections are abstracted
82 away and are virtually invisible at the API layer. Given a hash,
84 from whatever source available and data integrity is checked
85 cryptographically with <a href="http://en.wikipedia.org/wiki/Hash_tree">
86 Merkle hash trees</a>.</p>
88 <p>An old Unix adage says: "free memory is wasted memory". Once a
89 computer is powered on, there is no benefit in keeping some memory
90 unoccupied. We may extend this principle a bit further:
92 <li>free bandwidth is wasted bandwidth;
93 <li>free storage is wasted storage.
95 Unless your power budget is really tight, there is no sense in conserving
96 either. Thus, instead of putting emphasis on reciprocity and incentives
97 we focus on ligther-footprint code,
98 <a href="http://tools.ietf.org/wg/ledbat/">non-intrusive</a>
99 <a href="http://en.wikipedia.org/wiki/TCP_congestion_avoidance_algorithm">
100 congestion control</a> and automatic disk space management.
103 <p>Currently, most parts of the protocol/library are implemented, pass
104 basic testing and successfully transfer data on real networks.
105 After more scrutinized testing, the protocol and the library are expected
106 to be real-life-ready in March'10.
110 <div id='design'> <h2>Design of the protocol</h2>
112 <p>Most features of the protocol are defined by its function of
113 content-centric multiparty transport protocol. It entirely drops the TCP's
114 abstraction of sequential reliable data stream delivery as it is redundant in
115 our case. For example, out-of-order data could still be saved and the same
116 piece of data might be always received from another peer.
117 Being implemented over UDP, the protocol does its best to make
118 every datagram self-contained.
119 In general, cutting of unneeded functions and aggressive layer collapsing
120 greatly simplified the protocol, compared to e.g. the BitTorrent+TCP stack.</p>
122 <div class='fold'> <h3>Atomic datagrams, not data stream</h3>
123 <p>To achieve per-datagram flexibility of data flow and also to adapt to
124 the unreliable medium (UDP, and, ultimately, IP), the protocol was built
125 around the abstraction of atomic datagrams.
126 Ideally, once received, a datagram is either
127 immediately discarded or permanently accepted, ready to be forwarded to
129 For the sake of flexibility, most of the protocol's messages
130 are optional. It also has no "standard" header. Instead, each datagram
131 is a concatenation of zero or more messages. No message ever spans two
132 datagrams. Except for the data pieces themselves, no message is
133 acknowledged, as thus, not guaranteed to be delivered.</p>
136 <div class='fold'> <h3>Scale-independent unit system</h3>
137 <p>To avoid a multilayered request/acknowledgement system, when every
138 next layer does basically the same, but for bigger chunks of data, as
139 it is the case with BitTorrent+TCP packet-block-piece-file-torrent
140 stacking, <i>swift</i> employs a scale-independent acknowledgement/
141 request system, where data is measured by aligned power-of-2 intervals
142 (so called bins). All acknowledgements and requests are done in terms
146 <div class='fold'> <h3>Datagram-level integrity checks</h3>
147 <p><i>swift</i> builds Merkle hash trees down to every single packet
148 (1 kilobyte of data). Once data is transmitted, all uncle hashes
149 necessary for verification are prepended to the same datagram.
150 As the receiver constantly remembers old hashes, the average number
151 of "new" hashes, which have to be transmitted is small,
152 normally around 1 per packet of data.</p>
155 <div class='fold'> <h3>NAT traversal by design</h3>
156 <p>The only method of peer discovery in <i>swift</i> is
157 <a href="http://en.wikipedia.org/wiki/Peer_exchange">PEX</a>:
158 a third peer initiates a connection between two of its contacted peers.
159 The protocol's handshake is engineered to perform simple NAT hole
160 punching transparently, in case that is needed.
164 <div class='fold'> <h3>Subsetting of the protocol</h3>
165 <p>Different kinds of peers might implement different subsets of messages;
166 a "tracker", for example, uses the same protocol as every peer, except
167 it only accepts the HANDSHAKE message and the HASH message (to let peers
168 explain what content they are interested in), while returning only
169 HANDSHAKE and PEX_ADD messages (to return the list of peers).
170 Different subsets of accepted/emitted messages may correspond to push/pull
171 peers, plain trackers, hash storing trackers, live streaming peers etc.
175 <div class='fold'> <h3>Push AND pull</h3>
176 <p>The protocol allows both for PUSH (sender decides what to send) and
177 PULL (receiver explicitly requests the data). PUSH is normally used as
178 a fallback if PULL fails; also, the sender may ignore requests and send
179 any data it finds convenient to send. Merkle hash trees allow this
180 flexibility without causing security implications.</p>
183 <div class='fold'> <h3>No transmission metadata</h3>
188 <div> <h2>Specifications and documentation</h2>
191 <li><a href="http://github.com/gritzko/swift/raw/master/doc/swift-protocol.txt">
192 internet draft style technical description</a>
193 <li><a href="">article</a>
198 <div> <h2>The code</h2>
200 <li><a href="http://97.107.136.211/files/p2tp.tar.gz">fairly recent tarball</a>
201 <li><a href="http://github.com/gritzko/swift"><i>swift</i> at github</a>
205 <div> <h2>Frequently asked questions</h2>
207 <div class="fold"> <h3>Well, why "swift"?</h3>
208 <p>That name served well to many other protocols; we hope it will serve well to ours.
209 You may count it as a meta-joke. The working name for the protocol was "VicTorrent".
210 We also insist on lowercase italic <i>"swift"</i> to keep the name formally unique
211 (for some definition of unique).</p>
214 <div> <h3>How is it different from... </h3>
216 <div class='fold'><h4>...TCP?</h4>
217 <p>TCP emulates reliable in-order delivery ("data pipe") over
218 chaotic unreliable multi-hop networks. TCP has no idea what data it is dealing
219 with, as the data is passed from the userspace.
220 In our case, the data is fixed in advance and many peers participate in
221 distributing the same data. Orderness of delivery is of little importance and
222 unreliability is naturally compensated by redundance.
223 Thus, many functions of TCP turn out to be redundant.
224 The only function of TCP that is also critical for <i>swift</i> is the congestion
225 control, but... we need our own custom congestion control! Thus, we did not use
227 <p>That led both to hurdles and to some savings. As one example, every TCP
228 connection needs to maintain buffers for the data that has left the sender's userspace, but
229 did not yet arrive at the receiver's userspace. As we know that we are dealing
230 with the same fixed data, we don't need to maintain per-connection buffers.
234 <div class='fold'><h4>...UDP?</h4>
235 <p>UDP, which is the thinniest wrapper around <a href="http://en.wikipedia.org/wiki/IPv4">
236 IP</a>, is our choiсe of underlying protocol. From the standpoint of ideology, a
237 transport protocol should be implemented over IP, but unfortunately that causes
238 some chicken-and-egg problems, like a need to get into the kernel to get deployments,
239 and a need to get deployments to be accepted into the kernel.
240 UDP is also quite nice in regard to NAT penetration.</p>
243 <div class='fold'><h4>...BitTorrent?</h4>
244 <p>BitTorrent is an application-level protocol and quite a heavy one. We focused
245 on fitting our protocol into the restrictions of the transport layer, assuming
246 that the protocol might eventually be included into operating system kernels.
247 For example, we stripped the protocol of any transmission's metadata (the
248 .torrent file); leaving a file's root hash as the only parameter.</p>
251 <div class='fold'><h4>...µTorrent's µTP?</h4>
252 <p>Historically, BitTorrent gained lots of adaptations to its underlying
253 transport. First and foremost, TCP is unable to prioritize traffic, so BitTorrent
254 needed to coerce users somehow to tolerate inconviniences of seeding. That caused
255 tit4tat and, to significant degree, rarest-first. Another example is the 4
256 upload slots limitation. (Well, apparently, some architectural decisions in
257 BitTorrent were dictated by oddities Windows 95, but... never mind.)
258 Eventually, BitTorrent developers came to the conclusion that not annoying
259 the user in the first place is probably a better stimulus. So they came up with
260 the <a href='http://www.ietf.org/dyn/wg/charter/ledbat-charter.html'>LEDBAT</a>
261 congestion control algorithm (Low Extra Delay Background Transport).
262 LEDBAT allows a peer to seed without interfering with the regular traffic
263 (in plain words, without slowing down the browser).
264 To integrate the novel congestion control algorithm into BitTorrent incrementally,
265 BitTorrent Inc had to develop TCP-alike transport named
266 <a href="http://en.wikipedia.org/wiki/Micro_Transport_Protocol">µTP</a>.
267 The <i>swift</i> project (then named VicTorrent) was started when we tried to understand
268 what happens if we'll strip BitTorrent of any Win95-specific, TCP-specific
269 or Python-specific workarounds. As it turns out, not much was left.
273 <div class='fold'><h4>...Van Jacobson's CCN?</h4>
274 <p><a href="http://www.parc.com/about/people/88/van-jacobson.html">Van Jacobson's</a>
275 team in PARC is doing exploratory research on
276 <a href="http://en.wikipedia.org/wiki/Content-centric_networking">content-centric
277 networking</a>. While BitTorrent works at layer 5 (application), we go to layer 4 (transport),
278 PARC people are bold enough to go to layer 3 and to propose a complete replacement
279 for the entire TCP/IP world. That is certainly a compelling vision, but we focus
280 on near future (<10 years); while <a href="http://www.ccnx.org/">CCNx</a> is a
281 much more ambitious rework.</p>
284 <div class='fold'><h4>...DCCP?</h4>
285 <p>This question arises quite frequently as DCCP is a congestion-controlled datagram
286 transport. The option of implementing <i>swift</i> over
287 <a href="http://en.wikipedia.org/wiki/DCCP">DCCP</a> was considered,
288 but the inconvinience of working with an esoteric transport was not compensated by the
289 added value of DCCP, which is limited to one mode of congestion control being readily
291 Architectural restrictions imposed by DCCP were also found to be a major inconvenience.
292 Last but not least, currently only Linux supports DCCP at the kernel level.</p>
295 <div class='fold'><h4>...SCTP?</h4>
296 <p><a href="http://en.wikipedia.org/wiki/SCTP">SCTP</a> is a protocol fixing some
297 shortcomings of TCP mostly in the context of telephony. As it was the case with
298 DCCP, contributions of SCTP were of little interest to us, while things we really
299 needed were missing from SCTP. Still, we must admit that we employ quite a similar
300 message-oriented model (as opposed to the TCP's stream-oriented). </p>
305 <div class='fold'> <h2>Who we are</h2>
308 <li> <a href="http://p2p-next.org/">P2P-Next Project</a>, EU 7th Framework Program
309 <li> <a href="http://ewi.tudelft.nl">Delft University of Technology</a> EWI PDS
310 <li> <a href="http://www.st.ewi.tudelft.nl/~pouwelse/">Dr. Johan Pouwelse</a>
311 <li> <a href="http://www.tribler.org/trac/wiki/VictorGrishchenko">
312 Dr. Victor Grishchenko</a>
317 <div class=fold> <h2>Contacts&feedback</h2>
318 <p><a href="mailto:victor.grishchenko@gmail.com">mail us</a></p>
319 <p>subscribe to a mailing list</p>