|
This chapter has been added to allow us to share our perspective on certain technical choices. Certain issues which are more a matter of opinion than detail, are discussed here.
To understand why NNTP is often an inappropriate choice for newsfeeds, we need to understand TCP's sliding window protocol and the nature of NNTP. NNTP is an apalling waste of bandwidth for most bulk article transfer situations, because of the following simple reasons:
No compression: articles are transferred in plain text.
No article transmission restart: if a connection breaks halfway through an article, the next round will have to start with the beginning of the article.
Ping-pong protocol: NNTP is unsuitable for bulk streaming data transfer because the TCP sliding window feature is unusable with NNTP.
What is a ping-pong protocol? TCP uses a sliding window mechanism to pump out data in one direction very rapidly, and can achieve near wire speeds under most circumstances. However, this only works if the application layer protocol can aggregate a large amount of data and pump it out without having to stop every so often, waiting for an ack or a response from the other end's application layer. This is precisely why sending one file of 100 Mbytes by FTP takes so much less clock time than 10,000 files of 10 Kbytes each, all other parameters remaining unchanged. The trick is to keep the sliding window sliding smoothly over the outgoing data, blasting packets out as fast as the wire will carry it, without ever allowing the window to empty out while you wait for an ack. Protocols which require short bursts of data from either end constantly, e.g. in the case of remote procedure calls, are called ``ping pong protocols'' because they remind you of a table-tennis ball.
With NNTP, this is precisely the problem. The average size of Usenet news messages, including header and body, is 3 Kbytes. When thousands of such articles are sent out by NNTP, the sending server has to send the message ID of the first article, then wait for the receiving server to respond with a ``yes'' or ``no.'' Once the sending server gets the ``yes'', it sends out that article, and waits for an ``ok'' from the receiving server. Then it sends out the message ID of the second article, and waits for another ``yes'' or ``no.'' And so on. The TCP sliding window never gets to do its job.
This sub-optimal use of TCP's data pumping ability, coupled with the absence of compression, make for a protocol which is great for synchronous connectivity, e.g. for news reading or real-time updates, but very poor for batched transfer of data which can be delayed and pumped out. All these are precisely reversed in the case of UUCP over TCP.
To decide which protocol, UUCP over TCP or NNTP, is appropriate for your server, you must address two questions:
How much time can your server afford to wait from the time your upstream server receives an article to the time it passes it on to you?
Are you receiving the same set of hierarchies from multiple next-door neighbour servers, i.e. is your newsfeed flow pattern a mesh instead of a tree?
If your answers to the two questions above are ``messages cannot wait'' and ``we operate in a mesh'', then NNTP is the correct protocol for your server to receive its primary feed(s).
In most cases, carrier-class servers operated by major service providers do not want to accept even a minute's delay from the time they receive an article to the time they retransmit it out. They also operate in a mesh with other servers operated by their own organisations (e.g. for redundancy) or others. They usually sit very close to the Internet backbone, i.e. with Tier 1 ISPs, and have extremely fast Internet links, usually more than 10 Mbits/sec. The amount of data that flows out of such servers in outgoing feeds is more than the amount that comes in, because each incoming article is retained, not for local consumption, but for retransmission to others lower down in the flow. And these servers boast of a retransmission latency of less than 30 seconds, i.e. I will retransmit an article to you within 30 seconds of my having received it.
However, if your server is used by a company for making Usenet news available for its employees, or by an institute to make the service available for its students and teachers, then you are not operating your server in a mesh pattern, nor do you mind it if messages take a few hours to reach you from your upstream neighbour.
In that case, you have enormous bandwidth to conserve by moving to UUCP. Even if, in this Internet-dominated era, you have no one to supply you with a newsfeed using dialup point-to-point links, you can pick up a compressed batched newsfeed using UUCP over TCP, over the Internet.
In this context, we want to mention Taylor UUCP, an excellent UUCP implementation available under GNU GPL. We use this UUCP implementation in preference to the bundled UUCP systems offered by commercial Unix vendors even for dialup connections, because it is far more stable, high performance, and always supports file transfer restart. Over TCP/IP, Taylor is the only one we have tried, and we have no wish to try any others.
Apart from its robustness, Taylor UUCP has one invaluable feature critical to large Usenet batch transfers: file transfer restart. If it is transferring a 10 MB batch, and the connection breaks after 8 MB, it will restart precisely where it left off last time. Therefore, no bytes of bandwidth are wasted, and queues never get stuck forever.
Over NNTP, since there is no batching, transfers happen one article at a time. Considering the (relatively) small size of an article compared to multi-megabyte UUCP batches, one would expect that an article would never pose a major problem while being transported; if it can't be pushed across in one attempt, it'll surely be copied the next time. However, we have experienced entire NNTP feeds getting stuck for days on end because of one article, with logs showing the same article breaking the connection over and over again while being transferred [1]. Some rare articles can be more than a megabyte in size, particularly in comp.binaries. In each such incident, we have had to manually edit the queue file on the transmitting server and remove the offending article from the head of the queue. Taylor UUCP, on the other hand, has never given us a single hiccup with blocked queues.
We feel that the overwhelming majority of servers offering the Usenet news service are at the leaf nodes of the Usenet news flow, not at the heart. These servers are usually connected in a tree, with each server having one upstream ``parent node'', and multiple downstream ``child nodes.'' These servers receive their bulk incoming feed from their upstream server, and their users can tolerate a delay of a few hours for articles to move in and out. If your server is in this class, we feel you should consider using UUCP over TCP and transfer compressed batches. This will minimise bandwidth usage, and if you operate using dialup Internet connections, it will directly reduce your expenses.
A word about the link between mesh-patterned newsfeed flow and the need to use NNTP. If your server is receiving primary --- as against trickle --- feeds from multiple next-door neighbours, then you have to use NNTP to receive these feeds. The reason lies in the way UUCP batches are accepted. UUCP batches are received in their entirety into your server, and then they are uncompressed and processed. When the sending server is giving you the batch, it is not getting a chance to go through the batch article by article and ask your server whether you have or don't have each article. This way, if multiple servers give you large feeds for the same hierarchies, then you will be bound to receive multiple copies of each article if you go the UUCP way. All the gains of compressed batches will then be neutralised. NNTP's IHAVE and SENDME dialogue in effect permits precisely this double-check for each article, and thus you don't receive even a single article twice.
For Usenet servers which connect to the Internet periodically using dialup connections to fetch news, the UUCP option is especially important. Their primary incoming newsfeed cannot be pushed into them using queued NNTP feeds for reasons described in the above paragraph These hapless servers are usually forced to pull out their articles using a pull NNTP feed, which is often very slow. This may lead to long connect times, repeat attempts after every line break, and high Internet connection charges.
On the other hand, we have been using UUCP over TCP and gzip'd batches for more than five years now in a variety of sites. Even today, a full feed of all eight standard hierarchies, plus the full microsoft, gnu and netscape hierarchies, minus alt and comp.binaries, can comfortably be handled in just a few hours of connect time every night, dialing up to the Internet at 33.6 or 56 Kbits/sec. We believe that the proverbial `full feed' with all hierarchies including alt can be handled comfortably with a 24-hour link at 56 Kbits/sec, provided you forget about NNTP feeds. We usually get compression ratios of 4:1 using gzip -9 on our news batches, incidentally.
INN and CNews are the two most popular free software implementations of Usenet news. Of these two, we prefer CNews, primarily because we have been using it across a very large range of Unixen for more than one decade, starting from its earliest release --- the so-called ``Shellscript release'' --- and we have yet to see a need to change.[2]
We have seen INN, and we are not comfortable with a software implementation which puts in so much of functionality inside one executable. This reminds us of Windows NT, Netscape Communicator, and other complex and monolithic systems, which make us uncomfortable with their opaqueness. We feel that CNews' architecture, which comprises many small programs, intuitively fits into the Unix approach of building large and complex systems, where each piece can be understood, debugged, and if needed, replaced, individually.
Secondly, we seem to see the move towards INN accompanied by a move towards NNTP as a primary newsfeed mechanism. This is no fault of INN; we suspect it is a sort of cultural difference between INN users and CNews users. We find the issue of UUCP versus NNTP for batched newsfeeds a far more serious issue than the choice of CNews versus INN. We simply cannot agree with the idea that NNTP is an appropriate protocol for bulk Usenet feeds for most sites. Unfortunately, we seem to find that most sites which are more comfortable using INN seem to also prefer NNTP over UUCP, for reasons not clear to us.
Our comments should not be taken as expressing any reservation about INN's quality or robustness. Its popularity is testimony to its quality; it most certainly ``gets the job done'' as well as anything else. In addition, there are a large number of commercial Usenet news server implementations which have started with the INN code; we do not know of any which have started with the CNews code. The Netwinsite DNews system and the Cyclone Typhoon, we suspect, both are INN-spired.
We will recommend CNews and NNTPd over INN, because we are more comfortable with the CNews architecture for reasons given above, and we do not run carrier-class sites. We will continue to support, maintain and extend this software base, at least for Linux. And we see no reason for the overwhelming majority of Usenet sites to be forced to use anything else. Your viewpoints welcome.
Had we been setting up and managing carrier-class sites with their near-real-time throughput requirements, we would probably not have chosen CNews. And for those situations, our opinion of NNTP versus compressed UUCP has been discussed in Section 12.1>
Suck and Leafnode have their place in the range of options, where they appear to be attractive for novices who are intimidated by the ``full blown'' appearance of CNews+NNTPd or INN. However, we run CNews + NNTPd even on Linux laptops. We suspect INN can be used this way too. We do not find these ``full blown'' implementations any more resource hungry than their simpler cousins. Therefore, other than administration and configuration familiarity, we don't see any other reason why even a solitary end-user will choose Leafnode or Suck over CNews+NNTPd. As always, contrary opinions invited.
[1] | This lack of a restart facility is something NNTP shares with its older cousin, SMTP, and we have often seen email messages getting stuck in a similar fashion over flaky data links. In many such networks which we manage for our clients, we have moved the inter-server mail transfer to Taylor UUCP, using UUCP over TCP. |
[2] | One of us did his first installation with with BNews, actually, at the IIT Mumbai. Then we rapidly moved from there to CNews Shellscript Release, then CNews Performance Release, CNews Cleanup Release, and our current release has fixed some bugs in the latest Cleanup Release. |
Hosting by: Hurra Communications Ltd.
Generated: 2007-01-26 17:58:25