HARLEY HAHN'S USENET CENTER
File Sharing Tutorial
Usenet was originally designed to enable users to share small text messages. Although Usenet technology has improved tremendously over the years, the basic principles have not changed: we are still limited to sharing small text messages.
This is why, when we want to share a binary file (or a collection of binary files), we must first encode the data as text. We then break the text into individual segments which become, in effect, small text messages. All of these segments are then uploaded, separately, to a Usenet server.
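To make the idea concrete, here is a minimal Python sketch that turns binary data into text and then cuts the text into fixed-size pieces. It uses Base64 from Python's standard library purely as a stand-in. Real Usenet posting programs use yEnc, which is more efficient. The 64-character segment size is a hypothetical value chosen to keep the example small.

```python
import base64


def encode_and_split(data: bytes, segment_size: int) -> list[str]:
    """Encode binary data as text, then cut the text into segments.

    Base64 stands in for yEnc here; segment_size is a hypothetical limit.
    """
    text = base64.b64encode(data).decode("ascii")
    return [text[i:i + segment_size] for i in range(0, len(text), segment_size)]


def join_and_decode(segments: list[str]) -> bytes:
    """Reverse the process: join the segments in order and decode the text."""
    return base64.b64decode("".join(segments))


if __name__ == "__main__":
    original = bytes(range(256))  # 256 bytes of sample binary data
    segments = encode_and_split(original, 64)
    print(len(segments), "segments")
    assert join_and_decode(segments) == original
```

The receiving side only gets the original bytes back if every segment arrives and the segments are joined in the right order. That is exactly the bookkeeping problem the rest of this section is about.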
As we have discussed, whenever you share a large binary file, you will end up with hundreds or even thousands of segments. For instance, the AVI file we shared in the previous section of the tutorial required us to create and post (upload) no fewer than 1,895 segments.
At the end of the section in which we discussed yEnc files and segments, I posed two important questions:
We are now ready to answer these questions.
File Sharing in the 1980s and 1990s
In the 1980s and 1990s, it was common for technically adept people to use Usenet to share software. Indeed, much of the early software that built the Internet was actually distributed via Usenet. However, keeping track of a large number of segments was a big problem.
In those days, file sharing required a great deal of technical skill. For instance, to obtain a copy of a large binary file, you first had to download all the segments manually, using a slow computer with an even slower dial-up Internet connection, and without any help from the Web (which was still new at the time).
As the segments were downloading, you had to keep track of them to make sure that none were missing. Once all the segments were present and accounted for, you would use specialized tools to put the pieces together and then convert the data from text back into binary. These tools were complicated, text-based programs that required you to master a variety of cryptic command-line options.
By the end of the 1990s, computers and Internet connections had become much faster, the Web had expanded enormously, and easy-to-use GUI-based (graphical user interface) tools were the norm. However, there was still no easy way for Usenet file sharers to keep track of all the segments needed to upload and download binary files.
Introducing the NZB File
The situation changed significantly in 2001, when a group of British-based Usenet programmers established a Web site, newzbin.com, devoted to Usenet file sharing. As part of their effort, they saw a need for a simple, compact way to keep track of the large number of segments necessary for file sharing. To do so, they invented a new type of file, which they called an NZB file.
An NZB FILE (usually referred to more simply as an NZB) is a small text file that contains a master list of all the segments needed to recreate one particular binary file. In other words, if you download all the segments listed in an NZB, you will have everything you need to recreate the binary file described by that NZB. Here is a quick example to show you how it works.
In previous sections of the tutorial, we discussed sharing a 640-megabyte video file named steal-this-film-2.avi. To do so, we created 56 different files, all of which needed to be shared. To post these 56 files, we used a posting program that created 1,895 segments, each of which was then uploaded as a separate text message.
Among the 56 files were the 43 RAR (data) files and 10 PAR2 (error correction) files needed to re-create the original AVI file. In addition, there was an SFV (checksum) file and an NFO (information) file. Finally, there was an NZB file, named steal-this-film-2.avi, that was created by the posting program itself.
This particular NZB contains a list of all 1,895 segments, along with the name of the newsgroup to which they were posted. If someone wants a copy of our file, all he has to do is download the NZB and hand it off to his newsreader. The newsreader will then read the contents of the NZB and use the information to find the segments and download them. Once all 1,895 segments are downloaded, the newsreader will decode them to recover the 56 files. Finally, the newsreader will process the RAR and PAR2 files to re-create a copy of the original AVI file.
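For readers who like to see the mechanics, here is a minimal Python sketch of the first step a newsreader performs: reading an NZB and turning it into an ordered list of segments to fetch. It uses the standard xml.etree.ElementTree parser. The tiny NZB, its file name, and its Message-IDs are hypothetical, and the sketch does not download, decode, or repair anything.

```python
import xml.etree.ElementTree as ET

# The namespace declared on the <nzb> tag; element names must be qualified with it.
NS = {"nzb": "http://www.newzbin.com/DTD/2003/nzb"}

# A tiny, hypothetical NZB describing one file split into two segments.
SAMPLE_NZB = """<nzb xmlns="http://www.newzbin.com/DTD/2003/nzb">
  <file poster="poster@example.com (Example Poster)" date="1313789820"
        subject="&quot;example.bin&quot; yEnc (1/2)">
    <groups>
      <group>alt.binaries.test</group>
    </groups>
    <segments>
      <segment bytes="500000" number="2">part2.example@news.example.com</segment>
      <segment bytes="500000" number="1">part1.example@news.example.com</segment>
    </segments>
  </file>
</nzb>"""


def download_plan(nzb_text: str) -> list[tuple[str, int, str]]:
    """Return (group, segment number, Message-ID) for every segment, in order."""
    root = ET.fromstring(nzb_text)
    plan = []
    for file_el in root.findall("nzb:file", NS):
        # Use the first listed group; a file may have been posted to several.
        group = file_el.find("nzb:groups/nzb:group", NS).text.strip()
        segments = file_el.findall("nzb:segments/nzb:segment", NS)
        # Sort by number so the pieces come out in order however they are listed.
        for seg in sorted(segments, key=lambda s: int(s.get("number"))):
            plan.append((group, int(seg.get("number")), seg.text.strip()))
    return plan


if __name__ == "__main__":
    for entry in download_plan(SAMPLE_NZB):
        print(entry)
```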
Looking Inside an NZB File
In case you are curious, I'd like to take a moment to explore a sample NZB together. The NZB from our example, however, is huge, because there are so many segments. (In fact, that particular NZB contains 4,303 lines.) So, instead, we'll look at a much smaller sample NZB. After we discuss it, I'll show you the full NZB from our example.
At this point, you might be wondering: do you really need to know what's inside an NZB file? The answer is, you don't. In fact, most Usenet users have never seen the inside of an NZB. Because NZB files are created by software and processed by software, you can simply let your programs do the work for you and ignore the details.
So, if you don't really care about technical details, please feel free to skip the rest of the NZB discussion and jump directly to the next topic in the tutorial.
Understanding a Sample NZB File
Still with me? Good. Below is an example of a small NZB file. This NZB describes a two-file upload that I posted to Usenet. The two files are test.txt and test.par2. Take a quick look at the example, after which we will take it apart and discuss the pieces.
Note: The line numbers to the left of each line are not part of the file: I put them there to help with our discussion.
As you can see, NZB files contain highly technical information. This information is written using XML, a general-purpose system used to define and describe various types of information. (The name XML stands for "Extensible Markup Language".) In our case, XML is used to describe the information needed by a newsreader to retrieve a set of files that has been posted to Usenet. Let's go through the sample file, line by line, and I'll show you how it works.
The first two lines of an XML file contain standard information used by the PARSER, the program that reads and interprets the file. All Usenet programs that process NZB files must use an XML parser.
1 <?xml version="1.0" encoding="utf-8" ?>
The first line specifies the version of XML and the encoding (that is, the character set) being used.
The second line of an XML file contains a DOCTYPE DECLARATION. This line gives the address of a document (a DTD, or Document Type Definition) that describes the XML tags (see below) that will be used within this particular document.
Within XML, information is organized using TAGS, boundary elements that delimit the various parts of the content. The simplest tags consist of a name enclosed in angle brackets, with a slash before the name in the closing tag, for example:
<groups> ... </groups>
Some tags also contain additional information, called ATTRIBUTES, after the name and within the angle brackets, for example:
In an NZB file, the tags <nzb> and </nzb> mark the beginning and the end of the Usenet-related information. In our example, you can see this on lines 3 and 20:
3 <nzb xmlns="http://www.newzbin.com/DTD/2003/nzb">
(You will notice extra information on line 3. This defines a "namespace", an XML mechanism that gives the tag names a unique identity, so they cannot be confused with identically named tags from other XML vocabularies. You can ignore it.)
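You can safely ignore the namespace when you read an NZB yourself, but a program that parses the file has to take it into account. Here is a short demonstration using Python's standard xml.etree.ElementTree module, with a minimal hypothetical document that carries the same namespace declaration as line 3.

```python
import xml.etree.ElementTree as ET

NZB_NS = "http://www.newzbin.com/DTD/2003/nzb"

# A minimal document with the same namespace declaration as line 3 of the sample.
doc = ET.fromstring(f'<nzb xmlns="{NZB_NS}"><file/></nzb>')

# The parser folds the namespace into every tag name it reports.
print(doc.tag)                                # {http://www.newzbin.com/DTD/2003/nzb}nzb
# Searching for the bare name finds nothing...
print(doc.find("file"))                       # None
# ...so lookups must name the namespace, here through a prefix mapping.
print(doc.find("n:file", {"n": NZB_NS}).tag)  # {http://www.newzbin.com/DTD/2003/nzb}file
```

The prefix "n" is an arbitrary local choice; only the namespace address has to match what the NZB declares.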
The bulk of the information within an NZB file describes the various files that are to be downloaded. The information for a specific file is delimited by <file> and </file> tags. In our example, you can see two sets of these tags: on lines 4 and 11, and on lines 12 and 19. Here is the first set of tags.
4 <file poster="email@example.com (Fester Bestertester)" date="1313789820" subject="(Test) [from Fester Bestertester] - &quot;test.txt&quot; yEnc (1/1)">
The first part of line 4 contains a value for poster. This value is copied from the From: line in the header of the Usenet article. In particular, you see the email address and name of the person who posted the file. In this case, for privacy, you see a fake address and fake name:
The second part of line 4 contains a value for date. This value shows the time and date the article was posted.
The reason the time and date look so odd is that they are encoded as UNIXTIME, a standard developed for the Unix operating system back in the early 1970s. The value you see is simply the number of seconds since midnight UTC on January 1, 1970. If you convert this particular value to regular time, you will see that it is equivalent to August 19, 2011, 9:37 PM UTC, which is 4:37 PM in a time zone five hours behind UTC (such as US Central Daylight Time).
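If you want to check a Unix time value yourself, Python's standard datetime module can do the conversion. This sketch converts the date from line 4. The helper name from_unixtime is just a label for this example, and the UTC-5 offset is only one way to display the result in local time.

```python
from datetime import datetime, timedelta, timezone


def from_unixtime(seconds: int) -> datetime:
    """Convert Unix time (seconds since 1970-01-01 00:00:00 UTC) to a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


posted = from_unixtime(1313789820)
print(posted.isoformat())  # 2011-08-19T21:37:00+00:00

# Shown in a zone five hours behind UTC (e.g. US Central Daylight Time):
local = posted.astimezone(timezone(timedelta(hours=-5)))
print(local.strftime("%Y-%m-%d %H:%M"))  # 2011-08-19 16:37
```

Passing tz=timezone.utc matters: without it, fromtimestamp converts to whatever time zone the computer running the code happens to be set to.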
Finally, the last part of line 4 shows a modified version of the Subject: line from the Usenet article, as generated automatically by the posting program. In this case, we can see that the file is test.txt and that it has been encoded using yEnc.
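Here is a rough Python sketch of how a program might pull these three values out of the <file> tag. Because the subject attribute is itself enclosed in double quotes, the quotes around test.txt must be written as &quot; in the actual file, and the parser turns them back into ordinary quotes. The regular expression is an assumption based only on this one subject line. Posting programs do not all format subjects the same way.

```python
import re
import xml.etree.ElementTree as ET

# Line 4 of the sample, closed as an empty element so it parses on its own.
FILE_TAG = ('<file poster="email@example.com (Fester Bestertester)" date="1313789820" '
            'subject="(Test) [from Fester Bestertester] - &quot;test.txt&quot; yEnc (1/1)"/>')

# Assumed subject layout, based only on this example: "name" yEnc (part/total).
SUBJECT_RE = re.compile(r'"(?P<name>[^"]+)" yEnc \((?P<part>\d+)/(?P<total>\d+)\)')


def describe_file(tag_text: str) -> dict:
    """Pull the poster, date, and (if the subject matches) file name out of a <file> tag."""
    element = ET.fromstring(tag_text)
    info = {"poster": element.get("poster"), "date": int(element.get("date"))}
    match = SUBJECT_RE.search(element.get("subject", ""))
    if match:
        info["name"] = match.group("name")
        info["part"] = int(match.group("part"))
        info["total"] = int(match.group("total"))
    return info


print(describe_file(FILE_TAG))
```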
Within the <file> and </file> tags are two other pairs of tags:
Between the <groups> and </groups> tags, you will find a list of the newsgroups to which the file was posted. In our example, you can see this information on lines 5-7 and 13-15:
As you can see, this file was posted to alt.binaries.test. (As a general rule, it is a good idea to post files to only one group.)
Between the <segments> and </segments> tags, you will find information about all of the segments that were posted for this particular file, each segment having a line of its own. In our example, you can see this information on lines 8-10 and 16-18:
Because this is a small file, it fits into a single segment, which is why there is only one <segment> line. Large files, which require many segments, will have an NZB file containing many <segment> lines. For instance, the example we have been using (sharing a large AVI video file) requires posting 1,895 segments. This means the NZB file for this AVI file will contain 1,895 different <segment> lines. In fact, you will see this NZB file in the next section.
Each <segment> line has three pieces of information. First, you see the number of bytes (characters) in the segment. In our example above, the segment is 99 bytes long. Next, you see the segment number. In our case, there is only one segment, so it is #1. If there were, say, 100 segments, they would be numbered 1 through 100.
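Because the segments are numbered, a program can tell at a glance whether any are missing, which is the job people once had to do by hand. Here is a minimal Python sketch. The <segments> block and its Message-IDs are hypothetical, and the namespace is left out to keep the snippet short. The expected number of segments is passed in; a program might take it from the (1/N) counter in the subject, but that is an assumption about the subject format.

```python
import xml.etree.ElementTree as ET

# A hypothetical <segments> block for a file posted in three segments,
# one of which (number 2) is absent from the listing.
SEGMENTS = """<segments>
  <segment bytes="99" number="1">seg1.example@news.example.com</segment>
  <segment bytes="99" number="3">seg3.example@news.example.com</segment>
</segments>"""


def check_segments(segments_xml: str, expected_total: int) -> tuple[list[int], int]:
    """Return (missing segment numbers, total bytes of the segments present)."""
    root = ET.fromstring(segments_xml)
    segments = list(root.iter("segment"))
    present = {int(seg.get("number")) for seg in segments}
    missing = sorted(set(range(1, expected_total + 1)) - present)
    total_bytes = sum(int(seg.get("bytes")) for seg in segments)
    return missing, total_bytes


print(check_segments(SEGMENTS, expected_total=3))  # ([2], 198)
```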
Every article that is posted to Usenet has a unique MESSAGE-ID. The last piece of information in each segment line is the Message-ID for that particular segment. In this case, the Message-ID is:
Using this Message-ID and the name of the newsgroup to which the article (segment) is posted, it is a simple task for your newsreader (or another program) to download the segment.
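Newsreaders talk to Usenet servers using NNTP, the Network News Transfer Protocol (RFC 3977). This Python sketch only builds the command lines a client could send for one segment. It does not open a connection. NZB files commonly store the Message-ID without the angle brackets that NNTP requires, so the sketch adds them when they are missing. The GROUP line mirrors the description above. Strictly speaking, RFC 3977 lets BODY fetch an article by Message-ID without selecting a group first.

```python
def nntp_commands(group: str, message_id: str) -> list[str]:
    """Build the NNTP command lines a client could send to fetch one segment."""
    mid = message_id.strip()
    # NNTP expects <...> around a Message-ID; add them if the NZB left them out.
    if not (mid.startswith("<") and mid.endswith(">")):
        mid = f"<{mid}>"
    return [
        f"GROUP {group}\r\n",  # select the newsgroup
        f"BODY {mid}\r\n",     # ask for the article body (the encoded segment)
    ]


for line in nntp_commands("alt.binaries.test", "seg1.example@news.example.com"):
    print(repr(line))
```

The Message-ID used here is hypothetical. NNTP lines end with a carriage return and line feed, which is why each command ends in \r\n.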
To give you a bit more practice understanding NZB files, the next section contains a large NZB for you to skim through. Now that you understand how NZBs work, I think you'll find it interesting to look at a more typical example of the type of NZBs that are normally used to share large files.
Once you are finished looking at the large NZB, we'll continue with our original example. At that time, we will finish the process of sharing the large AVI file we discussed earlier in the tutorial.
© All contents Copyright 2014, Harley Hahn