Tar

Tar works by concatenating all files into one big file, where each file is preceded by a header that is padded to 512 bytes. Each header is aligned on a 512-byte boundary, and the file's content follows directly after it. This is what a tar header could look like in its simplest form:
 
<syntaxhighlight lang="c">
struct tar_header
{
    char filename[100];
    char mode[8];
    char uid[8], gid[8];
    char size[12];
    char mtime[12];
    char chksum[8];
    char typeflag[1];
};
</syntaxhighlight>
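For reference, the complete POSIX ustar header fills all 512 bytes, and the fields above are just its first part. The sketch below lays out every ustar field with its byte offset and uses C11 <code>_Static_assert</code> to check the layout at compile time. The name <code>ustar_header</code> is only illustrative, and the size check relies on the compiler adding no padding between <code>char</code> members, which holds in practice.

<syntaxhighlight lang="c">
#include <stddef.h>

/* Full POSIX ustar header layout. All fields are ASCII; numbers are octal. */
struct ustar_header
{
    char name[100];     /* offset   0: file name */
    char mode[8];       /* offset 100: permission bits */
    char uid[8];        /* offset 108: owner user ID */
    char gid[8];        /* offset 116: owner group ID */
    char size[12];      /* offset 124: file size */
    char mtime[12];     /* offset 136: modification time */
    char chksum[8];     /* offset 148: header checksum */
    char typeflag;      /* offset 156: '0' or '\0' = regular file, '5' = directory */
    char linkname[100]; /* offset 157: link target */
    char magic[6];      /* offset 257: "ustar" followed by NUL in POSIX archives */
    char version[2];    /* offset 263 */
    char uname[32];     /* offset 265: owner user name */
    char gname[32];     /* offset 297: owner group name */
    char devmajor[8];   /* offset 329 */
    char devminor[8];   /* offset 337 */
    char prefix[155];   /* offset 345: path prefix for long names */
    char pad[12];       /* offset 500: zero padding up to 512 bytes */
};

/* Compile-time checks that the layout matches the format. */
_Static_assert(sizeof(struct ustar_header) == 512, "ustar header must be 512 bytes");
_Static_assert(offsetof(struct ustar_header, size) == 124, "size field offset");
_Static_assert(offsetof(struct ustar_header, typeflag) == 156, "typeflag offset");
</syntaxhighlight>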
 
As you can see, the header is not 512 bytes long, so the rest of it is padded with zeros. At the end of the tar file there is a 512-byte header filled only with zeros to mark the end. (There may be more empty headers to pad the archive to the tape block size; the ustar format specifies 10 KiB blocks.) A header contains a lot of information, but for this tutorial we are only interested in two of its fields.
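Recognizing the end of the archive comes down to checking whether a whole 512-byte block is zero. A minimal sketch, using a hypothetical helper name <code>is_zero_block</code>:

<syntaxhighlight lang="c">
#include <stddef.h>

/* Returns 1 if the 512-byte block at `block` is entirely zero (an
   end-of-archive marker), 0 otherwise. */
int is_zero_block(const unsigned char *block)
{
    for (size_t i = 0; i < 512; i++)
        if (block[i] != 0)
            return 0;
    return 1;
}
</syntaxhighlight>

Simpler parsers often test only the first byte of the file name, which is enough for well-formed archives; scanning the whole block is the stricter check.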
Size: This contains the size of the file. Note that the size is stored in base 8 (octal) as ASCII characters, so it must be converted before you can use it. All numeric fields use this encoding, not just the size. This code snippet does the conversion for you:
 
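<syntaxhighlight lang="c">
unsigned int getsize(const char *in)
{
    unsigned int size = 0;

    /* Up to 11 ASCII octal digits, ended by a NUL or a space. */
    for (int j = 0; j < 11 && in[j] >= '0' && in[j] <= '7'; j++)
        size = size * 8 + (unsigned int)(in[j] - '0');

    return size;
}
</syntaxhighlight>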
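Since every numeric field uses the same encoding, it can be handy to have one parser that takes the field length as a parameter. The sketch below (<code>octal_field</code> is a hypothetical name) also skips leading spaces, which some older tar implementations write. For example, the size field <code>00000001750</code> decodes to 1000, because 1750 in octal is 1·512 + 7·64 + 5·8 + 0.

<syntaxhighlight lang="c">
#include <stddef.h>

/* Parses a tar numeric field of `len` bytes (8 for mode, 12 for size, ...).
   Leading spaces are skipped; parsing stops at the first non-octal byte,
   such as the terminating NUL or space. */
unsigned long octal_field(const char *field, size_t len)
{
    unsigned long value = 0;
    size_t i = 0;

    while (i < len && field[i] == ' ')
        i++;
    for (; i < len && field[i] >= '0' && field[i] <= '7'; i++)
        value = value * 8 + (unsigned long)(field[i] - '0');
    return value;
}
</syntaxhighlight>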
 
Parsing a tar file is not very hard. Start by creating an array big enough to hold all the files your ramdisk will contain. Each entry is a pointer to a struct tar_header, like this:
 
<syntaxhighlight lang="c">
struct tar_header *headers[32];
</syntaxhighlight>
 
A better way would of course be to allocate this array dynamically with malloc(), but the rest of this tutorial assumes a fixed-size array like the one above.
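If your kernel already has a working heap, a growable list could look roughly like the sketch below. It assumes <code>realloc()</code> is available, and the names <code>header_list</code> and <code>header_list_push</code> are hypothetical. Doubling the capacity keeps the number of reallocations logarithmic in the number of files.

<syntaxhighlight lang="c">
#include <stdlib.h>

/* Incomplete type, so this sketch stands alone; only pointers are stored. */
struct tar_header;

/* Hypothetical growable replacement for the fixed headers[32] array. */
struct header_list
{
    struct tar_header **items;
    size_t count;
    size_t capacity;
};

/* Appends a header pointer, doubling the capacity when the list is full.
   Returns 0 on success, -1 if realloc() fails (the list is left unchanged). */
int header_list_push(struct header_list *list, struct tar_header *header)
{
    if (list->count == list->capacity) {
        size_t new_cap = list->capacity ? list->capacity * 2 : 16;
        struct tar_header **items = realloc(list->items, new_cap * sizeof *items);
        if (items == NULL)
            return -1;
        list->items = items;
        list->capacity = new_cap;
    }
    list->items[list->count++] = header;
    return 0;
}
</syntaxhighlight>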
Starting from the address of the first header, loop through all the headers. For each one, read the file size: the next header is located at the current header's address + 512 + the file size rounded up to the next multiple of 512. Stop when you hit a header that is all zeros. Here is a simple example that does this and returns the number of files found:
 
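<syntaxhighlight lang="c">
unsigned int parse(unsigned int address)
{
    unsigned int i;
    /* Assumes 32-bit addresses; stops at the zero end block or a full array. */
    for (i = 0; i < 32 && ((struct tar_header *)address)->filename[0] != '\0'; i++) {
        headers[i] = (struct tar_header *)address;
        /* Skip the 512-byte header plus the data rounded up to a multiple of 512. */
        address += 512 + ((getsize(headers[i]->size) + 511) / 512) * 512;
    }
    return i;
}
</syntaxhighlight>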
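The same walk can be done over an in-memory image using byte offsets instead of a raw <code>unsigned int</code> address, which also works on 64-bit targets and lets you check bounds. The sketch below assumes the archive is in a buffer of known length and reads the 12-byte size field at byte offset 124, as in the ustar layout; <code>count_entries</code> and <code>parse_octal</code> are hypothetical names.

<syntaxhighlight lang="c">
#include <stddef.h>

/* Parses an ASCII octal field, stopping at the first non-octal byte. */
static size_t parse_octal(const unsigned char *p, size_t len)
{
    size_t value = 0;
    for (size_t i = 0; i < len && p[i] >= '0' && p[i] <= '7'; i++)
        value = value * 8 + (size_t)(p[i] - '0');
    return value;
}

/* Counts the entries in an in-memory tar image of `length` bytes without
   ever reading past the end of the buffer. */
size_t count_entries(const unsigned char *archive, size_t length)
{
    size_t offset = 0, count = 0;

    /* An empty name (first byte zero) marks the end-of-archive block. */
    while (offset + 512 <= length && archive[offset] != '\0') {
        /* In the ustar layout the 12-byte size field starts at byte 124. */
        size_t size = parse_octal(archive + offset + 124, 12);
        if (size > length - offset - 512)
            break; /* truncated archive: the data would run past the buffer */
        count++;
        /* Skip the header, then the data rounded up to a multiple of 512. */
        offset += 512 + ((size + 511) / 512) * 512;
    }
    return count;
}
</syntaxhighlight>

For a file of 1000 bytes, the next header starts 512 + 1024 = 1536 bytes after the current one, since 1000 rounds up to two 512-byte blocks.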
 
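Now you have all the headers in your array. A file's content starts directly after its header, so to get it you only need to take the value of the header pointer and add 512 bytes (cast it to a byte pointer such as char * first, otherwise the addition is scaled by the size of the struct).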
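Putting these pieces together, a lookup by file name could look like the sketch below. It assumes the archive sits in a buffer of known length, ignores the ustar <code>prefix</code> field, and compares names exactly as stored (so a file archived as <code>./hello.txt</code> must be looked up by that name). The function name <code>tar_find</code> is hypothetical.

<syntaxhighlight lang="c">
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the data of the file named `name` inside an in-memory
   tar image and stores its size in *size_out, or returns NULL if not found. */
const unsigned char *tar_find(const unsigned char *archive, size_t length,
                              const char *name, size_t *size_out)
{
    size_t offset = 0;

    while (offset + 512 <= length && archive[offset] != '\0') {
        const unsigned char *header = archive + offset;
        size_t size = 0;

        /* Size field: up to 11 octal digits starting at byte 124. */
        for (size_t i = 124; i < 135 && header[i] >= '0' && header[i] <= '7'; i++)
            size = size * 8 + (size_t)(header[i] - '0');
        if (size > length - offset - 512)
            return NULL; /* truncated archive */

        /* The name field is 100 bytes and is not NUL-terminated when full. */
        if (strncmp((const char *)header, name, 100) == 0) {
            *size_out = size;
            return header + 512; /* the data starts right after the header */
        }
        offset += 512 + ((size + 511) / 512) * 512;
    }
    return NULL;
}
</syntaxhighlight>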