fopencookie - make your own file streams!

In case you didn't already know, what we commonly call libc is a set of basic functions that are very useful in C, allowing basic interactions with the system and memory without any knowledge of their internals. It is implemented on virtually any platform you can compile C on, so knowing which libc functions you can rely on is quite important if you want your projects to be portable and reusable on other platforms.

Several standards define and extend the libc. The base definition is in the C standards themselves (C89, C99, C11): fairly basic functions working on memory (raw bytes, strings, characters, ...) and on streams (which some platforms don't bother implementing: you won't find printf() on these). Instead of creating another library specific to their system (liblinux or any sexier name), people preferred to extend the standard libc for more control over that system; this is why we have unistd.h on UNIX-like systems (defined in POSIX/SUS) and windows.h on Microsoft Windows systems (which belongs to Microsoft's own Windows API rather than to any such standard).

Then GNU came along, and while remaking UNIX stuff, they made their own C compiler and library, and they ended up adding their own features on top of the ones required by POSIX/SUS. Some of them are funny (strfry(), memfrob()), but most of them are really useful in everyday life. A simple example? Standard C doesn't allow arithmetic on void * pointers, so this shouldn't work:

char s[6];
void *p = s;
for (int i = 0; i < 6; i++)
    p++; /* oops! arithmetic on a void * */

Readers who use GCC can check this by compiling with and without the -pedantic command-line option, which makes the compiler warn about GNU extensions such as this one (use -pedantic-errors to turn those warnings into errors; read the manpage!).
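
For code that has to build without the extension, the usual workaround is to do the arithmetic through a character pointer. Here is a minimal sketch of that idea; the helper names byte_offset() and fill_bytes() are made up for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Advance a generic pointer by n bytes without relying on the GNU
 * void * arithmetic extension: the addition is done on unsigned char *. */
void *byte_offset(void *p, size_t n)
{
    assert(p != NULL);
    return (unsigned char *)p + n;
}

/* Fill the first len bytes of buf with value, walking a void * in a
 * way that strict ISO C accepts. */
void fill_bytes(void *buf, size_t len, unsigned char value)
{
    void *p = buf;

    for (size_t i = 0; i < len; i++) {
        *(unsigned char *)p = value;
        p = byte_offset(p, 1);
    }
}
```

This version should compile without the void * arithmetic warning, even with -pedantic.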

So that's one example of a useful GNU extension added to the C compiler, but weren't we talking about the libc? That's right, so let's concentrate on the streams implementation. You surely know the fopen(), fclose(), fread() and fwrite() functions, which use a FILE handle. fopen() opens an underlying system object (a file descriptor on UNIX-like systems), but FILE implements useful things on top of it, like stream buffering or a cache of some properties (is the stream readable? writable?). And this FILE abstraction is advanced enough not to need to be hooked to any such system object. I think the official term for this is "custom streams".

So fopen() is one way to create a stream, and it's defined in the C standard. Another one is fmemopen() / open_memstream() (defined in POSIX since 2008), which create a stream on a memory buffer - useful when you don't want to manage limits and stuff yourself, and quite good for security. The last one (the one this article is supposed to be all about) is fopencookie(), which is a GNU extension. This function lets you use your own callbacks for reading, writing, seeking and closing, with a cookie (the stream's internal data). Fun fact: fmemopen() uses fopencookie() in the GNU implementation.
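
Before getting to fopencookie(), here is a small sketch of the two POSIX memory streams mentioned above. The helper names first_word() and make_greeting() are invented for the illustration:

```c
#define _POSIX_C_SOURCE 200809L /* for fmemopen() and open_memstream() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read the first whitespace-delimited word of a string through a
 * read-only memory stream. Returns 1 on success, 0 otherwise. */
int first_word(const char *text, char word[16])
{
    assert(text != NULL);
    /* fmemopen() takes a non-const pointer, but "r" never writes to it */
    FILE *f = fmemopen((void *)text, strlen(text), "r");
    if (!f) return (0);

    int ok = fscanf(f, "%15s", word) == 1;
    fclose(f);
    return (ok);
}

/* Build a string through a stream that grows its own buffer.
 * The caller must free() the result. */
char *make_greeting(const char *name)
{
    char *out = NULL;
    size_t outlen = 0;
    FILE *f = open_memstream(&out, &outlen);
    if (!f) return (NULL);

    fprintf(f, "hello, %s!", name);
    if (fclose(f) != 0) { /* the buffer is only final after fclose() */
        free(out);
        return (NULL);
    }
    return (out);
}
```

In both cases stdio does the bounds bookkeeping: fscanf() cannot run past the end of the text, and fprintf() cannot overflow a buffer that the stream resizes by itself.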

From now on, I advise you to open the (well made) manpage about fopencookie(); I'll just discuss technical subtleties with a really simple example: we'll create a read-only stream that is N bytes long and in which every byte has the value 24. We'll make a function called tf_open() that has this prototype:

FILE *tf_open(size_t size);

So first of all: the stream and cookie creation. As you can see in the fopencookie() prototype, there are three parameters: the cookie, the mode and the callbacks. The cookie is passed through a void * parameter: the function will take the pointer and transmit it to the callbacks without reading it directly. For our example, if our stream didn't have any size, we wouldn't need any cookie (internal data), so we could just pass NULL or any useless thing that we won't read in the callbacks. But because we have a size, we also need a cursor. This is the structure we'll use:

struct cookie {
    size_t size;
    size_t cursor;
};

Once the cookie type is defined, the first question you should ask yourself is: how will we create the cookie? Well, the most obvious way is to use the heap, using malloc() - we'll have to free() the cookie in the close callback. But I'm one of those people who don't like allocating little bits of memory like this, so if you're only using one stream at a time (I mean the full cycle: create, use, destroy), you could also use a static variable; of course, this is for very specific cases.
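
To make the static-variable option concrete, here is a rough sketch. The names static_cookie, static_read() and tf_open_static() are made up for this example, and it relies on the manpage's statement that a NULL close callback means nothing special happens when the stream is closed:

```c
#define _GNU_SOURCE /* for fopencookie() */
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct static_cookie {
    size_t size;
    size_t cursor;
};

static ssize_t static_read(void *vcookie, char *buf, size_t size)
{
    struct static_cookie *cookie = vcookie;

    assert(cookie->cursor <= cookie->size);
    size_t left = cookie->size - cookie->cursor;
    size_t toread = left < size ? left : size;

    memset(buf, 24, toread);
    cookie->cursor += toread;
    return ((ssize_t)toread);
}

/* Open a stream backed by one static cookie. Only one such stream may
 * be alive at a time: calling this again while the first stream is
 * still open would overwrite its state. */
FILE *tf_open_static(size_t size)
{
    static struct static_cookie cookie;

    cookie.size = size;
    cookie.cursor = 0;
    /* no close callback: there is nothing to free */
    return (fopencookie(&cookie, "r",
        (cookie_io_functions_t){ .read = static_read }));
}
```

The price for skipping malloc() is that the function is neither reentrant nor thread-safe, which is why it only fits the "one stream at a time" case.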

The next question about stream creation is: wait, why the heck do we need a mode string like for fopen()? Why doesn't it just forbid reading when the pointer to the read callback is NULL? Well in fact, if the pointer to the read callback is NULL, it will just use the "default callback", which, according to the manpage, reports end of file on every read. The mode string is really here to define permissions.
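
A quick way to see that the mode string, not the callback table, decides what is allowed: give a stream a write callback and open it with different modes. This is a sketch of the behaviour I would expect from glibc; the probe structure and both function names are invented for the example:

```c
#define _GNU_SOURCE /* for fopencookie() */
#include <assert.h>
#include <stdio.h>

/* Hypothetical cookie: counts how often the write callback runs. */
struct probe {
    int write_calls;
};

static ssize_t probe_write(void *vcookie, const char *buf, size_t size)
{
    struct probe *probe = vcookie;

    (void)buf;
    probe->write_calls++;
    return ((ssize_t)size); /* pretend everything was written */
}

/* Try to write one byte to a stream opened with the given mode.
 * Returns 1 if the write callback was reached, 0 if stdio refused
 * the write, -1 if the stream could not be created. */
int write_reaches_callback(const char *mode)
{
    struct probe probe = { 0 };

    assert(mode != NULL);
    FILE *f = fopencookie(&probe, mode,
        (cookie_io_functions_t){ .write = probe_write });
    if (!f) return (-1);

    fputc('x', f);
    fflush(f); /* push the buffered byte down to the callback */
    fclose(f);
    return (probe.write_calls > 0);
}
```

With "w" the byte reaches probe_write(); with "r" the write is rejected by stdio before the callback is ever considered, even though the callback is there.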

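Next on the list, the callbacks. The prototypes look a lot like those of read(), write(), lseek() and close() (the level just below fread(), fwrite(), fseek() and fclose()), but they take a cookie parameter instead of a file descriptor (you'll have to convert it back from void * at the beginning of the function), and the seek callback takes a pointer to the offset, in which it must store the new position. The read and write callbacks return an ssize_t instead of the size_t returned by the "normal" functions; this is because the read callback reports an error directly by returning -1 (0 means EOF), whereas the manpage asks the write callback to return 0 on error and never a negative value. The seek callback returns 0 or -1, and the close callback 0 or EOF. The "normal" functions don't pass a read error through as-is: fread() returns a short count and sets the stream's error indicator, which you can check with ferror().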
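
To illustrate the difference between "end of file" and "error" in a read callback, here is a sketch of a stream that serves a few bytes and then fails. The flaky cookie and the read_flaky() helper are hypothetical, and setting errno in the callback is this example's own choice:

```c
#define _GNU_SOURCE /* for fopencookie() */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical cookie: serve `good` bytes, then fail. */
struct flaky {
    size_t good;
};

static ssize_t flaky_read(void *vcookie, char *buf, size_t size)
{
    struct flaky *flaky = vcookie;

    if (flaky->good == 0) {
        errno = EIO; /* this sketch sets errno itself */
        return (-1); /* -1 is an error; 0 would mean end of file */
    }
    size_t toread = flaky->good < size ? flaky->good : size;
    memset(buf, 24, toread);
    flaky->good -= toread;
    return ((ssize_t)toread);
}

/* Read up to len bytes from a stream that fails after `good` bytes.
 * Returns the fread() count and stores ferror()/feof() as 0 or 1. */
size_t read_flaky(size_t good, char *out, size_t len,
                  int *had_error, int *had_eof)
{
    struct flaky flaky = { good };

    assert(out && had_error && had_eof);
    *had_error = 1;
    *had_eof = 0;
    FILE *f = fopencookie(&flaky, "r",
        (cookie_io_functions_t){ .read = flaky_read });
    if (!f) return (0);

    size_t n = fread(out, 1, len, f);
    *had_error = ferror(f) != 0;
    *had_eof = feof(f) != 0;
    fclose(f);
    return (n);
}
```

The caller of fread() only ever sees a short count; it has to ask ferror() and feof() to find out which of the two callback return values caused it.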

From this point, writing our custom filestream is quite easy:

#define _GNU_SOURCE /* otherwise fopencookie is not declared */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define min(A, B) ((A) < (B) ? (A) : (B))

struct cookie {
    size_t size;
    size_t cursor;
};

static ssize_t tf_read(void *vcookie, char *buf, size_t size)
{
    struct cookie *cookie = vcookie;
    size_t toread = min(cookie->size - cookie->cursor, size);

    memset(buf, 24, toread);
    cookie->cursor += toread;
    return (toread);
}

static int tf_seek(void *vcookie, off64_t *off, int whence)
{
    struct cookie *cookie = vcookie;
    off64_t pos;

    /* compute the new absolute position according to whence */
    switch (whence) {
    case SEEK_SET:
        pos = *off;
        break;
    case SEEK_CUR:
        pos = (off64_t)cookie->cursor + *off;
        break;
    case SEEK_END: /* as with lseek(), *off is usually <= 0 here */
        pos = (off64_t)cookie->size + *off;
        break;
    default:
        return (-1);
    }

    /* refuse positions outside of [0, size] */
    if (pos < 0 || pos > (off64_t)cookie->size)
        return (-1);

    /* set the position, report it back through *off and return */
    cookie->cursor = (size_t)pos;
    *off = pos;
    return (0);
}

static int tf_close(void *cookie)
{
    free(cookie); /* take it now! */
    return (0); /* 0 on success, EOF on error */
}

FILE *tf_open(size_t size)
{
    /* create the cookie */
    struct cookie *cookie = malloc(sizeof(struct cookie));
    if (!cookie) return (NULL);
    *cookie = (struct cookie){
        .cursor = 0,
        .size = size
    };

    /* create the stream */
    FILE *file = fopencookie(cookie, "r",
        (cookie_io_functions_t){tf_read, NULL, tf_seek, tf_close});
    if (!file) tf_close(cookie);
    return (file);
}

You can then test it by adding this to the end of the file:

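#include <unistd.h>

int main(void)
{
    FILE *f = tf_open(5);
    if (!f) return (1);

    char buf[2]; size_t n;
    /* element size 1, so that fread() returns a byte count */
    while ((n = fread(buf, 1, sizeof(buf), f))) {
        printf("%zu\n", n);
        fflush(stdout); /* keep printf() and write() output in order */
        write(1, "/", 1);
        write(1, buf, n);
        write(1, "/\n", 2);
    }
    printf("%d\n", fseek(f, 120, SEEK_END)); /* past the end: expect -1 */
    printf("%zu\n", fread(buf, 1, sizeof(buf), f));
    fclose(f); /* calls tf_close(), which frees the cookie */
    return (0);
}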

But hey, remember when I was talking about buffering? Well, in my example, try printing the size you are given in the read callback. It gets printed only twice: on the first read, the callback is asked for a whole buffer's worth of data (8192 bytes, the value of BUFSIZ in glibc) rather than the 2 bytes requested, and on the third read (when the FILE handle's internal buffer runs empty), it is called again and returns 0 (EOF). You can tweak the buffer size using setvbuf() (for output, disabling buffering just makes every fwrite() reach your write callback right away instead of being held back until the buffer is flushed, more or less), but the behaviour will stay the same.
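
Instead of printing from the callback, you can also count its invocations. The sketch below reads a stream one byte at a time through a small buffer installed with setvbuf(); the counted cookie and count_read_calls() are names invented for this example:

```c
#define _GNU_SOURCE /* for fopencookie() */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical cookie: a finite stream that counts callback calls. */
struct counted {
    size_t left;
    unsigned calls;
};

static ssize_t counted_read(void *vcookie, char *buf, size_t size)
{
    struct counted *c = vcookie;
    size_t toread = c->left < size ? c->left : size;

    c->calls++;
    memset(buf, 24, toread);
    c->left -= toread;
    return ((ssize_t)toread);
}

/* Read a stream of `total` bytes one byte at a time, through a
 * caller-supplied stdio buffer of bufsize bytes. Returns how many
 * times the read callback ran, or 0 if the stream setup failed. */
unsigned count_read_calls(size_t total, char *iobuf, size_t bufsize)
{
    struct counted c = { total, 0 };
    FILE *f = fopencookie(&c, "r",
        (cookie_io_functions_t){ .read = counted_read });
    if (!f) return (0);

    /* setvbuf() must come before any other operation on the stream */
    if (setvbuf(f, iobuf, _IOFBF, bufsize) != 0) {
        fclose(f);
        return (0);
    }

    size_t seen = 0;
    while (fgetc(f) != EOF)
        seen++;
    fclose(f); /* iobuf must stay valid until here */
    assert(seen == total);
    return (c.calls);
}
```

With a 64-byte stream and a 16-byte buffer, the callback should run only a handful of times (one per buffer refill, plus the final call that returns EOF), not once per fgetc().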

Also, if your callback doesn't fill the buffer passed to fread(), it keeps getting called until the buffer is full... or until it returns EOF. This caused me problems a while ago: I was making a communication library and I was calling fread() with a size of BUFFER_MAX, so it kept waiting for more data even when the message was shorter. For this usage, prefer reading gradually (read the header containing the content size, then read exactly that many bytes, using two fread()-s), and don't worry about speed: thanks to the buffering, small reads are cheap.
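
Here is what "reading gradually" can look like, as a sketch with an invented wire format: one length byte followed by that many payload bytes. The names read_message() and PAYLOAD_MAX are assumptions of this example, not part of any real protocol:

```c
#define _POSIX_C_SOURCE 200809L /* only needed to test with fmemopen() */
#include <assert.h>
#include <stdio.h>

#define PAYLOAD_MAX 64 /* hypothetical protocol limit */

/* Read one message framed as a 1-byte length followed by that many
 * payload bytes. Returns the payload length, or -1 on a short read
 * or an oversized message. */
long read_message(FILE *f, unsigned char payload[PAYLOAD_MAX])
{
    unsigned char header[1];

    assert(f != NULL);
    /* first fread(): exactly the header, nothing more */
    if (fread(header, 1, sizeof(header), f) != sizeof(header))
        return (-1);

    size_t len = header[0];
    if (len > PAYLOAD_MAX)
        return (-1);

    /* second fread(): exactly the announced payload */
    if (fread(payload, 1, len, f) != len)
        return (-1);
    return ((long)len);
}
```

Since each fread() asks only for bytes that the sender has promised, the call never sits waiting for data that is not coming. The function is easy to try out on a memory stream created with fmemopen() before plugging it into a custom stream.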

So yeah, as a conclusion, what I like about fopencookie() is that it's like writing a driver managing a char device, but for the program only and without the kernelspace/userspace complexity. The filestream abstraction is quite powerful and quite cool to work with; it's just sad that fopencookie() is a non-standard GNU extension (the BSDs have the similar funopen(), but nothing of the kind is in POSIX).

Wanna know more about glibc streams? Head to the GNU libc page about streams!