Thomas Touhey: fopencookie - make your own filestreams!
In case you didn’t know already, what we commonly call the libc is a set of
basic functions that are very useful in C, allowing basic interactions with
the system and memory without detailed knowledge of either. It is implemented
on virtually any platform you can compile C for, so knowing which functions
the libc provides is quite important if you want code that stays compatible
and reusable when you port your projects to other platforms.
There are several standards defining and extending the libc. Its basic
definition is in the C standards directly (C89, C99, C11): quite basic
functions working on memory (raw bytes, strings, characters, …) and streams
(which some don’t bother implementing on their platform: you won’t find printf
on these). Instead of creating another library that is specific to your system
(liblinux or any sexier name), people preferred to extend the libc standard
for more control over their system; this is why we have unistd.h on UNIX-like
systems (defined in POSIX/SUS) and windows.h on Microsoft Windows systems
(part of the Windows API rather than of a formal standard).
Then GNU came along, and while remaking UNIX stuff, they made their own
C compiler and library, and they ended up adding their own features on top of
the ones required by POSIX/SUS. Some of them are funny (strfry, memfrob),
but most of them are really useful in everyday life. A simple example?
Standard C doesn’t allow arithmetic on void* pointers, so this shouldn’t work:
char s[6];
void *p = s;
for (int i = 0; i < 6; i++)
    p++; /* oops! arithmetic on a void* */
Readers that can use GCC can test this by compiling with and without the
-pedantic command-line option, which makes GCC warn about GNU extensions such
as this one (-pedantic-errors turns these warnings into errors; read the
manpage!).
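For comparison, here is a minimal sketch of the portable way to walk through raw memory: go through an unsigned char pointer, for which arithmetic is defined by the standard. The fill_bytes helper is a made-up name for illustration, not something from the libc.

```c
#include <stddef.h>

/* Hypothetical helper: set n bytes starting at p to value, without
 * relying on the GNU void* arithmetic extension. */
void fill_bytes(void *p, size_t n, unsigned char value)
{
    unsigned char *bytes = p; /* void* converts implicitly in C */

    for (size_t i = 0; i < n; i++)
        *bytes++ = value; /* arithmetic on unsigned char* is standard */
}
```

Calling fill_bytes(s, 6, 0) on the array above compiles without any warning, even with -pedantic.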
So that’s one example of a useful GNU extension added to the C compiler, but
weren’t we talking about the libc? That’s right, so let’s concentrate on the
streams implementation. You surely know the fopen, fclose, fread and fwrite
functions, which use a FILE handle. fopen opens an underlying system object
(a file descriptor on UNIX-like systems), but the FILE handle implements
useful things above it, like stream buffering or a cache of some properties
(is the stream readable? writable?). And this FILE abstraction is advanced
enough not to need to be hooked to any system object at all. I think the
official term for this is “custom streams”.
So fopen is one way to create a stream, and it’s defined in the C standard
itself (it has been there since C89). Another one is to use
fmemopen/open_memstream (defined in POSIX since 2008) to create a stream on a
memory buffer - useful when you don’t want to manage limits and stuff
yourself, and quite good for security. The last one (the one this article is
supposed to be all about) is fopencookie, which is a GNU extension. This
function allows you to use your own callbacks for reading, writing, seeking
and closing, with a cookie (the stream’s internal data). Fun fact: fmemopen
uses fopencookie in the GNU implementation.
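To make the memory-backed streams a bit more concrete, here is a small sketch using both POSIX functions. The helper names (read_from_memory, format_greeting) are invented for this example; only fmemopen and open_memstream themselves come from POSIX.

```c
#define _POSIX_C_SOURCE 200809L /* for fmemopen and open_memstream */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read up to max - 1 bytes from a memory buffer through a FILE handle and
 * terminate the result. Returns the number of bytes read (0 on failure). */
size_t read_from_memory(const char *data, size_t len, char *out, size_t max)
{
    if (max == 0) return (0);

    /* fmemopen wants a non-const pointer; with "r" it never writes to it */
    FILE *f = fmemopen((void *)data, len, "r");
    if (!f) return (0);

    size_t n = fread(out, 1, max - 1, f); /* the stream enforces the limit */
    out[n] = '\0';
    fclose(f);
    return (n);
}

/* Build a string with fprintf into a buffer that grows on its own.
 * The caller must free() the result. */
char *format_greeting(const char *name)
{
    char *buf = NULL;
    size_t size = 0;
    FILE *f = open_memstream(&buf, &size);
    if (!f) return (NULL);

    fprintf(f, "hello, %s!", name);
    fclose(f); /* buf and size are final once the stream is closed */
    return (buf);
}
```

In both cases the code using the FILE handle doesn’t have to know that there is no file behind it.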
From now on, I advise you to open the (well made) manpage about fopencookie;
I’ll just discuss the technical subtleties with a really simple example: we’ll
create a stream that is N bytes long and in which every byte read has the
value 24. We’ll make a function called tf_open that has this prototype:
FILE *tf_open(size_t size);
So first of all: the stream and cookie creation. As you can see in the
fopencookie prototype, there are three parameters: the cookie, the mode and
the callbacks. The cookie is passed through a void* parameter: the function
takes the pointer and hands it to the callbacks without ever reading it
itself. For our example, if our stream didn’t have a size, we wouldn’t
need any cookie (internal data), so we could just pass NULL or anything else
that the callbacks never read. But because we have a size, we also
need a cursor. This is the structure we’ll use:
struct cookie {
    size_t size;
    size_t cursor;
};
Once the cookie type is defined, the first question you should ask yourself
is: how will we create the cookie? Well, the most obvious way is to use the
heap, using malloc - we’ll then have to free the cookie in the close callback.
But I’m one of those people who don’t like allocating little bits of memory
like this, so if you’re only using one stream at a time (I mean the full
cycle: create, use, destroy), you could also use a static variable; of
course, this is for very specific cases.
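Here is a rough sketch of what the static variant could look like, assuming only one stream is ever alive at a time. The names tf_open_static, the_cookie and the_cookie_in_use are invented for this illustration, and the read callback simply fills the buffer with bytes of value 24.

```c
#define _GNU_SOURCE /* otherwise fopencookie is not declared */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

struct cookie {
    size_t size;
    size_t cursor;
};

/* the only cookie: at most one stream may be alive at a time */
static struct cookie the_cookie;
static int the_cookie_in_use = 0;

static ssize_t static_read(void *vcookie, char *buf, size_t size)
{
    struct cookie *cookie = vcookie;
    size_t left = cookie->size - cookie->cursor;
    size_t toread = left < size ? left : size;

    memset(buf, 24, toread);
    cookie->cursor += toread;
    return (toread);
}

static int static_close(void *vcookie)
{
    (void)vcookie;         /* nothing to free */
    the_cookie_in_use = 0; /* the slot can be reused */
    return (0);
}

/* hypothetical constructor that doesn't touch the heap for the cookie */
FILE *tf_open_static(size_t size)
{
    if (the_cookie_in_use) return (NULL); /* a stream is already alive */
    the_cookie = (struct cookie){ .size = size, .cursor = 0 };

    FILE *file = fopencookie(&the_cookie, "r",
        (cookie_io_functions_t){static_read, NULL, NULL, static_close});
    if (file) the_cookie_in_use = 1;
    return (file);
}
```

The trade-off is visible in the first line of tf_open_static: a second stream can’t be opened until the first one has been closed.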
The next question about stream creation is: wait, why the heck do we need a
mode string like for fopen? Why doesn’t it just forbid reading when the
pointer to the read callback is NULL? Well in fact, if the pointer to the
read callback is NULL, reading is not forbidden: the stream behaves as if a
“default callback” returned EOF (0) on each call. The mode string is really
there to define permissions.
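A small experiment can show that the mode string, not the presence of a callback, decides what is allowed. This is only a sketch: count_write, can_write_with_mode and read_without_callback are names made up for the occasion.

```c
#define _GNU_SOURCE /* otherwise fopencookie is not declared */
#include <stdio.h>
#include <sys/types.h>

/* hypothetical write callback: counts the bytes it is given */
static ssize_t count_write(void *vcookie, const char *buf, size_t size)
{
    size_t *total = vcookie;

    (void)buf;
    *total += size;
    return (size);
}

/* Returns 1 if a byte written to a stream opened with the given mode
 * reaches the write callback, 0 otherwise. */
int can_write_with_mode(const char *mode)
{
    size_t total = 0;
    FILE *f = fopencookie(&total, mode,
        (cookie_io_functions_t){NULL, count_write, NULL, NULL});
    if (!f) return (0);

    int ok = (fputc('x', f) != EOF);
    if (fclose(f) == EOF) ok = 0; /* fclose flushes pending data */
    return (ok && total == 1);
}

/* Returns what fgetc gives on a readable stream with no read callback. */
int read_without_callback(void)
{
    FILE *f = fopencookie(NULL, "r",
        (cookie_io_functions_t){NULL, NULL, NULL, NULL});
    if (!f) return (EOF);

    int c = fgetc(f);
    fclose(f);
    return (c);
}
```

With "r", the write callback is there but never gets called; with a NULL read callback, the stream still opens and fgetc simply has nothing to give.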
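Next on the list, the callbacks. Their prototypes look a lot like the fread,
fwrite, fseek and fclose functions, but they take the cookie as their first
parameter (a void* that you’ll have to convert back to your own type at the
beginning of the function), and the read and write callbacks return an
ssize_t instead of the size_t returned by “normal” functions; this is so
they can report an error directly: according to the manpage, read returns -1
on error, while write returns 0 on error and must never return a negative
value. “Normal” functions then signal the failure to their caller in their
usual way (short count, EOF); if you want errno to mean something at that
point, set it in the callback.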
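Since the rest of this article only implements the read side, here is a hedged sketch of a write callback following these conventions. The struct sink cookie, its 8-byte capacity and the sink_put helper are all hypothetical, and setting errno to ENOSPC is just one possible choice.

```c
#define _GNU_SOURCE /* otherwise fopencookie is not declared */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* hypothetical cookie: a fixed-size buffer we append to */
struct sink {
    char data[8];
    size_t used;
};

/* write callback: returns the number of bytes taken, or 0 on error
 * (per the manpage, a write callback must not return a negative value) */
static ssize_t sink_write(void *vcookie, const char *buf, size_t size)
{
    struct sink *sink = vcookie; /* back to our own type */

    if (size > sizeof(sink->data) - sink->used) {
        errno = ENOSPC; /* our choice: tell the caller why we refused */
        return (0);
    }
    memcpy(sink->data + sink->used, buf, size);
    sink->used += size;
    return (size);
}

/* Write text through a stream into the sink; returns 0 on success, -1 if
 * the stream could not be opened or the data did not fit. */
int sink_put(struct sink *sink, const char *text)
{
    FILE *f = fopencookie(sink, "w",
        (cookie_io_functions_t){NULL, sink_write, NULL, NULL});
    if (!f) return (-1);

    int failed = (fputs(text, f) == EOF);
    if (fclose(f) == EOF) failed = 1; /* buffered data is written here */
    return (failed ? -1 : 0);
}
```

Note that because of buffering, the error usually shows up when the stream is flushed or closed, not at the fputs call itself.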
From this point, writing our custom filestream is quite easy:
#define _GNU_SOURCE /* otherwise fopencookie is not declared */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#define min(A, B) ((A) < (B) ? (A) : (B))

struct cookie {
    size_t size;
    size_t cursor;
};

/* read callback: fill the buffer with 24s until the end of the stream */
static ssize_t tf_read(void *vcookie, char *buf, size_t size)
{
    struct cookie *cookie = vcookie;
    size_t toread = min(cookie->size - cookie->cursor, size);

    memset(buf, 24, toread);
    cookie->cursor += toread;
    return (toread); /* 0 means EOF */
}

/* seek callback: compute the new position, refuse anything out of bounds */
static int tf_seek(void *vcookie, off64_t *off, int whence)
{
    struct cookie *cookie = vcookie;
    off64_t pos;

    /* get pos according to whence */
    switch (whence) {
    case SEEK_SET:
        pos = *off;
        break;
    case SEEK_CUR:
        pos = (off64_t)cookie->cursor + *off;
        break;
    case SEEK_END:
        pos = (off64_t)cookie->size + *off;
        break;
    default:
        return (-1);
    }
    if (pos < 0 || pos > (off64_t)cookie->size)
        return (-1);

    /* set the position, report it back through *off and return */
    cookie->cursor = pos;
    *off = pos;
    return (0);
}

/* close callback: the cookie was malloc-ed in tf_open */
static int tf_close(void *cookie)
{
    free(cookie); /* take it now! */
    return (0);
}

FILE *tf_open(size_t size)
{
    /* create the cookie */
    struct cookie *cookie = malloc(sizeof(struct cookie));
    if (!cookie) return (NULL);
    *cookie = (struct cookie){
        .cursor = 0,
        .size = size
    };

    /* create the stream */
    FILE *file = fopencookie(cookie, "r",
        (cookie_io_functions_t){tf_read, NULL, tf_seek, tf_close});
    if (!file) tf_close(cookie);
    return (file);
}
And you can simply test by adding to the end of the file:
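int main(void)
{
    FILE *f = tf_open(5);
    if (!f) return (1);

    unsigned char buf[2];
    size_t n;

    /* read two bytes at a time and print the values we got */
    while ((n = fread(buf, 1, sizeof(buf), f))) {
        printf("%zu:", n);
        for (size_t i = 0; i < n; i++)
            printf(" %d", buf[i]);
        printf("\n");
    }

    /* out of bounds: the seek callback refuses, so fseek should fail */
    printf("%d\n", fseek(f, 120, SEEK_END));
    printf("%zu\n", fread(buf, 1, sizeof(buf), f));
    fclose(f);
    return (0);
}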
But hey, remember when I was talking about buffering? Well, in my example,
try printing the size you are given in the read callback. It will print
twice: on the first read, it prints the size of the internal buffer of the
FILE handle rather than the 2 bytes you asked for (typically 8192, the value
of BUFSIZ, with glibc); on the third read (when that internal buffer runs
empty), the callback is called again and returns 0 (EOF). You can tweak this
value using setvbuf (disabling buffering just means each fwrite reaches your
callback right away, instead of everything you’ve fwrite-d being held back
until a flush or the next fread, more or less), but the behaviour will stay
the same.
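If you’d rather observe this than take my word for it, a cookie can record what the stream asks of the read callback. The sketch below is an assumption-laden probe (struct probe, probe_read and probe_stream are invented names); the exact sizes you’ll see depend on your libc, so none are promised here.

```c
#define _GNU_SOURCE /* otherwise fopencookie is not declared */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* hypothetical cookie that records what the stream asks of the callback */
struct probe {
    size_t left;          /* bytes the stream can still produce */
    size_t calls;         /* how many times the read callback ran */
    size_t first_request; /* size asked for on the first call */
    size_t got;           /* what fread returned to the caller */
};

static ssize_t probe_read(void *vcookie, char *buf, size_t size)
{
    struct probe *probe = vcookie;
    size_t toread = probe->left < size ? probe->left : size;

    if (probe->calls++ == 0)
        probe->first_request = size;
    memset(buf, 24, toread);
    probe->left -= toread;
    return (toread);
}

/* Read two bytes from a 5-byte stream and report what the callback saw.
 * vbuf_mode is _IOFBF, _IOLBF or _IONBF, as for setvbuf. */
struct probe probe_stream(int vbuf_mode)
{
    struct probe probe = { .left = 5 };
    char buf[2];
    FILE *f = fopencookie(&probe, "r",
        (cookie_io_functions_t){probe_read, NULL, NULL, NULL});
    if (!f) return (probe);

    setvbuf(f, NULL, vbuf_mode, BUFSIZ); /* must come before any I/O */
    probe.got = fread(buf, 1, sizeof(buf), f);
    fclose(f);
    return (probe);
}
```

Comparing probe_stream(_IOFBF).first_request with probe_stream(_IONBF).first_request shows how much the FILE handle reads ahead on your system.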
Also, if the buffer passed to fread is not filled by your callback, the
callback will keep getting called until the buffer is full… or until it
returns EOF. This caused me problems a while ago: I was making a communication
library and I was using fread with BUFFER_MAX as the size. For this usage,
prefer reading gradually (read the header with the content size, then read
that many bytes, using two fread-s), and don’t worry about the speed - these
are optimized.
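The two-fread approach could look like the sketch below. The framing is purely hypothetical (a 2-byte big-endian length followed by the payload), as is the read_message name; adapt it to whatever your protocol really uses.

```c
#include <stdio.h>

/* Hypothetical framing: a 2-byte big-endian length, then the payload.
 * Returns the payload size, or -1 on a truncated or oversized message. */
long read_message(FILE *f, unsigned char *out, size_t max)
{
    unsigned char header[2];

    /* first fread: exactly the header, no more */
    if (fread(header, 1, sizeof(header), f) != sizeof(header))
        return (-1);

    size_t len = ((size_t)header[0] << 8) | header[1];
    if (len > max)
        return (-1);

    /* second fread: exactly the announced size */
    if (fread(out, 1, len, f) != len)
        return (-1);
    return ((long)len);
}
```

Because each fread asks for exactly the number of bytes known to be coming, it never waits for data the other side has no intention of sending.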
So yeah, as a conclusion, what I like about fopencookie is that it’s like
writing a driver managing a char device, but for the program only and without
the kernelspace/userspace complexity. The filestream abstraction is quite
powerful and quite cool to work with; it’s just sad that fopencookie is
specific to the GNU C Library.
Wanna know more about glibc streams? Head to the GNU libc page about streams!