
Conference turris::digital_unix

Title:	DIGITAL UNIX(FORMERLY KNOWN AS DEC OSF/1)
Notice:	Welcome to the Digital UNIX Conference
Moderator:	SMURF::DENHAM

Created:	Thu Mar 16 1995
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	10068
Total number of notes:	35879

8671.0. "Do aio signals work?" by APACHE::CHAMBERS () Fri Jan 31 1997 13:30

I've been doing some experimenting to test whether the aio_* routines can
give us better performance. The major question seems to be "How does a program
learn that an operation has finished?"  Now, this can be done by looping thru
the buffers and passing each to aio_error and then aio_return, but this is
clearly not a high-performance way to do it.  The obvious way is to use the
signals that are clearly intended to be part of an aiocb.  I coded up several
ways of doing it, using signal(), sigvec(), and sigaction().  All of them
utterly failed to call the signal handler.  

I can easily "prove" that the signal() code, for example, works:  I just have
it set up a signal handler for, say, SIGSEGV, write a loop that calls rand()
to generate pointer values, dereference them, and the fprintf(stderr,...) in
the signal handler produces output.  So I do know how to code a call of signal()
correctly.  But the same code setting up a SIGIO handler and putting its address
into an aiocb has no effect whatsoever.

Are there any examples of code that does this successfully?  (Alta Vista doesn't
seem to be able to find any.)

Here are some excerpts from one of my simpler tests, the one using signal().
Near the beginning of main() are the lines:

struct aiocb *buf = 0;
	...
   (void)signal(SIGIO,sigio);	/* Tell kernel about our signal handler */
	...
   n = buffers * sizeof(struct aiocb);
   if (!(buf = (struct aiocb*)malloc(n))) {	/* Get aiocb buffers */
      fprintf(stderr,...);
      exit(errno);
   }
   for (b=0; b<buffers; b++) {		/* Set up aio buffers */
      if (!(buf[b].aio_buf = (char*)malloc(bufsize))) {
         fprintf(stderr,...);
         exit(errno);
      }
      buf[b].aio_fildes = 1;
      buf[b].aio_reqprio = AIO_PRIO_DFL;
      buf[b].aio_sigevent.sigev_value.sival_int = b;
      buf[b].aio_sigevent.sigev_signo = SIGIO;
      buf[b].aio_sigevent.sigev_notify = 1;
   }

Later on, the program duly calls aio_write(&buf[b]) for various values of b.
The sigio routine contains a fprintf(stderr,"...") call which never happens.
Further code that calls aio_error() and aio_return() verifies that the write
operations have indeed finished, and the data does appear on stdout.  But no
calls of sigio ever happen.

As I said, this is merely one example; others use what should be equivalent
sigaction() and sigvec() calls, and they also never receive signals.

Either I'm doing something subtly wrong (likely) or aio on these machines
(mostly 3.2F and 3.2G) doesn't actually do signals with aio_write.  I hope
that I'm doing it wrong.  Using a for loop and two system calls to determine
that aio_write has completed is clearly not desirable if you want speed.

Any clues?  Examples?

T.R	Title	User	Personal Name	Date	Lines
8671.1	RTFMP?	WTFN::SCALES	Despair is appropriate and inevitable.	`Fri Jan 31 1997 14:51`	13
	.0> Any clues?  Examples?

	I'd guess that you're missing the sigev_notify setting (from the man page
	for aio_write(3)):

	> If aio_sigevent.sigev_notify equals
	> SIGEV_SIGNAL and aio_sigevent.sigev_signo is non-zero, a signal will be
	> generated when the asynchronous read operation has completed.

	Webb
8671.2	Your FM page is different from mine ...	APACHE::CHAMBERS		`Fri Jan 31 1997 15:53`	15
	Hmmm ... I checked with "man 3 aio_write" on several machines hereabouts,
	and none of them contains that text.  In fact, none contains the character
	string "notify" anywhere in any capitalization.  There seems to be no
	mention whatsoever in any aio_* or sig* man page of the possible values for
	the sigev_notify field.  The obvious guess was that it's a true/false value.

	But that might be a clue.  I found SIGEV_SIGNAL in
	/usr/sys/include/sys/signal.h and also found the rather curious definitions:

	#define SIGEV_SIGNAL    (0)     /* Notify via signal */
	#define SIGEV_NONE      (1)     /* Other notification: unsupported */

	This seems to be saying that 0 is used for "true" and 1 for "false".  Did
	they actually do something this perverse without any warning in a man page?
	Wouldn't be surprised.  I'll give it a try ...
8671.3	Nope; SIGEV_SIGNAL ain't the answer	APACHE::CHAMBERS		`Fri Jan 31 1997 16:03`	14
	Well, I tried it, and it didn't change a thing.  Now the lines that fill in
	the aiocb fields look like:

	      buf[b].aio_fildes = 1;
	      buf[b].aio_reqprio = AIO_PRIO_DFL;
	      buf[b].aio_sigevent.sigev_value.sival_int = b;
	      buf[b].aio_sigevent.sigev_signo = SIGIO;
	      buf[b].aio_sigevent.sigev_notify = SIGEV_SIGNAL;

	This is accompanied by a call of

	   (void)signal(SIGIO,sigio);

	somewhat earlier, and sigio contains a fprintf(stderr,...) to tell me that
	it was called.  The message never appears.  As before, aio_error and
	aio_return verify that the aio_write calls did complete, but no signal is
	ever generated.
8671.4		APACHE::GOLIKERI		`Fri Jan 31 1997 18:41`	6
	We did try (I work with John from -1) it by incrementing a global counter in the signal handler and took out the fprintf's. It worked. So it seems the signal handler works but fprintfs in the signal handler mess things up or do not work. Shaila
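The counter experiment in .4 matches the usual advice for handlers: fprintf() is not on the POSIX list of async-signal-safe functions, so a handler is better off recording the event and leaving the reporting to the main program. Whether that explains the missing output on these 3.2 systems is not established in this string. A minimal sketch of the pattern, assuming a POSIX system; the names count_signal, install_counter and report_completions are made up for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>

/* Bumped by the handler; sig_atomic_t is the type meant for this. */
static volatile sig_atomic_t completions = 0;

/* The handler does nothing except count; no stdio calls in here. */
static void count_signal(int signo)
{
    (void)signo;
    completions++;
}

/* Install the counting handler for the given signal; 0 on success. */
int install_counter(int signo)
{
    struct sigaction sa;

    sa.sa_handler = count_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL);
}

/* Called from ordinary (non-handler) code, where fprintf() is fine. */
int report_completions(void)
{
    int n = (int)completions;

    fprintf(stderr, "%d completion signal(s) seen so far\n", n);
    return n;
}
```

The main loop would call report_completions() (or just test the counter) between other work, rather than printing from inside the handler.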
8671.5	But aio_sigevent doesn't seem to be available ...	APACHE::CHAMBERS		`Fri Jan 31 1997 19:30`	29
	Yup; as Shaila said, bumping a counter and doing nothing else seems to
	prove that the signal handler is indeed called.

	However, if this is all that's doable, the signal handler is rather
	pointless.  The reason for wanting a signal handler is so that the rest of
	the code can be told "Hey, aiocb b is done and can be reused."  This is
	obviously the point of the aio_sigevent field, and it has an
	aio_sigevent.sigev_value.sival_int field that is obviously the place to
	pass the aiocb index b to the signal handler.

	But the trail grows cold here.  I ran

	   find /usr/man -type f -print | grep sigevent

	to find out everything documented about sigevents.  It gave matches on the
	aio_* pages, and on timer_create and mq_notify.  But none of these give any
	hint as to how the information in the aio_sigevent might get passed to the
	signal handler.  The string "sigevent" doesn't occur in any of the man
	pages related to signals.

	This means that the signal handler has no (documented) way of learning
	which aiocb triggered the signal.  The only apparent way for the signal
	handler to discover which aiocb it was called for is to loop thru all of
	them and test each one for completion.  This makes the entire signal
	mechanism moot, since it means that the code we're trying to eliminate (a
	for loop calling aio_error for each aiocb) must now be done inside the
	signal handler, and we haven't saved any time at all; we've just added the
	complexity and overhead of signals without any benefit.

	Is there some way that the signal handler can learn which aio_sigevent (or
	aiocb) was the one that triggered the signal?  Is it documented somewhere?
	(Even better, is there an example somewhere that does it right?)
8671.6		SMURF::DENHAM	Digital UNIX Kernel	`Sat Feb 01 1997 01:22`	48
	Check out the Guide to Realtime Programming.  I know you only like man
	pages, but that's where the information currently resides.  The signal
	information (SIGINFO) support was added in V3.2 but the doc wasn't updated.
	The information on how to use the feature went into the release notes
	thence to the realtime guide.  More should go into the man pages.  I'll
	make sure that happens.

	Now, as to how to get the info you put in the sival field in the sigevent
	structure out of the signal handler:

	#include <sys/siginfo.h>
	#include <sys/signal.h>

	Set up your handler to use siginfo:

	    sig_act.sa_handler = (void *) sig_handler;
	    sigemptyset(&sig_act.sa_mask);
	    sig_act.sa_flags = SA_SIGINFO;
	    status = sigaction(SIGIO, &sig_act, NULL);

	Put something interesting in the sigevent sival field:

	    sigevent.sigevent.sigev_value.sival_ptr = &acb;

	Then in the handler, something like this:

	void sig_handler(int signo, siginfo_t *sip, void *useless)
	{
	    struct aiocb *ap;

	    /* Recover the aiocb pointer stored in the sigevent */
	    ap = (struct aiocb *) sip->si_value.sival_ptr;
	    if (aio_error(ap) == EINPROGRESS) {
	        abort();
	    }
	    /* aio_return gives the byte count on success, -1 on failure */
	    if (aio_return(ap) < 0) {
	        perror("aio_return");
	        exit(1);
	    }
	}

	Anyway, that's the general idea of how to use it.  The siginfo concept
	comes straight from SVR4.  The value field was just a little tweak added by
	POSIX realtime.
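For anyone who wants the pieces of .6 in one compilable unit, here is a minimal sketch. It assumes a current POSIX system (not checked against V3.2), passes the aiocb index through sival_int as the code in .0 did rather than a pointer, and uses the standard sa_sigaction member for the three-argument handler. The names write_and_wait, on_aio_done and completed_index are illustrative only and are not from the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/* Index of the last completed request; written only by the handler. */
static volatile sig_atomic_t completed_index = -1;

/* SA_SIGINFO handler: the value stored in aio_sigevent.sigev_value comes
   back in info->si_value, so no loop over the aiocbs is needed. */
static void on_aio_done(int signo, siginfo_t *info, void *context)
{
    (void)signo;
    (void)context;
    completed_index = info->si_value.sival_int;
}

/* Write len bytes at offset 0 of fd with aio_write(), wait for the
   completion signal, and return the byte count (or -1 on error). */
ssize_t write_and_wait(int fd, const char *data, size_t len, int index)
{
    struct sigaction sa;
    struct aiocb cb;
    sigset_t block, old, wait_mask;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_aio_done;          /* three-argument handler */
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    if (sigaction(SIGIO, &sa, NULL) != 0)
        return -1;

    /* Keep SIGIO blocked until sigsuspend() so it cannot slip past us. */
    sigemptyset(&block);
    sigaddset(&block, SIGIO);
    sigprocmask(SIG_BLOCK, &block, &old);
    wait_mask = old;
    sigdelset(&wait_mask, SIGIO);

    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = (void *)data;
    cb.aio_nbytes = len;
    cb.aio_sigevent.sigev_notify = SIGEV_SIGNAL;   /* symbolic name, not 0/1 */
    cb.aio_sigevent.sigev_signo = SIGIO;
    cb.aio_sigevent.sigev_value.sival_int = index;

    completed_index = -1;
    if (aio_write(&cb) != 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    while (completed_index != index)
        sigsuspend(&wait_mask);             /* atomically unblock and wait */
    sigprocmask(SIG_SETMASK, &old, NULL);

    return aio_error(&cb) == 0 ? aio_return(&cb) : -1;
}
```

A real copy program would keep several aiocbs in flight instead of waiting on each one; the sketch only shows how the index travels from the aiocb to the handler. Some systems need the realtime library (for example -lrt) at link time.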
8671.7		APACHE::CHAMBERS		`Mon Feb 03 1997 13:16`	35
	Thanks for the example.  It looks like it might answer the question.

	> Check out the Guide to Realtime Programming.  I know you only
	> like man pages, but that's where the information currently
	> resides.  The signal information (SIGINFO) support was added
	> in V3.2 but the doc wasn't updated.

	Well, now; I'm not sure that I'd agree that I "only like man pages".  In
	fact, I'm not sure I'd agree that I usually like them much at all.  But if
	you're rloginned to a couple of different machines with different release
	levels of software, and you're trying to figure out how to get some code
	working on those machines, the man pages are often the only documentation
	that's available for those machines.

	I don't seem to see any pointers anywhere to the Guide to Realtime
	Programming; this is the first time I've heard of it.  There doesn't seem
	to be a copy on any nearby shelf.  Is it available online, on the Web
	maybe?  Should the man pages perhaps have pointers to it?

	Also, as I mentioned earlier, I did ask Alta Vista if it knew anything
	about aio_write and signal and a few other likely keywords.  It did; it
	showed me Web versions of the same things that `man whatever` shows me.  No
	sample code anywhere.  Even using keywords like "sample" and "example"
	didn't turn up anything that was recognizable as sample code or C examples.
	The Web versions of the signal-related pages didn't explain how a signal
	handler can learn which aiocb caused the signal.

	When it works, Webified documentation can be much more findable, browsable,
	etc.able than the Unix man pages.  When it works.  It didn't in this case,
	though I spent much more time with it than with "man".  Perhaps this only
	shows that it's much more difficult to recognize failure on the Web than
	with "man".  Maybe I just couldn't guess the right keywords.  I don't think
	there were any clues that the phrase "realtime programming" was related to
	the topic.

	Let's see how the sample signal code works ...
8671.8	r-t guide is on web (w/ rest of v4.0 docs)	SMURF::KHALL		`Mon Feb 03 1997 16:06`	7
	Guide to Realtime Programming is on the web at http://www.zk3.dec.com/~binder/platinum/Digital_UNIX_Bookshelf.html as part of the v4.0 docset. \ken
8671.9	"Digital Internal Use Only" PS file	HELIX::CLARK		`Mon Feb 03 1997 16:51`	18
	> Guide to Realtime Programming is on the web at
	>
	> http://www.zk3.dec.com/~binder/platinum/Digital_UNIX_Bookshelf.html
	>
	> as part of the v4.0 docset.

	Unfortunately, when you click on the above (within the Programming
	Documentation shelf), you find it's not yet available in HTML.  We plan to
	revise the book and migrate it to HTML -- current estimate is the May/June
	time frame.  Same for the corresponding manpages, which include aio*.
	[Suggestions for improving the guide & manpages content gratefully
	accepted.]

	In the meantime, a PS file for the latest edition (V4.0) is available at:

	   HELIX::USER7:[ELNNETDOC]DECOSF_V40_RT_PROG_GUIDE.PS

	- Jay
8671.10		APACHE::CHAMBERS		`Mon Feb 03 1997 20:23`	35
	| > Guide to Realtime Programming is on the web at
	| > http://www.zk3.dec.com/~binder/platinum/Digital_UNIX_Bookshelf.html
	| > as part of the v4.0 docset.
	|
	| Unfortunately, when you click on the above (within the Programming
	| Documentation shelf), you find it's not yet available in HTML.

	Sure 'nuf.  In fact, when I paste that into netscape's GoTo widget (Don't
	you love how they thumb their noses at the anti-GoTo crowd? ;-), what I get
	is a looooong pause, and the little popup telling me that "A network error
	has occurred ... Try connecting again later."  Several tries got the same
	non-response.

	| In the meantime, a PS file for the latest edition (V4.0) is available at:
	|    HELIX::USER7:[ELNNETDOC]DECOSF_V40_RT_PROG_GUIDE.PS

	I wonder how one might get there from a Unix workstation?  I tried several
	combos of ftp, rcp, and URLs with netscape, and got lots of complaints
	about my syntax.

	Of course, there's also the observation that we have only a few 4.0
	machines hereabouts; most are 3.2somethings.  And the code should be as
	portable as possible, of course.

	Meanwhile, back at the ranch, with the help of the sigevent example I got a
	test program running that successfully copies data from stdin to stdout
	using a flock of aiocbs.  It works for 1-MB, 5-MB and 100-MB files.  Its
	run time is exactly the same as that of a tiny program that is the trivial
	read()/write() loop.  Both use buffers of the same size, 40K.  The times
	never differed by more than 1 sec (out of 45) for the 100-MB case.  Not
	much saving there, I'd say.

	Where would one learn how to make such things run faster?  It's not at all
	obvious how one might learn where an aio-style program is wasting its time.
	If it doesn't run faster, well, why use a big, complex, nondeterministic
	program when a simple, couple-line, deterministic program does the same job
	at the same speed?  (Other than for the challenge, of course. ;-)
8671.11		HELIX::SONTAKKE		`Mon Feb 03 1997 20:35`	7
	Why would you believe that asynchronous I/O will be faster?  All it allows
	you to do is to do something else while I/O is going on.  You are not
	blocked during the I/O.  You could do double buffering and run some
	computation on one set of data while another is being read/written
	asynchronously.

	- Vikas
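The double buffering mentioned in .11 could look roughly like the sketch below: while one block is being summed, the read of the next block is already queued. This is only an illustration under assumptions (a POSIX system with aio_read and aio_suspend, a seekable file, and a byte sum standing in for "some computation"); BLOCK, start_read, finish_read and sum_bytes are made-up names.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

#define BLOCK 4096

/* Queue an asynchronous read of one block at the given file offset. */
static int start_read(struct aiocb *cb, int fd, char *buf, off_t offset)
{
    memset(cb, 0, sizeof *cb);
    cb->aio_fildes = fd;
    cb->aio_buf = buf;
    cb->aio_nbytes = BLOCK;
    cb->aio_offset = offset;
    cb->aio_sigevent.sigev_notify = SIGEV_NONE;   /* no signal; we wait below */
    return aio_read(cb);
}

/* Sleep until the request is done; return its byte count, or -1 on error. */
static ssize_t finish_read(struct aiocb *cb)
{
    const struct aiocb *const list[1] = { cb };

    while (aio_error(cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);
    return aio_return(cb);
}

/* Sum every byte of fd, reading block n+1 while block n is being summed. */
long sum_bytes(int fd)
{
    char bufs[2][BLOCK];
    struct aiocb cbs[2];
    long total = 0;
    off_t offset = 0;
    int cur = 0;

    if (start_read(&cbs[cur], fd, bufs[cur], offset) != 0)
        return -1;
    for (;;) {
        ssize_t n = finish_read(&cbs[cur]);
        if (n < 0)
            return -1;
        if (n == 0)
            return total;                   /* end of file */
        offset += n;

        /* Queue the next block before touching the current one. */
        int next = 1 - cur;
        if (start_read(&cbs[next], fd, bufs[next], offset) != 0)
            return -1;

        for (ssize_t i = 0; i < n; i++)     /* the overlapped "computation" */
            total += (unsigned char)bufs[cur][i];
        cur = next;
    }
}
```

If the per-block work is trivial, as in a plain copy, there is little to overlap, which would be consistent with the timings reported in .10.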
8671.12	ftp location for Guide to Realtime Prog'g	HELIX::CLARK		`Mon Feb 03 1997 21:10`	16
	| In the meantime, a PS file for the latest edition (V4.0) is available at:
	|    HELIX::USER7:[ELNNETDOC]DECOSF_V40_RT_PROG_GUIDE.PS

	OK, the PS file for Guide to Realtime Programming should now be available
	via anonymous ftp.  From:

	   osfrt.shr.dec.com: ./pub/osf/decosf_v40_rt_prog_guide.ps

	(The notation means, cd to pub/osf before you get the file.)

	At one point the FTP server on osfrt was not working when accessed as URL
	ftp://osfrt.shr.dec.com/.  If this is a showstopper for anyone, I can try
	to provide a working URL on an internal WWW server.

	These are temporary Digital-internal homes -- all this will cease to be
	necessary once the manual is migrated to HTML.

	- Jay
8671.13		XIRTLU::schott	Eric R. Schott USG Product Management	`Tue Feb 04 1997 11:12`	14
	Hi

	You can access DECnet files via your web browser.  See

	   http://www-unix.zk3.dec.com/www/dgwy.html

	or

	   http://www-unix.zk3.dec.com/cgi-bin/ersdec?DECNET::FILENAME
8671.14		APACHE::CHAMBERS		`Tue Feb 04 1997 13:06`	60
	| Why would you believe that asynchronous I/O will be faster?

	I don't.  I'm working on a project that needs to get the fastest file-copy
	time possible over various kinds of networks.  The folks here are trying to
	move gigabyte-size video files around the network.  I've been trying out
	various ideas to see if they can get us closer to the claimed throughput
	capabilities of the hardware.  Someone suggested that async I/O could give
	us better speed than the plain read()/write() loop that is our winner so
	far.  It's worth testing.  The problem is figuring out how to do it right,
	given the sparsity of the documentation.  ("Guide to Real Time Programming?
	I think Mike has a copy; let's go see ... Nope; I'm sure I saw a copy
	around here; maybe if we look on Jim's shelves ... No, not there either ...
	Have you tried asking in the notes file? ... How about on the Web? ... ")

	| All it
	| allows you to do is to do something else while I/O is going on.  You
	| are not blocked during the I/O.  You could do double buffering and run
	| some computation on one set of data while another is being read/written
	| asynchronously.

	And of course that's exactly why you'd expect that it might be slightly
	faster.  But then, it might not be.  It's entirely possible that the
	kernel's buffering (and the buffering in the network) gives you all the
	overlap that is possible when you're just trying to move data, and async
	I/O won't gain you anything beyond that.  It's worth testing (if you can
	get the info that it takes to do it right).

	Actually, there's another reason to expect that aio_write() might be faster
	than write().  The write() system call has to copy the data from the user's
	buffer into a kernel buffer, and then to disk or the network or wherever.
	The aio_write() routine (in theory) need not do this copy, but can use the
	data in the user's buffer.  This eliminates a copy, which should be a time
	saving.  Of course, it might not be.  One reason is that disks require that
	data be written in sector-size chunks, so if the process isn't writing data
	in multiples of a sector, copying will be needed anyway.  Similarly, writing
	to a TCP or UDP socket entails adding packet headers, and this certainly
	can't be done in the bytes just before the process's buffer, so again
	copying might be necessary.  So we have another "maybe it'll be faster;
	maybe it won't" situation.  "Let's code it up and run some timing tests ...."

	Several other ideas have been tried; not all worked.  Thus, I spent a month
	or so testing out the claim that "threads is the answer".  I did get a big,
	complicated threaded file copy working, after much grief, and in the end,
	it failed (i.e., spent long milliseconds sleeping when it should have been
	reading data) in exactly the same way that the older select() and poll()
	based code did.  But the explanation of why it should work sounded
	reasonable, and it was deemed worth testing.

	It's entirely possible that both the threaded-code test and my more recent
	aio-code test are "not written quite right", and could perform better.
	That's why I'm asking questions here.  Asynchronous, "real-time" code can
	be tricky, and it can shoot itself in the foot in many subtle ways, not all
	of which are obvious from reading the manuals (especially when the manuals
	lack examples).

	(I do find it interesting to reflect on the fact that I've written a lot of
	multi-tasking code in tcl in recent years, and I've "never had any trouble
	with it".  I just write it, and it does what I expect.  In particular, I
	use asynchronous I/O in tcl as the "preferred" technique, because dealing
	with a fileevent is so straightforward.  I wonder if there might be some
	sort of lesson lurking here?  Naaah ... ;-)
8671.15		APACHE::GOLIKERI		`Tue Feb 04 1997 13:10`	11
	prev . faster or not? Vikas, you are right, in that aio in itself does not mean improved performance. But from the application point of view, it can as you said do double (or any number) buffering and then do something else than wait on a completion of a write or read. The asynchronicity (sp?) is for the application more than layers below. Shaila
8671.16		HELIX::CLARK		`Tue Feb 04 1997 13:31`	12
	To answer for the documentation end of things:

	Since we plan to revise the Guide to Realtime Programming and associated
	manpages (see % apropos .1b) in the near future, we can address items such
	as:

	- look to improve aio & signals examples
	- identify "realtime" & library associations of manpages and point to
	  examples
	- consider a sigevent(4) manpage
	- ??? suggestions welcome

	Jay
8671.17	write() doesn't require copy	WASTED::map	Mark Parenti, Unix Engineering Group	`Tue Feb 04 1997 15:03`	24
	> Actually, there's another reason to expect that aio_write() might be faster
	> than write().  The write() system call has to copy the data from the user's
	> buffer into a kernel buffer, and then to disk or the network or wherever.
	> The aio_write() routine (in theory) need not do this copy, but can use the
	> data in the user's buffer.  This eliminates a copy, which should be a time
	> saving.  Of course, it might not be.  One reason is that disks require that
	> data be written in sector-size chunks, so if the process isn't writing data
	> in multiples of a sector, copying will be needed anyway.  Similarly, writing
	> to a TCP or UDP socket entails adding packet headers, and this certainly
	> can't be done in the bytes just before the process's buffer, so again
	> copying might be necessary.  So we have another "maybe it'll be faster;
	> maybe it won't" situation.  "Let's code it up and run some timing tests ...."

	It isn't true that using write() requires a copy.  It depends on the driver
	type.  If you are using raw disks then write() can do the I/O directly from
	a user buffer.  Since aio_write() simply goes through the driver's write()
	routine I would expect it would also be true for that interface.

	For network devices, however, I believe that all I/O, including aio,
	requires a copy.  There are a couple of projects underway to look at
	allowing direct writes from user space buffers, but it doesn't work that
	way now.  This is due to the way packets are passed down the network
	protocol stack, I believe.

	Mark Parenti
	UEG
8671.18		VAXCPU::michaud	Jeff Michaud - ObjectBroker	`Tue Feb 04 1997 15:24`	17
	> There are a couple of projects underway to look at allowing direct
	> writes from user space buffers it doesn't work that way now.  This is due to
	> the way packets are passed down the network protocol stack, I believe.

	Also I believe that historically the kernel had no way to keep the user's
	buffer locked in memory after the syscall returned, hence the address may
	not be valid when subsequently the driver accessed the buffer.

	Plus the network model is that when a write/send returns, the user then
	owns the buffer again, and is free to reuse it, free it, etc.  If the
	kernel/driver were to not copy the user's buffer, then either the
	write/send would have to block until the user data made it to the remote
	system and was acked, or the interface would have to be modified to support
	some form of status block, call back, and/or other means so that the user
	can determine when the buffer is free to be used again or freed.
8671.19	Web location for Guide to Realtime Prog'g	HELIX::CLARK		`Thu Apr 10 1997 14:51`	23
	As mentioned earlier in the string, an HTML version of the "Guide to
	Realtime Programming" will be available on the Web in a couple of months.

	In the meantime, a PDF (Adobe Acrobat) version is available and
	customer-accessible on the web, within DIGITAL's OEM InfoCenter:

	   http://www.digital.com/oem/library/docs/docs.htm#rtunixdocs

	See the top of the page, as necessary, for a hot spot that downloads the
	Adobe Acrobat Reader for viewing.

	The PS file for the manual remains internally accessible via ftp and DECnet.

	> ftp:
	>    osfrt.shr.dec.com: ./pub/osf/decosf_v40_rt_prog_guide.ps
	>    (The notation means, cd to pub/osf before you get the file.)
	> DECnet:
	>    HELIX::USER7:[ELNNETDOC]DECOSF_V40_RT_PROG_GUIDE.PS

	All the above will cease to be necessary once the manual is migrated to
	HTML.

	- Jay
8671.20		HELIX::SONTAKKE		`Fri Apr 11 1997 15:30`	6
	By the way, for some bizarre reason you can not use Netscape and provide
	ftp://osfrt.shr.dec.com/ as URL.  Nobody knows the reason why it does not
	function under Netscape.  The same URL will work with Lynx2-5.

	- Vikas