Conference turris::digital_unix

Title:DIGITAL UNIX(FORMERLY KNOWN AS DEC OSF/1)
Notice:Welcome to the Digital UNIX Conference
Moderator:SMURF::DENHAM
Created:Thu Mar 16 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:10068
Total number of notes:35879

8671.0. "Do aio signals work?" by APACHE::CHAMBERS () Fri Jan 31 1997 13:30

I've been doing some experimenting to test whether the aio_* routines can
give us better performance. The major question seems to be "How does a program
learn that an operation has finished?"  Now, this can be done by looping thru
the buffers and passing each to aio_error and then aio_return, but this is
clearly not a high-performance way to do it.  The obvious way is to use the
signals that are clearly intended to be part of an aiocb.  I coded up several
ways of doing it, using signal(), sigvec(), and sigaction().  All of them
utterly failed to call the signal handler.  
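
For reference, the polling approach mentioned above (aio_error followed by aio_return on each control block) can be sketched roughly as follows. This is a minimal illustration assuming a POSIX system, not an excerpt from the test program in this note; the function name poll_completed and the results/done arrays are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>     /* struct sigevent and the SIGEV_* constants */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Make one pass over n control blocks. For each request that has finished
 * since the last pass, collect its result with aio_return(), store it in
 * results[i], and set done[i]. Returns how many requests this pass retired.
 */
int poll_completed(struct aiocb *cbs, int n, ssize_t *results, int *done)
{
    int i, retired = 0;

    assert(cbs != NULL && results != NULL && done != NULL);
    for (i = 0; i < n; i++) {
        int err;

        if (done[i])
            continue;                       /* already collected */
        err = aio_error(&cbs[i]);
        if (err == EINPROGRESS)
            continue;                       /* still running */
        if (err != 0)
            fprintf(stderr, "request %d failed: %s\n", i, strerror(err));
        results[i] = aio_return(&cbs[i]);   /* byte count, or -1 on failure */
        done[i] = 1;
        retired++;
    }
    return retired;
}
```

Every pass costs one aio_error call per outstanding request, which is the overhead the signal mechanism is meant to avoid.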

I can easily "prove" that the signal() code, for example, works:  I just have
it set up a signal handler for, say, SIGSEGV, write a loop that calls rand()
to generate pointer values, dereference them, and the fprintf(stderr,...) in
the signal handler produces output.  So I do know how to code a call of signal()
correctly.  But the same code setting up a SIGIO handler and putting its address
into an aiocb has no effect whatsoever.

Are there any examples of code that does this successfully?  (Alta Vista doesn't
seem to be able to find any.)

Here are some excerpts from one of my simpler tests, the one using signal().
Near the beginning of main() are the lines:

struct aiocb *buf = 0;
	...
   (void)signal(SIGIO,sigio);	/* Tell kernel about our signal handler */
	...
   n = buffers * sizeof(struct aiocb);
   if (!(buf = (struct aiocb*)malloc(n))) {	/* Get aiocb buffers */
      fprintf(stderr,...);
      exit(errno);
   }
   for (b=0; b<buffers; b++) {		/* Set up aio buffers */
      if (!(buf[b].aio_buf = (char*)malloc(bufsize))) {
         fprintf(stderr,...);
         exit(errno);
      }
      buf[b].aio_fildes = 1;
      buf[b].aio_reqprio = AIO_PRIO_DFL;
      buf[b].aio_sigevent.sigev_value.sival_int = b;
      buf[b].aio_sigevent.sigev_signo = SIGIO;
      buf[b].aio_sigevent.sigev_notify = 1;
   }

Later on, the program duly calls aio_write(&buf[b]) for various values of b.
The sigio routine contains a fprintf(stderr,"...") call which never happens.
Further code that calls aio_error() and aio_return() verifies that the write
operations have indeed finished, and the data does appear on stdout.  But no
calls of sigio ever happen.

As I said, this is merely one example; others use what should be equivalent
sigaction() and sigvec() calls, and they also never receive signals.

Either I'm doing something subtly wrong (likely) or aio on these machines 
(mostly 3.2F and 3.2G) doesn't actually do signals with aio_write.  I hope
that I'm doing it wrong.  Using a for loop and two system calls to determine
that aio_write has completed is clearly not desirable if you want speed.

Any clues?  Examples?

T.R  Title  User  Personal Name  Date  Lines
8671.1. "RTFMP?" by WTFN::SCALES (Despair is appropriate and inevitable.) Fri Jan 31 1997 14:51 (13 lines)

.0> Any clues?  Examples?

I'd guess that you're missing the sigev_notify setting (from the man page for
aio_write(3)):

> If aio_sigevent.sigev_notify equals
> SIGEV_SIGNAL and aio_sigevent.sigev_signo is non-zero, a signal will be
> generated when the asynchronous read operation has completed.



				Webb

8671.2. "Your FM page is different from mine ..." by APACHE::CHAMBERS () Fri Jan 31 1997 15:53 (15 lines)

Hmmm ... I checked with "man 3 aio_write" on several machines hereabouts,
and none of them contains that text.  In fact, none contains the character
string "notify" anywhere in any capitalization.  There seems to be no mention
whatsoever in any aio_* or sig* man page of the possible values for the
sigev_notify field.  The obvious guess was that it's a true/false value.

But that might be a clue. I found SIGEV_SIGNAL in /usr/sys/include/sys/signal.h
and also found the rather curious definitions:

#define SIGEV_SIGNAL (0)    /* Notify via signal */
#define SIGEV_NONE   (1)    /* Other notification: unsupported */

This seems to be saying that 0 is used for "true" and 1 for "false".  Did they
actually do something this perverse without any warning in a man page? Wouldn't
be surprised. I'll give it a try ...
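
Whatever the outcome of that test, the header excerpt above suggests writing the field with the named constant rather than a literal 0 or 1, since sigev_notify selects a notification type and is not a true/false flag. A minimal sketch, assuming a POSIX <aio.h>; init_aiocb_with_signal is a hypothetical helper, and the AIO_PRIO_DFL setting from .0 is left out because that constant is not assumed to exist on other systems:

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <signal.h>
#include <string.h>

/*
 * Fill in one control block so that completion of the request raises
 * `signo`, tagged with `index` so a handler can tell the requests apart.
 */
void init_aiocb_with_signal(struct aiocb *cb, int fd, void *buf, size_t len,
                            int signo, int index)
{
    assert(cb != NULL);
    memset(cb, 0, sizeof *cb);
    cb->aio_fildes = fd;
    cb->aio_buf = buf;
    cb->aio_nbytes = len;
    cb->aio_sigevent.sigev_notify = SIGEV_SIGNAL;   /* named constant, not 0 or 1 */
    cb->aio_sigevent.sigev_signo = signo;
    cb->aio_sigevent.sigev_value.sival_int = index;
}
```

With the definitions quoted above, the literal 1 used in .0 would have selected SIGEV_NONE, so the symbolic name also guards against that kind of mix-up.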

8671.3. "Nope; SIGEV_SIGNAL ain't the answer" by APACHE::CHAMBERS () Fri Jan 31 1997 16:03 (14 lines)

Well, I tried it, and it didn't change a thing.  Now the lines that fill in
the aiocb fields look like:

        buf[b].aio_fildes = 1;
        buf[b].aio_reqprio = AIO_PRIO_DFL;
        buf[b].aio_sigevent.sigev_value.sival_int = b;
        buf[b].aio_sigevent.sigev_signo = SIGIO;
        buf[b].aio_sigevent.sigev_notify = SIGEV_SIGNAL;

This is accompanied by a call of 
    (void)signal(SIGIO,sigio);
somewhat earlier, and sigio contains a fprintf(stderr,...) to tell me that it
was called.  The message never appears.  As before, aio_error and aio_return
verify that the aio_write calls did complete, but no signal is ever generated.

8671.4. "" by APACHE::GOLIKERI () Fri Jan 31 1997 18:41 (6 lines)

    We did try (I work with John from -1) it by incrementing a global
    counter in the signal handler and took out the fprintf's. It worked. So
    it seems the signal handler works but fprintfs in the signal handler
    mess things up or do not work.
    
    Shaila
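
One plausible reading of this result is that fprintf() is not async-signal-safe, so calling it from a handler can misbehave, while bumping a counter is fine. The sketch below illustrates that idea under POSIX assumptions; it is not the code that was actually run, and count_completion and install_counter are hypothetical names. It counts signals in a volatile sig_atomic_t and reports through write(2), which POSIX lists as async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t completions;   /* bumped once per delivered signal */

/* Handler restricted to async-signal-safe work: a counter and write(2). */
static void count_completion(int signo)
{
    static const char note[] = "aio completion signal\n";
    int saved_errno = errno;                /* do not disturb the main program */
    ssize_t ignored;

    (void)signo;
    completions++;
    ignored = write(STDERR_FILENO, note, sizeof note - 1);
    (void)ignored;
    errno = saved_errno;
}

/* Install the handler for `signo`; returns 0 on success, -1 on failure. */
int install_counter(int signo)
{
    struct sigaction sa;

    assert(signo > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = count_completion;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;               /* restart interrupted system calls */
    return sigaction(signo, &sa, NULL);
}
```

The main program can then read the counter outside the handler and do any formatted output there.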

8671.5. "But aio_sigevent doesn't seem to be available ..." by APACHE::CHAMBERS () Fri Jan 31 1997 19:30 (29 lines)

Yup; as Shaila said, bumping a counter and doing nothing else seems to prove
that the signal handler is indeed called.  However, if this is all that's doable,
the signal handler is rather pointless.

The reason for wanting a signal handler is so that the rest of the code can
be told "Hey, aiocb b is done and can be reused."  This is obviously the point
of the aio_sigevent field, and it has an aio_sigevent.sigev_value.sival_int field
that is obviously the place to pass the aiocb index b to the signal handler.

But the trail grows cold here.  I ran
   find /usr/man -type f -print | grep sigevent
to find out everything documented about sigevents.  It gave matches on the
aio_* pages, and on timer_create and mq_notify.  But none of these give any
hint as to how the information in the aio_sigevent might get passed to the
signal handler.  The string "sigevent" doesn't occur in any of the man pages
related to signals.

This means that the signal handler has no (documented) way of learning which
aiocb triggered the signal.  The only apparent way for the signal handler to
discover which aiocb it was called for is to loop thru all of them and test 
each one for completion. This makes the entire signal mechanism moot, since 
it means that the code we're trying to eliminate (a for loop calling aio_error
for each aiocb) must now be done inside the signal handler, and we haven't
saved any time at all; we've just added the complexity and overhead of signals
without any benefit.

Is there some way that the signal handler can learn which aio_sigevent (or
aiocb) was the one that triggered the signal?  Is it documented somewhere?
(Even better, is there an example somewhere that does it right?)

8671.6. "" by SMURF::DENHAM (Digital UNIX Kernel) Sat Feb 01 1997 01:22 (48 lines)

    Check out the Guide to Realtime Programming. I know you only
    like man pages, but that's where the information currently
    resides. The signal information (SIGINFO) support was added
    in V3.2 but the doc wasn't updated. The information on
    how to use the feature went into the release notes thence
    to the realtime guide. More should go into the man pages.
    I'll make sure that happens.
    
    Now, as to how to get the info you put in the sival field in
    the sigevent structure out of the signal handler:
    
    #include <sys/siginfo.h>
    #include <sys/signal.h>
    
    Set up your handler to use siginfo:
    
    
    sig_act.sa_handler = (void *) sig_handler;
    sigemptyset(&sig_act.sa_mask);
    sig_act.sa_flags = SA_SIGINFO;
    status = sigaction(SIGIO, &sig_act, NULL);
    
    Put something interesting in the sigevent sival field:
    
    acb.aio_sigevent.sigev_value.sival_ptr = &acb;
    
    Then in the handler, something like this:
    
    void
    sig_handler(int signo, siginfo_t *sip, void *useless)
    {
    	struct aiocb *ap;
    
    	ap = (struct aiocb *) sip->si_value.sival_ptr;
    	if (aio_error(ap) == EINPROGRESS) {
    		abort();
    	}
        if (aio_return(ap) < 0) {	/* aio_return gives the byte count, or -1 on error */
    		perror("aio_return");
    		exit(1);
        }
    }
    
    Anyway, that's the general idea of how to use it. The siginfo
    concept comes straight from SVR4. The value field was just
    a little tweak added by POSIX realtime.
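
    Putting the pieces above together, a self-contained variant might look like the following sketch. It is written against POSIX rather than Digital UNIX V3.2 specifically, so it stores the three-argument handler in sa_sigaction (the POSIX member for SA_SIGINFO handlers) instead of casting it into sa_handler. The names aio_write_and_wait, on_aio_done, and finished_cb are hypothetical. The signal stays blocked except inside sigsuspend(), so it cannot arrive before the program is ready to wait for it.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static struct aiocb *volatile finished_cb;      /* aiocb named by the last signal */

/* Three-argument handler: si_value carries whatever was put in sigev_value. */
static void on_aio_done(int signo, siginfo_t *sip, void *context)
{
    (void)signo;
    (void)context;
    finished_cb = (struct aiocb *)sip->si_value.sival_ptr;
}

/*
 * Write len bytes from data to fd at offset 0 with aio_write(), sleep until
 * the completion signal names this request, and return the byte count
 * (-1 on error).
 */
ssize_t aio_write_and_wait(int fd, void *data, size_t len, int signo)
{
    struct sigaction sa;
    struct aiocb cb;
    sigset_t block, old, wait_mask;
    ssize_t result = -1;

    assert(data != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_aio_done;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) < 0)
        return -1;

    /* Block the signal so the handler can only run inside sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, signo);
    if (sigprocmask(SIG_BLOCK, &block, &old) < 0)
        return -1;
    wait_mask = old;
    sigdelset(&wait_mask, signo);

    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = data;
    cb.aio_nbytes = len;
    cb.aio_sigevent.sigev_notify = SIGEV_SIGNAL;
    cb.aio_sigevent.sigev_signo = signo;
    cb.aio_sigevent.sigev_value.sival_ptr = &cb;    /* handed to the handler */

    finished_cb = NULL;
    if (aio_write(&cb) < 0) {
        perror("aio_write");
    } else {
        while (finished_cb != &cb)
            sigsuspend(&wait_mask);     /* atomically unblock and sleep */
        result = aio_return(&cb);       /* request is complete by now */
    }
    sigprocmask(SIG_SETMASK, &old, NULL);
    return result;
}
```

    A call such as aio_write_and_wait(fd, buf, len, SIGIO) would return the byte count once the handler has seen this aiocb. Ordinary signals such as SIGIO are not queued, so with many requests in flight several completions can collapse into one delivery; real-time signals (SIGRTMIN and up) are queued where the system supports them.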
    
    
8671.7. "" by APACHE::CHAMBERS () Mon Feb 03 1997 13:16 (35 lines)

Thanks for the example.  It looks like it might answer the question.

>    Check out the Guide to Realtime Programming. I know you only
>    like man pages, but that's where the information currently
>    resides. The signal information (SIGINFO) support was added
>    in V3.2 but the doc wasn't updated.

Well, now; I'm not sure that I'd agree that I "only like man pages".
In fact, I'm not sure I'd agree that I usually like them much at all.
But if you're rloginned to a couple of different machines with different
release levels of software, and you're trying to figure out how to get
some code working on *those* machines, the man pages are often the only
documentation that's available for *those* machines.  I don't seem to
see any pointers anywhere to the Guide to Realtime Programming; this
is the first time I've heard of it.  There doesn't seem to be a copy
on any nearby shelf.  Is it available online, on the Web maybe?  Should
the man pages perhaps have pointers to it?

Also, as I mentioned earlier, I did ask Alta Vista if it knew anything
about aio_write and signal and a few other likely keywords.  It did; it
showed me Web versions of the same things that `man whatever` shows me.
No sample code anywhere. Even using keywords like "sample" and "example" 
didn't turn up anything that was recognizable as sample code or C examples.  
The Web versions of the signal-related pages didn't explain how a signal 
handler can learn which aiocb caused the signal.

When it works, Webified documentation can be much more findable, browsable,
etc.able than the Unix man pages.  When it works.  It didn't in this case,
though I spent much more time with it than with "man".  Perhaps this only
shows that it's much more difficult to recognize failure on the Web than
with "man".  Maybe I just couldn't guess the right keywords.  I don't think
there were any clues that the phrase "realtime programming" was related 
to the topic.

Let's see how the sample signal code works ...

8671.8. "r-t guide is on web (w/ rest of v4.0 docs)" by SMURF::KHALL () Mon Feb 03 1997 16:06 (7 lines)

    Guide to Realtime Programming is on the web at
    
        http://www.zk3.dec.com/~binder/platinum/Digital_UNIX_Bookshelf.html
    
    as part of the v4.0 docset.
    
    \ken

8671.9. "'Digital Internal Use Only' PS file" by HELIX::CLARK () Mon Feb 03 1997 16:51 (18 lines)

>    Guide to Realtime Programming is on the web at
>    
>        http://www.zk3.dec.com/~binder/platinum/Digital_UNIX_Bookshelf.html
>    
>    as part of the v4.0 docset.

  
  Unfortunately, when you click on the above (within the Programming
  Documentation shelf), you find it's not yet available in HTML.
  
  We plan to revise the book and migrate it to HTML -- current estimate is
  the May/June time frame.  Same for the corresponding manpages, which
  include aio*.  [Suggestions for improving the guide & manpages content
  gratefully accepted.]

  In the meantime, a PS file for the latest edition (V4.0) is available at:
    HELIX::USER7:[ELNNETDOC]DECOSF_V40_RT_PROG_GUIDE.PS
  - Jay

8671.10. "" by APACHE::CHAMBERS () Mon Feb 03 1997 20:23 (35 lines)

| >    Guide to Realtime Programming is on the web at  
| >        http://www.zk3.dec.com/~binder/platinum/Digital_UNIX_Bookshelf.html
| >    as part of the v4.0 docset.
|  
|   Unfortunately, when you click on the above (within the Programming
|   Documentation shelf), you find it's not yet available in HTML.

Sure 'nuf.  In fact, when I paste that into netscape's GoTo widget (Don't
you love how they thumb their noses at the anti-GoTo crowd? ;-), what I get
is a looooong pause, and the little popup telling me that "A network error
has occurred ... Try connecting again later."  Several tries got the same
non-response.

| In the meantime, a PS file for the latest edition (V4.0) is available at:
|     HELIX::USER7:[ELNNETDOC]DECOSF_V40_RT_PROG_GUIDE.PS

I wonder how one might get there from a Unix workstation?  I tried several
combos of ftp, rcp, and URLs with netscape, and got lots of complaints about
my syntax.  Of course, there's also the observation that we have only a few
4.0 machines hereabouts; most are 3.2somethings.  And the code should be as
portable as possible, of course.

Meanwhile, back at the ranch, with the help of the sigevent example I got
a test program running that successfully copies data from stdin to stdout
using a flock of aiocbs. It works for 1-MB, 5-MB and 100-MB files. Its run
time is *exactly* the same as that of a tiny program that is the trivial
read()/write() loop.  Both use buffers of the same size, 40K. The times never
differed by more than 1 sec (out of 45) for the 100-MB case.

Not much saving there, I'd say.  Where would one learn how to make such things
run faster?  It's not at all obvious how one might learn where an aio-style
program is wasting its time.  If it doesn't run faster, well, why use a big,
complex, nondeterministic program when a simple, couple-line, deterministic
program does the same job at the same speed?  (Other than for the challenge,
of course. ;-)

8671.11. "" by HELIX::SONTAKKE () Mon Feb 03 1997 20:35 (7 lines)

    Why would you believe that asynchronous I/O will be faster?  All it
    allows you to do is to do something else while I/O is going on.  You
    are not blocked during the I/O.  You could do double buffering and run
    some computation on one set of data while another is being read/written
    asynchronously.
    
    - Vikas
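
    The double-buffering idea can be sketched as below: while the previous block is being written with aio_write(), the next block is read into the other buffer. This is only an illustration under POSIX assumptions; copy_overlapped and finish are hypothetical names, short writes are treated as errors for brevity, and whether it beats a plain read()/write() loop has to be measured.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Sleep until one request finishes; return its byte count, or -1 on error. */
static ssize_t finish(struct aiocb *cb)
{
    const struct aiocb *const list[1] = { cb };

    while (aio_error(cb) == EINPROGRESS)
        (void)aio_suspend(list, 1, NULL);   /* may return early on a signal */
    return aio_return(cb);
}

/*
 * Copy in_fd to out_fd using two buffers, so that the write of one block
 * overlaps the read of the next. Returns the bytes copied, or -1 on error.
 */
off_t copy_overlapped(int in_fd, int out_fd, size_t bufsize)
{
    char *bufs[2];
    struct aiocb cb;
    int cur = 0, pending = 0;
    off_t total = 0;
    ssize_t n;

    assert(bufsize > 0);
    bufs[0] = malloc(bufsize);
    bufs[1] = malloc(bufsize);
    if (bufs[0] == NULL || bufs[1] == NULL) {
        free(bufs[0]);
        free(bufs[1]);
        return -1;
    }
    for (;;) {
        n = read(in_fd, bufs[cur], bufsize);    /* overlaps the pending write */
        if (pending) {
            ssize_t w = finish(&cb);
            pending = 0;
            if (w < 0 || (size_t)w != cb.aio_nbytes) {
                total = -1;                     /* failed or short write */
                break;
            }
        }
        if (n <= 0) {                           /* end of file or read error */
            if (n < 0)
                total = -1;
            break;
        }
        memset(&cb, 0, sizeof cb);
        cb.aio_fildes = out_fd;
        cb.aio_buf = bufs[cur];
        cb.aio_nbytes = (size_t)n;
        cb.aio_offset = total;                  /* position in a regular file */
        cb.aio_sigevent.sigev_notify = SIGEV_NONE;
        if (aio_write(&cb) < 0) {
            perror("aio_write");
            total = -1;
            break;
        }
        pending = 1;
        total += n;
        cur ^= 1;                               /* read into the other buffer next */
    }
    free(bufs[0]);
    free(bufs[1]);
    return total;
}
```

    Only one write is ever outstanding here, so the gain is limited to the time the read and the write can genuinely proceed in parallel.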

8671.12. "ftp location for Guide to Realtime Prog'g" by HELIX::CLARK () Mon Feb 03 1997 21:10 (16 lines)

| In the meantime, a PS file for the latest edition (V4.0) is available at:
|     HELIX::USER7:[ELNNETDOC]DECOSF_V40_RT_PROG_GUIDE.PS

  OK, the PS file for Guide to Realtime Programming should now be available
  via anonymous ftp.  From:

        osfrt.shr.dec.com:  ./pub/osf/decosf_v40_rt_prog_guide.ps
        
  (The notation means, cd to pub/osf before you get the file.)

  At one point the FTP server on osfrt was not working when accessed as URL
  ftp://osfrt.shr.dec.com/.  If this is a showstopper for anyone, I can
  try to provide a working URL on an internal WWW server.
  
  These are temporary Digital-internal homes -- all this will cease to be
  necessary once the manual is migrated to HTML.   - Jay

8671.13. "" by XIRTLU::schott (Eric R. Schott USG Product Management) Tue Feb 04 1997 11:12 (14 lines)

Hi

 You can access DECnet files via your web browser

See

http://www-unix.zk3.dec.com/www/dgwy.html


or 

http://www-unix.zk3.dec.com/cgi-bin/ersdec?DECNET::FILENAME


8671.14. "" by APACHE::CHAMBERS () Tue Feb 04 1997 13:06 (60 lines)

|     Why would you believe that asynchronous I/O will be faster?  

I don't.  I'm working on a project that needs to get the fastest file-copy
time possible over various kinds of networks.  The folks here are trying to
move gigabyte-size video files around the network. I've been trying out various
ideas to see if they can get us closer to the claimed throughput capabilities
of the hardware.  Someone suggested that async I/O could give us better speed
than the plain read()/write() loop that is our winner so far.  It's worth
testing.  The problem is figuring out how to do it right, given the sparsity
of the documentation.  ("Guide to Real Time Programming?  I think Mike has a
copy; let's go see ... Nope; I'm sure I saw a copy around here; maybe if we
look on Jim's shelves ... No, not there either ... Have you tried asking in
the notes file? ... How about on the Web? ... ")

|    All it
|    allows you to do is to do something else while I/O is going on.  You
|    are not blocked during the I/O.  You could do double buffering and run
|    some computation on one set of data while another is being read/written
|    asynchronously. 

And of course that's exactly why you'd expect that it might be slightly faster.
But then, it might not be.  It's entirely possible that the kernel's buffering
(and the buffering in the network) gives you all the overlap that is possible
when you're just trying to move data, and async I/O won't gain you anything
beyond that.  It's worth testing (if you can get the info that it takes to
do it right).

Actually, there's another reason to expect that aio_write() might be faster
than write().  The write() system call has to copy the data from the user's
buffer into a kernel buffer, and then to disk or the network or wherever.
The aio_write() routine (in theory) need not do this copy, but can use the
data in the user's buffer.  This eliminates a copy, which *should* be a time
saving.  Of course, it might not be.  One reason is that disks require that
data be written in sector-size chunks, so if the process isn't writing data
in multiples of a sector, copying will be needed anyway.  Similarly, writing
to a TCP or UDP socket entails adding packet headers, and this certainly can't
be done in the bytes just before the process's buffer, so again copying might
be necessary.  So we have another "maybe it'll be faster; maybe it won't"
situation.  "Let's code it up and run some timing tests ...."

Several other ideas have been tried; not all worked.  Thus, I spent a month
or so testing out the claim that "threads is the answer".  I did get a big,
complicated threaded file copy working, after much grief, and in the end, it
failed (i.e., spent long milliseconds sleeping when it should have been reading
data) in *exactly* the same way that the older select() and poll() based code
did.  But the explanation of why it should work sounded reasonable, and it was
deemed worth testing.

It's entirely possible that both the threaded-code test and my more recent
aio-code test are "not written quite right", and could perform better. That's
why I'm asking questions here.  Asynchronous, "real-time" code can be tricky,
and it can shoot itself in the foot in many subtle ways, not all of which are
obvious from reading the manuals (especially when the manuals lack examples).

(I do find it interesting to reflect on the fact that I've written a lot of
multi-tasking code in tcl in recent years, and I've "never had any trouble
with it".  I just write it, and it does what I expect.  In particular, I
use asynchronous I/O in tcl as the "preferred" technique, because dealing
with a fileevent is so straightforward.  I wonder if there might be some sort
of lesson lurking here?  Naaah ... ;-)

8671.15. "" by APACHE::GOLIKERI () Tue Feb 04 1997 13:10 (11 lines)

    Re: previous replies. Faster or not?
    
    Vikas,
    
    you are right, in that aio in itself does not mean improved
    performance. But from the application's point of view, it can, as you
    said, do double (or any number of) buffering and then do something other
    than wait for a write or read to complete. The asynchronicity (sp?) is
    for the application more than for the layers below.
    
    Shaila

8671.16. "" by HELIX::CLARK () Tue Feb 04 1997 13:31 (12 lines)

  To answer for the documentation end of things:
  
  Since we plan to revise the Guide to Realtime Programming and associated
  manpages (see % apropos .1b) in the near future, we can address items such
  as:
  - look to improve aio & signals examples
  - identify "realtime" & library associations of manpages and point to
    examples
  - consider a sigevent(4) manpage
  - ???   suggestions welcome

  Jay

8671.17. "write() doesn't require copy" by WASTED::map (Mark Parenti, Unix Engineering Group) Tue Feb 04 1997 15:03 (24 lines)

> Actually, there's another reason to expect that aio_write() might be faster
> than write().  The write() system call has to copy the data from the user's
> buffer into a kernel buffer, and then to disk or the network or wherever.
> The aio_write() routine (in theory) need not do this copy, but can use the
> data in the user's buffer.  This eliminates a copy, which *should* be a time
> saving.  Of course, it might not be.  One reason is that disks require that
> data be written in sector-size chunks, so if the process isn't writing data
> in multiples of a sector, copying will be needed anyway.  Similarly, writing
> to a TCP or UDP socket entails adding packet headers, and this certainly can't
> be done in the bytes just before the process's buffer, so again copying might
> be necessary.  So we have another "maybe it'll be faster; maybe it won't"
> situation.  "Let's code it up and run some timing tests ...."

It isn't true that using write() requires a copy. It depends on the driver
type. If you are using raw disks then write() can do the I/O directly from
a user buffer. Since aio_write() simply goes through the driver's write()
routine I would expect it would also be true for that interface. For
network devices, however, I believe that all I/O, including aio, requires
a copy. There are a couple of projects underway to look at allowing direct
writes from user space buffers, but it doesn't work that way now. This is due to
the way packets are passed down the network protocol stack, I believe.

Mark Parenti
UEG

8671.18. "" by VAXCPU::michaud (Jeff Michaud - ObjectBroker) Tue Feb 04 1997 15:24 (17 lines)

> There are a couple of projects underway to look at allowing direct
> writes from user space buffers, but it doesn't work that way now. This is due to
> the way packets are passed down the network protocol stack, I believe.

	Also I believe that historically the kernel had no way to keep
	the user's buffer locked in memory after the syscall returned,
	hence the address may not be valid when subsequently the driver
	accessed the buffer.

	Plus the network model is that when a write/send returns, the user
	then owns the buffer again, and is free to reuse it, free it, etc.
	If the kernel/driver were to not copy the user's buffer, then either
	the write/send would have to block until the user data made it to
	the remote system and was acked, or the interface would have to be
	modified to support some form of status block, call back, and/or other
	means so that the user can determine when the buffer is free to be
	used again or freed.

8671.19. "Web location for Guide to Realtime Prog'g" by HELIX::CLARK () Thu Apr 10 1997 14:51 (23 lines)

  As mentioned earlier in the string, an HTML version of the "Guide to
  Realtime Programming" will be available on the Web in a couple of months.
  
  In the meantime, a PDF (Adobe Acrobat) version is available and
  customer-accessible on the web, within DIGITAL's OEM InfoCenter:
  
      http://www.digital.com/oem/library/docs/docs.htm#rtunixdocs

  See the top of the page, as necessary, for a hot spot that downloads the
  Adobe Acrobat Reader for viewing.

  The PS file for the manual remains internally accessible via ftp and
  DECnet.

>  ftp:
>  osfrt.shr.dec.com:  ./pub/osf/decosf_v40_rt_prog_guide.ps
>  (The notation means, cd to pub/osf before you get the file.)

>  DECnet:
>  HELIX::USER7:[ELNNETDOC]DECOSF_V40_RT_PROG_GUIDE.PS
  
  All the above will cease to be necessary once the manual is migrated to
  HTML.   - Jay

8671.20. "" by HELIX::SONTAKKE () Fri Apr 11 1997 15:30 (6 lines)

    By the way, for some bizarre reason you can not use Netscape and
    provide ftp://osfrt.shr.dec.com/	as URL.  Nobody knows the reason
    why it does not function under Netscape.  The same URL will work with
    Lynx2-5
    
    - Vikas