
Conference bulova::decw_jan-89_to_nov-90

Title: DECWINDOWS 26-JAN-89 to 29-NOV-90
Notice: See 1639.0 for VMS V5.3 kit; 2043.0 for 5.4 IFT kit
Moderator: STAR::VATNE
Created: Mon Oct 30 1989
Last Modified: Mon Dec 31 1990
Last Successful Update: Fri Jun 06 1997
Number of topics: 3726
Total number of notes: 19516

2292.0. "AST problem with XNextEvent not seeing input events" by TOOLEY::B_WACKER () Fri Feb 16 1990 18:04

I have a 5.3 VMS customer with an ast problem in xlib.  The program sets up a
640x400 window and selects for keypress events and just printf's them on
receipt.  After the first keypress event is received a routine is invoked to put
a 640x400 image to the window and requeue itself with a setimr ast for 1 second
after completing the putimage.  On alternate invocations it clears the window,
instead. 

It seems like the client side event queue is getting confused.  
Sometimes keypresses will immediately show up.  Other times you can 
enter a hundred or so keypresses and none appear.  The image is 
faithfully drawn and cleared and the server spends a lot of time in 
hiber.  From the speed of drawing you can see that the network load 
varies when running remote, but has periods of low load.  After 
waiting awhile, entering one more key will sometimes free up the 
buffered ones and they'll spew out.

If synchronize is turned on then you never get any keypress events 
back!  The customer says they can get them if synchronize is only turned  
on while there are events pending, but I haven't verified this.

The customer has the equivalent processing working fine with toolkit 
and add_input.

Can anyone shed any light on why XNextEvent gets blocked when there are
events in the queue and also many 1 second periods of inactivity in which
to process them?  A typical signal is: 

Event = 2, key = <, count = 141 <-- the last keypress processed

XIO:  fatal IO error 65535  on X server "29340::0.0"
      after 184 requests (135 known processed) with 0 events remaining.
%XLIB-F-IOERROR, xlib io error
-SYSTEM-F-LINKABORT, network partner aborted logical link
%TRACE-F-TRACEBACK, symbolic stack dump follows
module name     routine name                     line       rel PC    abs PC

                                                           000C761D  000C761D
                                                           000C7C7F  000C7C7F
                                                           000C2711  000C2711
                                                           000C28B5  000C28B5
                                                           000C29B4  000C29B4
                                                           000C29EE  000C29EE
                                                           000C2A1C  000C2A1C
                                                           000C2A1C  000C2A1C
                                                           000C2A1C  000C2A1C
                                                           000C2A1C  000C2A1C
                                                           000C2A1C  000C2A1C
                                                           000C2B9B  000C2B9B
AST             tick_tock                        2578      00000092  0003EC92
               tick_tock is the ast routine                80202CF3  80202CF3
               2578 is the putimage call                   000AF784  000AF784
                                                           000C9367  000C9367
                                                           000C65D0  000C65D0
AST             main                             2617      000000C8  0003EDB0


2292.1. "Question: will this mean a big problem for your customer?" by DECWIN::JMSYNGE (James M Synge, VMS Development) Fri Feb 16 1990 21:00 (7 lines)

    Indeed this is a bug.  It has been QARed already.
    
    The READ path in the transport layer should be re-entrant, but due to
    this bug, it screws up its internal state.  I'm afraid this isn't
    likely to be fixed until AFTER V5.4.
    
    James
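
To make the re-entrancy failure described in .1 concrete, here is a toy model
in Python. It is not the DECwindows transport, whose internals are not shown in
this thread. It assumes, purely for illustration, that the read path parks the
message it is assembling in one shared slot; the class names, the `interrupt`
hook, and the message strings are all hypothetical. When an AST-level read
interrupts a main-level read, the nested call overwrites and then clears the
slot, so the outer caller loses its event.

```python
from collections import deque


class NonReentrantReader:
    """Toy read path that keeps the in-progress message in shared state."""

    def __init__(self, wire):
        self.wire = deque(wire)
        self.partial = None  # shared scratch slot (hypothetical)

    def read(self, interrupt=None):
        self.partial = self.wire.popleft()      # step 1: start the read
        if interrupt is not None:
            interrupt()                         # an AST fires mid-read
        msg, self.partial = self.partial, None  # step 2: finish from shared slot
        return msg


class ReentrantReader:
    """Same read path, but each call keeps its state in a local variable."""

    def __init__(self, wire):
        self.wire = deque(wire)

    def read(self, interrupt=None):
        partial = self.wire.popleft()
        if interrupt is not None:
            interrupt()
        return partial


def demo(reader_cls):
    """Main level reads; an AST-level read interrupts it half way through."""
    reader = reader_cls(["keypress", "putimage-reply"])
    seen_by_ast = []

    def ast():
        seen_by_ast.append(reader.read())

    seen_by_main = reader.read(interrupt=ast)
    return seen_by_main, seen_by_ast
```

With these definitions, `demo(NonReentrantReader)` hands the main-level caller
`None` in place of the keypress, while `demo(ReentrantReader)` delivers both
messages, because each call keeps its in-progress message to itself.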

2292.2. "Very ugly bug" by TOOLEY::B_WACKER () Tue Feb 20 1990 19:18 (6 lines)

Non-ast reentrancy of Xlib is a big problem for this customer and it 
has been CLD'd to get high priority.

I hope something can be done to fix it before 5.4.  Our official line
is that Xlib is ast reentrant, so the fact that the read side of the
transport isn't seems like an enormous exposure for us.

2292.3. "Must be fixed quickly !!" by WSINT::GOLDBERG (Marshall R. Goldberg, Workstations) Thu Feb 22 1990 16:24 (6 lines)
    
    Failure to fix this problem quickly will greatly damage Digital's
    DOS/PC Solutions strategy.
    
    Marshall
    
2292.4. "QAR it!" by STAR::VATNE (Peter Vatne, VMS Development) Thu Feb 22 1990 16:35 (4 lines)

>    Failure to fix this problem quickly will greatly damage Digital's
>    DOS/PC Solutions strategy.

If you feel this is important, I suggest you QAR this.

2292.5. "cld'd" by TOOLEY::B_WACKER () Fri Feb 23 1990 13:10 (4 lines)

>If you feel this is important, I suggest you QAR this.

It was QAR'd.  This comes from a customer and since it is critical it 
was CLD'd.

2292.6. "NOTES is not a formal customer problem resolution mechanism" by PSW::WINALSKI (Careful with that VAX, Eugene) Fri Feb 23 1990 19:09 (9 lines)

RE: .5

>It was QAR'd.  This comes from a customer and since it is critical it 
>was CLD'd.

The CLD process gives you a direct channel to the responsible developers.  Why
aren't you using that instead of whining in NOTES conferences?

--PSW

2292.7. "" by WSINT::GOLDBERG (Marshall R. Goldberg, Workstations) Tue Feb 27 1990 15:12 (12 lines)
    
    Considering the less than satisfactory response when this bug was first
    reported, why not post it in this notes file and let everyone know
    about it? Doesn't seem unreasonable to me. Very much appreciate
    past responses to our problems here; people have been wonderful.
    Isn't it best we share this information with as wide an audience as
    possible?
    
    Marshall
    
    
    
2292.8. "Please understand the process" by STAR::VATNE (Peter Vatne, VMS Development) Tue Feb 27 1990 16:54 (22 lines)

It is reasonable to post problems in this notes file for general discussion.

What makes developers mad is when the problem described in the notes files
appears to be a serious problem affecting customers, and yet there is no
indication that the problem has been sent through regular channels.  Perhaps
the problem has been properly escalated.  But if you post the description
in this notes file, please so indicate!

Another problem is that it is not clear to us whether TOOLEY::B_WACKER and
WSINT::GOLDBERG are working on the same customer problem or not.  If these
problems are for two different customers, then two separate QARs should be
entered.  The priority for one customer's problem may be different from
another customer's problem, even though they stem from the same bug.

Finally, statements like "must be fixed quickly!!" have no use in this
notes file, as the audience here is the general Digital community, not the
developers.  I have no idea why "failure to fix this problem quickly will
greatly damage Digital's DOS/PC Solutions strategy."  The correct procedure
is to explain in a QAR in detail exactly what this means.  Only then will
solving the problem get the proper priority it deserves.

Thank you.

2292.9. "clarification" by TOOLEY::B_WACKER () Wed Feb 28 1990 14:46 (34 lines)

>there is no indication that the problem has been sent through regular
>channels.

Back a couple of notes I said it had been CLD'd (and got accused of 
whining), so it was indicated.

>Another problem is that it is not clear to us whether TOOLEY::B_WACKER
>and WSINT::GOLDBERG are working on the same customer problem or not.

It is the same problem for the same customer.  Goldberg is something 
like contract administrator/liaison.  I'm with CSC/CS.

If you're interested there's an article in Digital News, February 19, 
on page 62 about the coprocessor product that is affected.

>...have no use in this notes file

Agreed, but Goldberg doesn't frequent here and doesn't know that.

>audience here is the general Digital community, not the developers.

In my experience the "official" channels are woefully slow and often
breed misunderstanding.  SPR's average 6-9 months to get a response.
CSSE is often over-committed and not much help.  Until the other 
channels are dramatically improved notes is what gets the job done,
and even informal participation by developers is crucial to what
success we do have. 

I put a little more back in here after I had enough to work the 
customer issue because I thought the DW community needed to be aware 
that Xlib may not be as ast reentrant as we believe.  If I've broken 
notes etiquette in the process please send me mail and I'll fix it.

Bruce

2292.10. "" by PSW::WINALSKI (Careful with that VAX, Eugene) Wed Feb 28 1990 19:47 (14 lines)

RE: .9

>Until the other 
>channels are dramatically improved notes is what gets the job done,

That is precisely Peter Vatne's point.  NOTES conferences DON'T get the job
done, where "the job" is getting problems reported to the developer and fixed in
a reliable, timely fashion.

If, as you say, you have a CLD open on this problem, then you already have a
channel open to the developers responsible for fixing this problem.  Make your
comments about the urgency of the problem known through those channels.

--PSW

2292.11. "rathole alert" by TOOLEY::B_WACKER () Thu Mar 01 1990 18:46 (23 lines)

>NOTES conferences DON'T get the job done, where "the job" is getting
>problems reported to the developer and fixed in a reliable, timely
>fashion.

In this and many cases the first thing is to figure out if there 
really is a problem.  Notes and all you gurus who read them are 
invaluable in this respect.  From the hinterlands the QAR system is
impossible to search, so notes also tend to flush known problems and
sometimes provide already fixed next release knowledge unavailable
elsewhere.  I don't think any CSC specialist thinks notes will get 
anything fixed.

>Make your comments about the urgency of the problem known through
>those channels. 

I don't think I made such comments and I already apologized for 
Goldberg's unfamiliarity with the whole system.  I feel like I'm being 
blamed for another's actions and since I have a vested interest in 
this conference, the product, and relations with the developers I 
repeat my request to direct your remarks through mail to the concerned 
party and not beleaguer it further here.

Bruce (not Marshall)

2292.12. "Moderator opinion" by DECWIN::FISHER (Burns Fisher 381-1466, ZKO3-4/W23) Tue Mar 06 1990 17:05 (27 lines)

    Two comments with two different hats on:
    
    1. with my devo hat on:  We have gotten the CLD (the same day the
    original note was written, it turns out) and are working on it.
    
    2. with my moderator hat on top of the devo hat:  I don't see anything
    in this note that has not been in 100 other notes and has not been
    stamped on.  In my opinion, it is fine to discuss problems in here and
    to vent frustrations etc etc.  The crucial thing to recognize is that
    this conference is a hit-or-miss thing.  You may happen to catch the
    eye of a devo who will go off and investigate the problem and come up
    with a workaround or a fix quickly.  However, on the other hand, we are
    not measured on how many notes we answer.  We are measured and
    monitored on how we deal with QARs/SPRs/CLDs.
    
    In addition, I don't think there is any apology due for anything said.
    I don't believe there were any breaches of etiquette.  However,
    from a practical standpoint, you should know that as developers reading
    a conference, it is real easy to filter out "this problem is critical"
    or to get annoyed when seeing it for the nth time.  The reason is that
    nearly everyone says their problem is critical.  We have no way to
    weigh one critical problem against another.  In this particular case,
    we have raised the priority of the problem quite high based on both the
    CLD and the fact that in looking at the problem, we realized that it
    probably affects a large number of people.
    
    Burns

2292.13. "TLA -- three letter acronym" by HPSRAD::KOMAR (You can't fool Nature) Tue Mar 13 1990 18:08 (10 lines)
    
    	I wonder what CLD stands for in this context.
    
    	QAR :== Quality Assurance Report
    	SPR :== Software Performance Report
    
    [	CLD :== Command line definition	]? :-)
    
    
    
2292.14. "" by STAR::KLEINSORGE (Fred Kleinsorge, VMS Development) Tue Mar 13 1990 20:38 (21 lines)

Customer Level D??? or some such nonsense.  What the letters stand for
is not as important as what it is...

A CLD is a mechanism by which a customer can unilaterally elevate a
problem that is having a substantial impact on the operation of their
system (i.e. an impact on their business).  CLDs can only be closed
by the customer, not by engineering, SWS or CSSE.  What closes one can
be a patch sent to the customer, or anything else that makes the
customer happy.

CLDs appear to be a compromise between the black hole called an SPR
and a KO call.

While intended for serious business impact problems (as I've been told) the
CLD is now the weapon of choice by customers who are used to zero response
via the "normal" problem reporting mechanisms.





2292.15. "My view of a "CLD"" by LNKUGL::BOWMAN (Bob Bowman, CSC/CS SPACE Team) Wed Mar 14 1990 02:04 (31 lines)

    A CLD is the mechanism originally created by Field Service (now
    Customer Service) for use by the local unit manager responsible for a
    particular customer installation (i.e. the customer has purchased a
    support contract and the resources to service this contract are
    supplied by the local office unit manager). The unit manager may
    choose to escalate critical customer situations which may require
    corporate level resources for resolution by logging a call with the
    "Central/Critical
    Log Desk". This CLD is part of the process called MAP (Management
    Action Planning) which requires the local unit manager to create and
    follow thru on an action plan which is designed to resolve the customer
    issue.
    
    Although originally designed to escalate down-hardware situations, it
    was adopted for the software arena when FS became responsible for
    remedial software issues. 
    
    We now have (again) an agreement (at least for VMS components) that a
    priority 1 or 2 SPR will be treated at corporate levels with the same
    priority as a CLD. For a time this was not the case, and therefore
    customers and the field began using the CLD process when a priority 1/2
    SPR would have been used before. Hopefully the corporation is beginning
    to get the kinks worked out of the process.
    
    A CLD can only be closed by the originator (called the Problem Manager)
    and that is normally only done when the customer has been satisfied to
    the extent that Digital is able to do so.
    
    Bob Bowman
    Consultant, CSC/CS
    (Part of the team responsible for handling 
     Software Problem escalations...but not CLDs)

2292.16. "" by EEMELI::PEURA (Pekka Peura, CSG-Helsinki) Sun Mar 18 1990 19:57 (12 lines)

    re: .-1
    
   > A CLD can only be closed by the originator (called the Problem Manager)
   > and that is normally only done when the customer has been satisfied to
   > the extent that Digital is able to do so.
  
    This is the theory. However, CSSE and/or Engineering often close
    CLDs without any confirmation that either the customer or the Problem
    Manager is satisfied with the proposed solution. But I guess this
    is a rathole that does not belong here.
    
    		Pekka

2292.17. "Same or similar problem?" by SCAM::DIAL () Fri Sep 07 1990 18:45 (21 lines)

2292.18. "synchronize in ast routine" by STAR::HARDY () Wed Sep 12 1990 17:21 (22 lines)
    
>    1.  Is this likely to be the same problem described in reply 0?
    
    	Same problem.
    
>    2.  Are there other problems relating to ASTs and event handling to
>        watch out for?
    
    	Prior to V5.4, if non-AST processing and AST processing are
    	both waiting for an event or reply, they can hit this problem.
    
>    3.  Does removing the synchronize really cure the problem?
 
    	Since the synchronize performs a wait for a reply, yes.  If you
    	added another form of event or reply wait to your AST processing,
    	the problem would return.
    
    	This bug is fixed in V5.4.
    
    
    		Sam
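
The answer in .18 suggests one defensive pattern for pre-V5.4 systems: keep
every event or reply wait out of the AST routine. The sketch below models that
in Python as an illustration under stated assumptions; it is not the fix that
shipped and it is not real Xlib code. `tick_tock_ast` borrows the AST routine
name from the traceback in .0; the `Client` class, its fields, and the log
tuples are hypothetical. The AST only records that a redraw is due, and the
main-level loop, the sole place that talks to the server, performs the put
image or the clear. How the main loop gets woken when no X events arrive is
deliberately left out.

```python
class Client:
    """Toy model: AST level sets a flag, main level does all server I/O."""

    def __init__(self):
        self.redraw_due = False  # set by the AST, cleared by the main loop
        self.draw_next = True    # alternate between putimage and clear
        self.log = []            # stands in for requests and printed events

    def tick_tock_ast(self):
        # AST level: no requests, no synchronize, no waiting for replies.
        self.redraw_due = True

    def main_loop_step(self, event=None):
        # Main level: the only code that reads events or issues requests.
        if event is not None:
            self.log.append(("event", event))
        if self.redraw_due:
            self.redraw_due = False
            self.log.append(("putimage" if self.draw_next else "clear", None))
            self.draw_next = not self.draw_next


def run_demo():
    client = Client()
    client.tick_tock_ast()             # timer fires
    client.main_loop_step("keypress")  # main loop prints the key, then redraws
    client.tick_tock_ast()             # timer fires again
    client.main_loop_step()            # no event pending; window is cleared
    return client.log
```

Because only one level ever waits on the connection, this arrangement never
has AST and non-AST code waiting for an event or reply at the same time, which
is the condition .18 identifies as the trigger.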