[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference turris::c_plus_plus

Title:	C++
Notice:	Read 1.* and use keywords (e.g. SHOW KEY/FULL KIT_CXX_VAX_VMS)
Moderator:	DECCXX::AMARTIN

Created:	Fri Nov 06 1987
Last Modified:	Thu Jun 05 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	3604
Total number of notes:	18242

3451.0. "possible optimization bug on UNIX" by HYDRA::DORHAMER () Fri Feb 14 1997 18:36

The following problem was reported by a software partner who is porting
to Digital UNIX v4.0 with DEC C++ v5.5.  Their code gets the expected answers 
when they compile with -g, but gets incorrect answers when compiling without 
-g.

In the working version, compiled with "-g", the lists of int and float
contain the appropriate values:

        % cxx -g -I. -o works bug.cc libL.a
        % works
         1 2 -3 5
         123456789012 20015998343868 5714878441043
         1 -5 3.2 -5.9 3e+07 3.2e+07
         1 -5 3.2 -5.9 3e+07 3.2e+07

In the failing version, compiled without "-g", the lists of int and
float all contain 0.

        % cxx    -I. -o fails bug.cc libL.a
	% fails
         0 0 0 0
         123456789012 20015998343868 5714878441043
         0 0 0 0 0 0
         1 -5 3.2 -5.9 3e+07 3.2e+07

In both versions, the lists of long and double behave correctly.
The code makes use of the LEDA library, which they obtained off the net.

Here is the source code.  The entire source with the header files, library
and makefile can be found in the tar file 8878.tar in the anonymous ftp area on 
fluid.mro.dec.com.


#include <LEDA/list.h>

main ()
{
  list<int> i32List;
  i32List.append (1);
  i32List.append (2);
  i32List.append (-3);
  i32List.append (5);
  int i32;
  forall (i32, i32List) {
    cout << " " << i32;
  }
  cout << endl;

  list<long> i64List;
  i64List.append (123456789012L);
  i64List.append (0x123456789abcL);
  i64List.append (0123123123123123L);
  long i64;
  forall (i64, i64List) {
    cout << " " << i64;
  }
  cout << endl;

  list<float> f32List;
  f32List.append (1);
  f32List.append (-5);
  f32List.append (3.2);
  f32List.append (-5.9);
  f32List.append (3e7);
  f32List.append (3.2e7);
  float f32;
  forall (f32, f32List) {
    cout << " " << f32;
  }
  cout << endl;

  list<double> f64List;
  f64List.append (1);
  f64List.append (-5);
  f64List.append (3.2);
  f64List.append (-5.9);
  f64List.append (3e7);
  f64List.append (3.2e7);
  double f64;
  forall (f64, f64List) {
    cout << " " << f64;
  }
  cout << endl;
}

T.R	Title	User	Personal Name	Date	Lines
3451.1	User error, I think: illegal aliasing	WIDTH::MDAVIS	Mark Davis - compiler maniac	`Tue Feb 18 1997 16:46`	135
	Here is a cut down example: #include <stdio.h> typedef void GenPtr; inline void Avoid_Unused_Warning(GenPtr) {} inline void operator new(size_t,void* p,int) { return p; } inline GenPtr leda_copy(const int& x){ GenPtr p=0; if (sizeof(int) <= sizeof(GenPtr)) // COPY_INintO_WORD(int,x,p); Avoid_Unused_Warning(new((int)&p,0) int(x)); if (sizeof(int) > sizeof(GenPtr)) p = new int(x); return p; } struct list { GenPtr empty, item; list(): empty((GenPtr)999), item((GenPtr)1000){} void append(const int& x) { item = (leda_copy(x));} }; void main(){ list L; L.append(123456789); printf("%ld \n",L.item); } The problem is trying to store the int value into storage declared to be GenPtr p in leda_copy. The "new((int)&p,0) int(x) " returns the address of p as the result of new, and then stores "x" into it. The type of &p is "pointer to Genptr", the type of new() is "pointer to int", and the store is done through the "pointer to int". C++ aliasing rules say that storing through "pointer to type A" does NOT affect memory pointed to by pointers of type "pointer to type B" (except for "pointer to char"). At optimization levels higher than -O0 (implied by -g), all of the routines are inlined, so the compiler can "see" exactly what is going on. To the compiler, the program looks like: 1. p = 0; 2. ((int )&p) = x; 3. L.item = p; 4. printf(...,L.item); The NO aliasing rule implies that the store at 2. cannot modify the value of p, because p is of type GenPtr, and the object being stored into is of type int. Therefore the compiler looks around and notices that the last value assigned to p was the 0 from stmt 1., so the generated code looks like: 1. p = 0; 2. ((int )&p) = x; 3. L.item = 0; 4. printf(...,0); and so the value stored, and the value printed, is Zero. The reason the program in .0 prints "unexpected" values is because the value originally stored into the list_items is Zero; it is not that the iterators or access routines perform incorrectly. Soo, I believe the program as written uses illegal aliasing, and therefore the compiler is intitled to produce undefined results, including storing zeros into your list_items. However, you STILL want to be able to store your small objects into the pointer, instead of allocating new storage to place the small value into. I have 2 workarounds, and there are no doubt others. I. Omit the assignment of 0 to p. This makes the sequence look like: 2. ((int )&p) = x; 3. L.item = p; 4. printf(...,L.item); and since the compiler has no literal value that has been assigned to p, it actually has to load a value out of the storage for p in order to do stmt 3. Of course, the compiler might decide to execute 3. before 2., since they are "independent" according to the aliasing rule, which would produce total garbage. (Note, if you don't initialize p to 0, there will be garbage in the part of p that is NOT occupied by x.) II. Change Avoid_Unused_Warning so that it is defined in a different file (so it cannot be inlined), and modify it so that it ALSO takes the address of p as an argument. So the program looks like: void Avoid_Unused_Warning(GenPtr,GenPtr); // fn prototype ... if (sizeof(int) <= sizeof(GenPtr)) // COPY_INintO_WORD(int,x,p); Avoid_Unused_Warning(new((int)&p,0) int(x),&p); ^^^ and have a new file with only: typedef void GenPtr; void Avoid_Unused_Warning(GenPtr a,GenPtr p){} The effect of this is 1. the compiler has to store x into new(...) before calling Avoid_Unused_Warning 2. Since &p is passed as an arg, AND the compiler can't see the routine, it has to assume the value of p might be changed. So at 3. L.item = p; the compiler has to load the value out of p. This has the advantage of always working, at the disadvantage of actually having to make the subroutine call. Oh, the same game can be played with "new(...)", I had thought that it was templatized, but it isn't. So you can use a prototype: void* operator new(size_t,void* p,int); in your main file, and then put the definition into another file: #include <stdio.h> void* operator new(size_t i,void* p,int x) { return p; } Since new(...&p...) already passes the address of p to an external routine, the compiler has to assume p has been changed. Mark Davis c/c++ team
3451.2	workaround #3: declare p to be volatile	WIDTH::MDAVIS	Mark Davis - compiler maniac	`Tue Feb 18 1997 19:29`	4
	Randy Meyers' suggestion is to use volatile, which will force the compiler to perform every load and store to p: GenPtr volatile p=0;
3451.3	workaround #4 and #5	DECCXL::OUELLETTE		`Wed Feb 19 1997 20:24`	24
	I think you could also change GenPtr to be a union (and change all of the uses to match). typedef union { int i; void *p; } GenPtr; While the ANSI C standard states that a valid program will only fetch the union field most recently stored, contrary usage is common enough that no compiler takes advantage of this rule (and is unlikely to ever do so by default). It'd be a whole lot lighter of a hammer than the volatile fix. Were the cxx compiler to have a -noansi_alias switch, you could use that to make your much like K&R C code compile. I'm surprised that the cxx driver doesn't have the switch. But the compiler does. You can bypass the driver by adding... -Wf,-noansi_alias to your cxx command. I would think that either of these choices would fix your problem, but I've tried neither to verify this assertion. Since Mark has your example nicely cut down, I may try it with him tomorrow. R.
3451.4	Further clarification.	WIDTH::MDAVIS	Mark Davis - compiler maniac	`Fri Feb 21 1997 14:03`	114
	In case I wasn't clear enought in .1, there IS a user error in the program because it relies on illegal aliasing. Here's the Draft Standard wording from section 3.10: <quote> 15If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined24): --the dynamic type of the object, --a cv-qualified version of the dynamic type of the object, --a type that is the signed or unsigned type corresponding to the dynamic type of the object, --a type that is the signed or unsigned type corresponding to a cv- qualified version of the dynamic type of the object, --an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a sub- aggregate or contained union), --a type that is a (possibly cv-qualified) base class type of the dynamic type of the object, --a char or unsigned char type. _________________________ 24) The intent of this list is to specify those circumstances in which an object may or may not be aliased. <unquote> In other words, if you want to change the value of a GenPtr, the store has to be done to something of type GenPtr or char; storing to something of type int produces undefined results. The "workarounds" in .1 and .2 still produce "undefined" results according to the standard; they happen to hide enough from the compiler so it does what we want. [Digression: .3 workaround: typedef union { int i; void p; } GenPtr; does not work in the original case, where leda_copy is templatized on T (=int here) : GenPtr has to be void, a "universal pointer" in the underlying, non-template base class. It also does not work to use a local union{GenPtr pp;int ii}p; in leda_copy: the store of "x" will be through an int, which does not modify a GenPtr. end digression] I think I have a "defined" way of performing the "magic" of storing an int (or a T) into a GenPtr: by using memmove after doing the "new". The memmove copies chars via char s, which seems to obey the last clause about aliasing. #include <string.h> // decl for memmove #pragma intrinsic(memmove) // compiler will expand it inline ... inline GenPtr leda_copy(const int& x){ GenPtr p=0; if (sizeof(int) <= sizeof(GenPtr)) { char buf[2sizeof(int)]; memmove(&p,new((int)&buf,0) int(x),sizeof(int));} if (sizeof(int) > sizeof(GenPtr)) p = new int(x); return p; } The effect is to do a "new" into a buffer that is large enough for the int (including alignment, if necessary, though the compiler will always align buf on a quadword boundary....); and then bit-copy those bits into p. This "defined" workaround does generate the expected answer. Note: 1. we can't just copy x, because if it is a class, it might have special copy-constructor semantics. 2. we can't just declare "int y=x;" and memmove y to p, because if x is a class, y will be destructed when leda_copy is exited, with maybe the wrong effects for the bit-copy of y in p. 3. it's OK to bit-copy the new-ed object into p, because the value in p is going to be bit-copied all over the place. (I.e., the hack of putting an object INSIDE of p will not work for a class type that has a self-referential field - though these will probably be larger than sizeof(GenPtr), so this is not much of a drawback). In order to fix the undefined behavior in the original program, you need to modify LEDA/param_types.h: inline void* operator new(size_t, void* where, int) { return where; } /* New code starts here / #if defined(__DECCXX) #include <string.h> #pragma intrinsic(memmove) #define COPY_INTO_WORD(T,x,p) { char buf[2sizeof(T)]; \ memmove(&p,new((T)&buf,0) T(x),sizeof(T));} #define CONSTRUCT_WORD(T,p) { char buf[2sizeof(T)]; \ memmove(&p,new((T)&buf,0) T,sizeof(T));} #else / original code / #define COPY_INTO_WORD(T,x,p) Avoid_Unused_Warning(new((T)&p,0) T(x)) #define CONSTRUCT_WORD(T,p) Avoid_Unused_Warning(new((T)&p,0) T) #endif / new endif */
3451.5	documentation on reinterpret_cast?	HYDRA::DORHAMER		`Fri Feb 28 1997 12:38`	33
	The software partner would like a pointer to documentation that explains how the compiler handles reinterpret_cast, so they can make sure that their code is correct. Attached is the note from the partner. Thanks, Karen Thanks very much, i've forwarded your analysis and fix to the leda group. > In case I wasn't clear enought in .1, there IS a user error in the program > because it relies on illegal aliasing. Here's the Draft Standard wording > from section 3.10: > <quote> > 15If a program attempts to access the stored value of an object through > an lvalue of other than one of the following types the behavior is > undefined24): I had been reading this section and discussing the problem, trying to verify your analysis. There seems to be general agreement that the compiler is in "spec" though some don't particularly like the handling. Its OK with me, i'd rather have the performance that will usually result. However, could you point me to the best spot in the docs to read about the cxx compiler's handling of reinterpret_cast. The lowest levels of our software uses considerable "undefined" or vendor defined casting, and though things seem fine now, i don't want to be suprised with a new compiler release..... - - -ernie
3451.6	reinterpret_cast is not in current DEC C++ compilers (v5.*)	WIDTH::MDAVIS	Mark Davis - compiler maniac	`Fri Feb 28 1997 15:26`	27
	reinterpret_cast, dynamic_cast, static_cast, and const_cast are new features of ANSI C++, and do not appear in our current V5.* cxx compilers. So the easy answer to the question is that there is no documentation because we don't do it :) The new ANSI-compliant C++ compiler we are working on, V6., does support these new ansi features, and all the others. The standard (5.2.10 Reinterpret cast) sure does contain a lot of "the result of such a xxx conversion is unspecified", i.e., "Behavior ... that depends on the implementation." The only behavior guaranteed in the standard for most cases is that for "T1 x" x == reinterpret_cast<T1>(reinterpret_cast<T2>(x)) reinterpret_cast<T&>(A) seems to be just a formalization of the old (T)(void )(&A) hack . I assume our "official" behavior specification will be similar to the current meaning of (T)(void *)(&A): that you refer to the same storage as A . HOWEVER: the alias rule against storing to an object via one type and loading via another ("unrelated") type will still be assumed. Mark
3451.7	Thanks :-)	HYDRA::DORHAMER		`Mon Mar 03 1997 20:16`	8
	Thanks for all your help. I'm glad to see that while developing a world class compiler, you folks are maintaining your sense of humor (just noticed that someone changed the title on the base note :-) I have forwarded all the info to the software partner and they do appreciate all of the help, feedback and work arounds. Karen
3451.8	De nada	DECCXX::AMARTIN	Alan H. Martin	`Mon Mar 03 1997 23:08`	14
	Re .7: > Thanks for all your help. I'm glad to see that while developing a > world class compiler, you folks are maintaining your sense of humor > (just noticed that someone changed the title on the base note :-) Actually, it's a technical term. An AltaVista search on ``type punning'' uncovers a few uses of it out on the web, such as in N.M Maclaren's _User Experiences with the X Windowing System_, at URL: http://www-h.eng.cam.ac.uk/help/tpl/graphics/X/comments.html . I'd have used a more popular phrase if I could think of one that made the title fit... /AHM