
Conference msbcs::hpc

Title:Parallel processing through Workstation Farms.
Notice:MSBCS::HPC (renamed from HPCGRP::WORKSTATION_FARMS)
Moderator:MSBCS::SYSTEM
Created:Tue Oct 27 1992
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:507
Total number of notes:1791

506.0. "hpf run time errors" by RTOMS::PARETIJ () Thu Jun 05 1997 10:55

UNIX 4.* ; TruCluster 1.4 ; PSE 1.3
-------------------------------------


pi_example is an HPF program; see the error messages below.

appl3.hrz.uni-siegen.de> pi_example -pref_comm shm
ERROR Peer[2] code:0.11 _SHM_PostExchange: Failed to attach the shared memory area
      /tmp_mnt/usr/sde/disks/farm_build/src/hpf/lib/libhpf/msgshm.c Line:633 (errno:22)

ERROR Peer[3] code:0.11 _SHM_PostExchange: Failed to attach the shared memory area
      /tmp_mnt/usr/sde/disks/farm_build/src/hpf/lib/libhpf/msgshm.c Line:633 (errno:22)
appl3.hrz.uni-siegen.de> pi_example -pref_comm mc
 06/05/97  09:33:03.115

 value computed =    3.14159261548428
 for arithmetic statement function called in forall
   elapsed time =   3.96329999

 value computed =    3.14159260737142
 for arithmetic statement function inlined in forall
   elapsed time =   0.31859994

 value computed =    3.14159261548428
 for internal pure function called in forall
   elapsed time =   0.32489967

 value computed =    3.14159260737142
 for scalar do loop with function inlined
   elapsed time =   0.46820021
 06/05/97  09:33:08.220
appl3.hrz.uni-siegen.de>
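
The `(errno:22)` in the error messages above can be looked up with a short script. This is only a sketch: it assumes the number is a standard C errno value and that the machine running the script numbers errno values the same way as the failing host (22 is EINVAL on most Unix systems, but the table is platform-specific). The message does not show which call in msgshm.c set the value.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for an errno value, using the local platform's table."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# 22 is the value reported next to _SHM_PostExchange in the log above.
print(describe_errno(22))
```

The output depends on the local C library, so treat it as a hint about the failing host rather than a diagnosis.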

506.1. "add -verbose and rerun" by HPCGRP::BENSON () Thu Jun 05 1997 13:48 (4 lines)

    Please rerun the program with -verbose and capture both standard output
    and standard error.
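
One way to capture both streams in a single file is sketched below. The helper name `run_and_capture` and the log file name are hypothetical, and a stand-in command is used so the sketch runs anywhere; on the cluster the command list would be the pi_example invocation from this thread.

```python
import os
import subprocess
import sys
import tempfile


def run_and_capture(cmd, log_path):
    """Run cmd, send stdout and stderr to the same file, and return the exit status."""
    with open(log_path, "w") as log:
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode


# Stand-in command (assumption). On the cluster this would be:
#   ["pi_example", "-pref_comm", "shm", "-verbose"]
demo_cmd = [
    sys.executable,
    "-c",
    "import sys; print('out'); print('err', file=sys.stderr)",
]
log_path = os.path.join(tempfile.gettempdir(), "pi_example_verbose.log")
status = run_and_capture(demo_cmd, log_path)
print("exit status:", status, "log written to", log_path)
```

Because the two streams are buffered differently, their lines may not appear in the log in the order they were produced.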
    
506.2. "re. add -verbose and rerun" by RTOMS::PARETIJ () Fri Jun 06 1997 06:27 (117 lines)
appl3.hrz.uni-siegen.de>
appl3.hrz.uni-siegen.de> pi_example -pref_comm shm -verbose
Partition server = appl3.hrz.uni-siegen.de [141.99.128.56]
peer primary         name       load  cpus vector          ...
   0 141.99.127.2    appl4      0.02     2
   1 141.99.127.2    appl4      0.02     2
   2 141.99.128.56   appl3      1.30     2
   3 141.99.128.56   appl3      1.30     2
[0] _hpf_GetCommvector: conn. from peer 0: 127.0.0.1  type: SHM
[0] _hpf_GetCommvector: conn. from peer 1: 127.0.0.1  type: SHM
[0] _hpf_GetCommvector: conn. from peer 2: 192.168.1.2  type: MCH
[0] _hpf_GetCommvector: conn. from peer 3: 192.168.1.2  type: MCH
[0] Program is compiled to run on 4 nodes
[0] HETEROGENEOUS NETWORK SUPPORT
[0,SHM] 8 frames/ring * 16384 bytes/frame = 131072 bytes/ring ...
[0,SHM] ... * 5*5 groupings = 3276800 bytes/seg <= 4194304 maxbytes/seg
[0,SHM] is peer 0 of 1 in this group, with rep peernum = 0
[0,SHM] id 419462, 1 send boards at 10000, for target group rep'd by 1 .
[0,MC] 8 frames/ring * 16384 bytes/frame = 131072 bytes/ring ...
[0,MC] ... * 4*4 peers = 2097152 bytes/unit <= 100663296 maxbytes/unit
[0,MC] is peer 0 of 1 in this box, with rep peernum = 0
[1] _hpf_GetCommvector: conn. from peer 0: 127.0.0.1  type: SHM
[1] _hpf_GetCommvector: conn. from peer 1: 127.0.0.1  type: SHM
[1] _hpf_GetCommvector: conn. from peer 2: 192.168.1.2  type: MCH
[1] _hpf_GetCommvector: conn. from peer 3: 192.168.1.2  type: MCH
[1,SHM] 8 frames/ring * 16384 bytes/frame = 131072 bytes/ring ...
[1,SHM] ... * 5*5 groupings = 3276800 bytes/seg <= 4194304 maxbytes/seg
[1,SHM] is peer 0 of 1 in this group, with rep peernum = 1
[1,SHM] id 260, 1 send boards at 10000, for target group rep'd by 0 .
[1,MC] 8 frames/ring * 16384 bytes/frame = 131072 bytes/ring ...
[1,MC] ... * 4*4 peers = 2097152 bytes/unit <= 100663296 maxbytes/unit
[1,MC] is peer 0 of 1 in this box, with rep peernum = 1
[0->2,MC] key = 0x100007f0043c502, id = 0x51d, rings at 30000
[0->2,MC] ringnum = 0, ring at 30000, key = 0x100007f0043c502
[0->3,MC] key = 0x100007f0043c503, id = 0x52d, rings at 50000
[0->3,MC] ringnum = 0, ring at 50000, key = 0x100007f0043c503
[1->2,MC] key = 0x100007f0063a602, id = 0x53d, rings at 30000
[1->2,MC] ringnum = 0, ring at 30000, key = 0x100007f0063a602
[1->3,MC] key = 0x100007f0063a603, id = 0x54d, rings at 50000
[1->3,MC] ringnum = 0, ring at 50000, key = 0x100007f0063a603
[2] _hpf_GetCommvector: conn. from peer 0: 127.0.0.1  type: SHM
[3] _hpf_GetCommvector: conn. from peer 0: 127.0.0.1  type: SHM
[0<-1,SHM] id 260...
[1<-0,SHM] id 419462...
Peer [3] exited with status = 11
[3] _hpf_GetCommvector: conn. from peer 1: 127.0.0.1  type: SHM
[3] _hpf_GetCommvector: conn. from peer 2: 127.0.0.1  type: SHM
[3] _hpf_GetCommvector: conn. from peer 3: 127.0.0.1  type: SHM
[3,SHM] 8 frames/ring * 16384 bytes/frame = 131072 bytes/ring ...
[3,SHM] ... * 5*5 groupings = 3276800 bytes/seg <= 4194304 maxbytes/seg
[3,SHM] is peer 0 of 1 in this group, with rep peernum = 3
[3,SHM] id 3332, 1 send boards at 10000, for target group rep'd by 0 .
[3,SHM] id 3077, 1 send boards at 30000, for target group rep'd by 1 .
[3,SHM] id 7175, 1 send boards at 50000, for target group rep'd by 2 .
[3<-0,SHM] id 4441347...
 
ERROR Peer[3] code:0.11 _SHM_PostExchange: Failed to attach the shared memory area
      /tmp_mnt/usr/sde/disks/farm_build/src/hpf/lib/libhpf/msgshm.c Line:633 (errno:22)
Peer [2] exited with status = 11
[2] _hpf_GetCommvector: conn. from peer 1: 127.0.0.1  type: SHM
[2] _hpf_GetCommvector: conn. from peer 2: 127.0.0.1  type: SHM
[2] _hpf_GetCommvector: conn. from peer 3: 127.0.0.1  type: SHM
[2,SHM] 8 frames/ring * 16384 bytes/frame = 131072 bytes/ring ...
[2,SHM] ... * 5*5 groupings = 3276800 bytes/seg <= 4194304 maxbytes/seg
[2,SHM] is peer 0 of 1 in this group, with rep peernum = 2
[2,SHM] id 387, 1 send boards at 10000, for target group rep'd by 0 .
[2,SHM] id 1670, 1 send boards at 30000, for target group rep'd by 1 .
[2,SHM] id 4744, 1 send boards at 50000, for target group rep'd by 3 .
[2<-0,SHM] id 4441346...
 
ERROR Peer[2] code:0.11 _SHM_PostExchange: Failed to attach the shared memory area
      /tmp_mnt/usr/sde/disks/farm_build/src/hpf/lib/libhpf/msgshm.c Line:633 (errno:22)
appl3.hrz.uni-siegen.de,pi_example,19796 [3] exited with status=11 [appl3.hrz.uni-siegen.de]
        System: 0.073200 seconds
        User:   0.060512 seconds
appl3.hrz.uni-siegen.de,pi_example,19796 [2] exited with status=11 [appl3.hrz.uni-siegen.de]
        System: 0.084912 seconds
        User:   0.060512 seconds
[0<-1,SHM] boardnum = 0, board at 70000
[0->1,SHM] Getting send board.  myrep is 0, sibrep is 1.
[0->1,SHM] boardnum = 0, board at 10000
[0<->1,SHM] Rendezvous complete.
[0<-2,MC] getting key via 2.
[0<-2,MC] key = 0x183, to alloc 1 rings
[0<-2,MC] key = 0x183, id = 0x55d, rings at 90000
[0<-2,MC] ringnum = 0, recv ring at 90000 from rings at 90000
[0->2,MC] already setup.
[0->2,MC] ringnum = 0, send ring at 30000 from rings at 30000
[0<-3,MC] getting key via 3.
[0<-3,MC] key = 0xd04, to alloc 1 rings
[0<-3,MC] key = 0xd04, id = 0x56d, rings at b0000
[0<-3,MC] ringnum = 0, recv ring at b0000 from rings at b0000
[0->3,MC] already setup.
[0->3,MC] ringnum = 0, send ring at 50000 from rings at 50000
[1<-0,SHM] boardnum = 0, board at 70000
[1->0,SHM] Getting send board.  myrep is 1, sibrep is 0.
[1->0,SHM] boardnum = 0, board at 10000
[1<->0,SHM] Rendezvous complete.
[1<-2,MC] getting key via 2.
[1<-2,MC] key = 0x686, to alloc 1 rings
[1<-2,MC] key = 0x686, id = 0x57d, rings at 90000
[1<-2,MC] ringnum = 0, recv ring at 90000 from rings at 90000
[1->2,MC] already setup.
[1->2,MC] ringnum = 0, send ring at 30000 from rings at 30000
[1<-3,MC] getting key via 3.
[1<-3,MC] key = 0xc05, to alloc 1 rings
[1<-3,MC] key = 0xc05, id = 0x58d, rings at b0000
[1<-3,MC] ringnum = 0, recv ring at b0000 from rings at b0000
[1->3,MC] already setup.
[1->3,MC] ringnum = 0, send ring at 50000 from rings at 50000
appl3.hrz.uni-siegen.de>
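
The `_hpf_GetCommvector` lines in the verbose output can be tabulated to compare how each peer classifies the others. The sketch below copies those lines from the log above and lists peer pairs whose transport type differs by direction. It assumes, without confirmation from the HPF runtime documentation, that the transport between two peers should be the same in both directions. The function names are hypothetical.

```python
import re

# _hpf_GetCommvector lines copied from the verbose log above.
LOG_EXCERPT = """\
[0] _hpf_GetCommvector: conn. from peer 0: 127.0.0.1  type: SHM
[0] _hpf_GetCommvector: conn. from peer 1: 127.0.0.1  type: SHM
[0] _hpf_GetCommvector: conn. from peer 2: 192.168.1.2  type: MCH
[0] _hpf_GetCommvector: conn. from peer 3: 192.168.1.2  type: MCH
[1] _hpf_GetCommvector: conn. from peer 0: 127.0.0.1  type: SHM
[1] _hpf_GetCommvector: conn. from peer 1: 127.0.0.1  type: SHM
[1] _hpf_GetCommvector: conn. from peer 2: 192.168.1.2  type: MCH
[1] _hpf_GetCommvector: conn. from peer 3: 192.168.1.2  type: MCH
[2] _hpf_GetCommvector: conn. from peer 0: 127.0.0.1  type: SHM
[2] _hpf_GetCommvector: conn. from peer 1: 127.0.0.1  type: SHM
[2] _hpf_GetCommvector: conn. from peer 2: 127.0.0.1  type: SHM
[2] _hpf_GetCommvector: conn. from peer 3: 127.0.0.1  type: SHM
[3] _hpf_GetCommvector: conn. from peer 0: 127.0.0.1  type: SHM
[3] _hpf_GetCommvector: conn. from peer 1: 127.0.0.1  type: SHM
[3] _hpf_GetCommvector: conn. from peer 2: 127.0.0.1  type: SHM
[3] _hpf_GetCommvector: conn. from peer 3: 127.0.0.1  type: SHM
"""

CONN_RE = re.compile(
    r"^\[(\d+)\] _hpf_GetCommvector: conn\. from peer (\d+): (\S+)\s+type: (\w+)"
)


def parse_views(text):
    """Map reporting peer -> {other peer: (address, transport type)}."""
    views = {}
    for line in text.splitlines():
        m = CONN_RE.match(line)
        if m:
            reporter, peer, addr, kind = m.groups()
            views.setdefault(int(reporter), {})[int(peer)] = (addr, kind)
    return views


def asymmetric_pairs(views):
    """Return (a, b, type a reports for b, type b reports for a) where the types differ."""
    pairs = []
    for a in sorted(views):
        for b in sorted(views[a]):
            if a < b and a in views.get(b, {}):
                kind_ab = views[a][b][1]
                kind_ba = views[b][a][1]
                if kind_ab != kind_ba:
                    pairs.append((a, b, kind_ab, kind_ba))
    return pairs


for a, b, kind_ab, kind_ba in asymmetric_pairs(parse_views(LOG_EXCERPT)):
    print(f"peer {a} sees peer {b} as {kind_ab}, but peer {b} sees peer {a} as {kind_ba}")
```

In the log, peers 0 and 1 list peers 2 and 3 as MCH at 192.168.1.2, while peers 2 and 3 list every peer as SHM at 127.0.0.1. Whether this mismatch causes the attach failure is not established in this thread.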