
Conference rusure::math

Title: Mathematics at DEC
Moderator: RUSURE::EDP
Created: Mon Feb 03 1986
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 2083
Total number of notes: 14613

1009.0. "anonymous sampling" by HERON::BUCHANAN (Andrew @vbo/dtn8285805/ARES,HERON) Fri Jan 06 1989 12:00

SCENE:	McDonalds (yes, even in Antibes, France)
TIME:   22:00 5 Jan

Me:  [ranting about doubling cube in backgammon].   [pause for breath]
Steve:	Well, I was never any good at statistics.
Me:  No? [munch on burger]
Steve:  But I remember my first statistics class:  the teacher showed us all
how statistics could be useful.   The guy asked the entire class some 
mildly incriminating yes/no question, like "have you ever smoked a funny
cigarette?" and of course people don't want to answer aloud.
	But he explained that if everyone tossed a coin secretly, and then
told the truth if it was heads, and lied if it was tails, then he could
work out the percentage that *had* Indulged, without any *individual's*
history becoming public knowledge.
	Because all the lies cancel out you see.
Me: [munch]
Steve: [munch]
Me:  Hang on, there's something not quite right here.   Whatever the
audience, with your coin scheme, you'd expect half of them to say "YES".
So how can you tell anything at all?
Steve: Well it was something to do with a coin.
Me: Now if you had a dice, and on a 5 or 6 you lie, otherwise you tell
the truth, then you're going to be able to get an estimate.   But now the
guy is revealing something about himself by his remark.
Steve:  No it definitely wasn't a dice.
Me:  Can I steal one of your french-fries?
Steve:  Did he divide the room in half, or something?
Me:  I need more ketchup.
Steve:  Gee, for ten years I'd thought that that was how statistics worked.
Me:  These French fries are pretty soggy.
Steve:  Well, I said I was no good at statistics.

	Any ideas, anyone?

Andrew.
1009.1. "some variants work" by NIZIAK::YARBROUGH Fri Jan 06 1989 13:26 (8 lines)
    If the coins are unbiased you can't get any information out of a
    single flip and Head=true, Tail=false; you will tend to get exactly
    50% yes-no responses. If each sample is based on truth=TWO heads
    in 2 throws, false otherwise, you can begin to get something out.
    
    Alternatively, if only the guilty lie when their coin is tails you
    can get a significant result: the number of guilty will tend to
    twice the number of 'YES' responses.
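
A minimal sketch of the two variants in this reply, assuming fair coins and honest compliance with the instructions. The function names are illustrative and not from the thread; `p` stands for the true fraction of "guilty" people.

```python
def yes_rate_two_heads(p):
    """Expected 'yes' rate when truth is told only on two heads in two flips."""
    q = 0.25  # chance of two heads with a fair coin
    return p * q + (1 - p) * (1 - q)


def guilty_from_yes_count(yes_count):
    """Second variant: only the guilty lie on tails, so about half of
    them say 'yes' and nobody else does."""
    return 2 * yes_count


print(yes_rate_two_heads(0.0), yes_rate_two_heads(1.0))  # 0.75 0.25
print(guilty_from_yes_count(15))                         # 30
```

In the first variant the "yes" rate moves from 75% down to 25% as p goes from 0 to 1, so it does carry information about p.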
1009.2. "Slight misunderstanding, I suspect." by RDVAX::COOPER (Topher Cooper) Fri Jan 06 1989 14:41 (22 lines)
    This sounds like a minor misunderstanding of a previously obscure
    statistical survey technique which has in the last few years received
    a lot of attention because it has been used in surveying AIDS victims
    (and non-victims for control purposes).
    
    The technique is as your friend Steve described it but instead of
    lying if tails came up, the "subject" is instructed to then always
    give the "incriminating" answer (in this case, "Yes, I have smoked
    a 'funny' cigarette.").  The surveyor cannot tell if any incriminating
    answer is true or not, and what's more, unless the incidence of
    incriminating behavior is near 100%, it is much more likely that
    a specific incriminating response is due to the coin flip than to
    incriminating behavior.
    
    If you survey 100 people and 72 of them give the incriminating answer,
    then your estimate for the population proportion is (72-50)/50 =
    44%.  The cost is that you have to sample 100 people to get the
    same accuracy you would get from 50 people in a straightforward
    survey.  The benefit is that your 50 "effective people" are likely
    to be much more honest.
    
    					Topher
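
The arithmetic above can be sketched as follows, assuming a fair coin and that everyone follows the instructions. The function names and the `p_truth` parameter are illustrative, not part of the technique as described.

```python
def forced_yes_estimate(yes_count, sample_size, p_truth=0.5):
    """Estimate the true 'yes' proportion under the forced-yes design.

    Each subject tells the truth with probability p_truth and otherwise
    says 'yes' regardless, so P(yes) = (1 - p_truth) + p_truth * p.
    """
    yes_rate = yes_count / sample_size
    return (yes_rate - (1 - p_truth)) / p_truth


def chance_yes_is_true(p, p_truth=0.5):
    """Probability that a subject who answered 'yes' really is a 'yes'."""
    return p / ((1 - p_truth) + p_truth * p)


estimate = forced_yes_estimate(72, 100)
print(round(estimate, 2))                      # 0.44
print(round(chance_yes_is_true(estimate), 2))  # 0.61
```

The second figure is the chance that a given "yes" comes from someone whose true answer is "yes", here about 44 in 72.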
1009.3. "I *think* that these are blind herrings" by HERON::BUCHANAN (Andrew @vbo/dtn8285805/ARES,HERON) Fri Jan 06 1989 14:51 (50 lines)

1009.4. "Where does the apostrophe go in "Bayes Theorem"?" by AITG::DERAMO (Daniel V. {AITG,ZFC}:: D'Eramo) Fri Jan 06 1989 14:55 (31 lines)
     Let p be the probability that the true answer is yes. 
     Suppose everyone flips a coin and tells the truth on heads,
     and lies on tails.  Suppose the probability of heads is q.
     
     Then the probability of a yes answer is:
     
          pq + (1-p)(1-q) = pq + 1 - p - q + pq = 1 - (p + q) + 2pq
     
     
     The probability of a no answer is:
     
          (1-p)q + p(1-q) = q - pq + p - pq = (p + q) - 2pq
     
     Do these add to one?  Yes.  :-)
     
     For a fair coin q = 1/2, and the probability of a yes answer
     becomes 1 - (p + q) + 2pq = 1/2 - p + p = 1/2.  So using
     q=1/2 gives no information about p.
     
     Suppose however one uses dice, and say, q = 1/3.  Then the
     probability of a yes answer is now 1 - (p + q) + 2pq =
     2/3 - p + (2/3)p = 2/3 - p/3 or (2 - p)/3.  Thus one can now
     get some information from the proportion of yes answers. 
     
     However, I bet that using Bayes Theorem will show that in
     either case (i.e., q = 1/2 or q not= 1/2) an individual's
     answer does reveal information about the individual (unless
     p = 1/2).  More later.
     
     Dan
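
A small sketch of the formulas in this reply, with the Bayes step written out. It assumes p and q as defined above (p is the chance the true answer is yes, q the chance of telling the truth); the function names are illustrative.

```python
def p_yes(p, q):
    """P(answer is yes) when truth is told with probability q, a lie otherwise."""
    return p * q + (1 - p) * (1 - q)


def estimate_p(yes_rate, q):
    """Invert p_yes for p; there is no solution when q = 1/2."""
    if q == 0.5:
        raise ValueError("q = 1/2 carries no information about p")
    return (yes_rate - (1 - q)) / (2 * q - 1)


def posterior_true_yes(p, q):
    """Bayes: P(true answer is yes | subject answered yes)."""
    return p * q / p_yes(p, q)


print(p_yes(0.3, 0.5))  # 0.5
print(round(estimate_p(p_yes(0.3, 1 / 3), 1 / 3), 6))
print(posterior_true_yes(0.3, 0.5))
```

Worked through this way, the posterior at q = 1/2 comes out equal to p itself, which suggests that a fair-coin answer tells nothing about the individual either; for q not= 1/2 the posterior does move away from p.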
     
1009.5. "oops" by AITG::DERAMO (Daniel V. {AITG,ZFC}:: D'Eramo) Fri Jan 06 1989 15:00 (4 lines)
     .2 and .3 came in while I was replying; .3 already contains
     the "follow up" and the answer to the title of .4.
     
     Dan
1009.6. ".2 & .3 are malordered" by HERON::BUCHANAN (Andrew @vbo/dtn8285805/ARES,HERON) Fri Jan 06 1989 15:44 (45 lines)
>    The technique is as your friend Steve described it but instead of
>    lying if tails came up, the "subject" is instructed to then always
>    give the "incriminating" answer (in this case, "Yes, I have smoked
>    a 'funny' cigarette.").  The surveyor cannot tell if any incriminating
>    answer is true or not, and what's more, unless the incidence of
>    incriminating behavior is near 100%, it is much more likely that
>    a specific incriminating response is due to the coin flip than to
>    incriminating behavior.

	Yes, this has to be a valuable technique in practice.   But still
the guy who says 'yes' *may* be a smoker, whilst the guy who says 'no' 
*cannot* be.   With the figures you used above, 44 out of 72 are smokers.

	If one is a libertarian or paranoid person, one could imagine
that this would enable a government to 'home in' on a particular subset.
The question is: does there exist a technique where we can extract 
general information, without any loss of privacy for the individual?

	I had an idea...

	It's a slightly flippant idea, but it might be that it has a
serious application, in some different domain.

	(1) Divide the individuals into two classes, A & B.

	(2) Explain the question to those in class A, and ask them
to reply.   Each can lie or tell the truth, as they please.

	(3) Ask those in class B to toss a coin each.   If heads, go to (4);
if tails, go to (5).

	(4) Ask that person to tell the truth.

	(5) Ask that person to lie or tell the truth, as they please.

Suppose that x of class A say "Yes" and y of class B.   Then how about
2*y-x as an estimate of the total number of smokers.   This assumes that
the members of class A and the members of class B would behave the same if
asked to say yes or no, as they please.   There may be a little care in
experimental design required to ensure that the members of class A are
in exactly the same state as class B.   E.g. get *everyone* to toss a coin,
and open one of two envelopes on that basis (both envelopes contain the
same message for class A) then make a decision.

	Is this valid?
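
A rough check of the 2*y-x estimate in expectation. It assumes the two classes have the same size n, that a fraction p are smokers, and that a fraction r of people say "Yes" when left free to answer as they please (the same r in both classes). These symbols and function names are illustrative.

```python
def expected_counts(n, p, r):
    """Expected 'Yes' counts for two classes of n people each."""
    x = n * r                    # class A: everyone answers freely
    y = n * (0.5 * p + 0.5 * r)  # class B: heads -> truth, tails -> free choice
    return x, y


def estimate_smokers(x, y):
    """The proposed estimate 2*y - x."""
    return 2 * y - x


x, y = expected_counts(100, 0.3, 0.1)
print(round(estimate_smokers(x, y), 6))
```

Under those assumptions 2*y-x averages out to n*p, the number of smokers in a class of n people, whatever r happens to be. With unequal class sizes the counts would need to be converted to proportions first.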
1009.7. "Not completely" by RDVAX::COOPER (Topher Cooper) Fri Jan 06 1989 18:58 (66 lines)
RE: .6 (Andrew)

    > ... does there exist a technique where we can extract general
    > information, without any loss of privacy for the individual?

    In a word: no.  General information about sensitive subjects, for
    any sampled group, can be used to stigmatize members of that
    group.  If we discover that 80% (to make up a figure) of AIDS
    patients engage in socially unacceptable behavior, then we
    can conclude that any particular AIDS patient (whether or not
    they participated in the survey) probably engages in the
    unacceptable behavior.  And even if the survey results cannot
    be generalized, then it can still be used to stigmatize the
    individuals who participated in the survey (if 80% of the
    people who participated in the survey beat their spouses, then
    the survey can be used to label the people who took part as
    spouse-beaters).

    However, this does not rule out decreasing or eliminating the
    specifically personal risk of someone in the "tell the truth"
    group answering honestly with a truthfully "stigmatizable"
    response, or, for that matter, the risk to someone in the
    "always answer stigmatizable" group being lumped in (statistically)
    with the truthfully stigmatizable group.

    The method I described can be adjusted quite simply to reduce
    the risk to the individual to any desired degree.  Simply increase
    the relative size of the "always answer stigmatizable" group to the
    desired level.  If the instructions are to answer truthfully only
    if two coin flips both come up heads, then a stigmatizable answer
    is even less likely to indicate stigmatizable behavior.  The
    cost is, of course, that larger and larger groups are needed for
    the same level of accuracy.

    The method assumes that one response is stigmatizable
    while the other would always be considered safe.  The method will
    not work if either response might be stigmatizable.  In that
    case the population should be divided by the initial coin-toss
    (more likely: die roll) into three groups: always answer A, always
    answer B and tell the truth.  The first two groups would be ideally
    equally proportioned, or, more sophisticatedly, proportioned according
    to the relative risk of the two answers (the original method is
    a specialization of this sophisticated proportioning).

    Note that if there is no risk associated with one of the answers
    then the 50% proportioning provides no additional protection to
    the honestly stigmatizable, but increases the risk of stigmatization
    to the honestly non-stigmatizable group.

    A variant of this new method would be to use one coin flip to
    determine whether someone is in the "random answer" or "honest answer"
    groups, then a second coin flip to determine in the former case
    what the random answer should be.

    Unless I have missed something, this is essentially the method you
    have proposed, except the second coin flip is replaced with the
    subject's impulse as a randomizer, and group A has been added to
    estimate the characteristics of that randomizer.  I see no benefit
    to this, since it requires a much larger sample (to include group A),
    is less reliable (since our estimate of the proportions of each
    random answer is subject to sampling variation), and may deviate
    from the ideal proportions (to see this note that if all "random
    responders" are moved to make the same response then the method is
    the same as the original, except we have added group A).

				    Topher
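
The adjustable design in this reply can be sketched as below, assuming the randomizer sends a known fraction of subjects to "always yes", a known fraction to "always no", and the rest to "tell the truth". The function and parameter names are illustrative.

```python
def estimate_true_yes(yes_rate, forced_yes, forced_no=0.0):
    """Estimate the true 'yes' proportion from the observed 'yes' rate.

    forced_yes / forced_no are the chances that the randomizer tells a
    subject to answer 'yes' / 'no' regardless; the rest answer truthfully.
    """
    truthful = 1.0 - forced_yes - forced_no
    if truthful <= 0:
        raise ValueError("some subjects must answer truthfully")
    return (yes_rate - forced_yes) / truthful


def chance_yes_is_true(p, forced_yes, forced_no=0.0):
    """P(subject really is a 'yes' | subject answered 'yes')."""
    truthful = 1.0 - forced_yes - forced_no
    return p * (forced_yes + truthful) / (forced_yes + truthful * p)


one_flip = chance_yes_is_true(0.44, forced_yes=0.5)    # truth on heads
two_flips = chance_yes_is_true(0.44, forced_yes=0.75)  # truth on two heads
print(round(one_flip, 2), round(two_flips, 2))         # 0.61 0.51
```

With a true rate of 44%, going from one flip to two drops the chance that a "yes" is genuine from about 61% to about 51%, at the price of dividing the observed rate by a smaller truthful fraction.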
1009.8. "" by KOBAL::GILBERT (Ownership Obligates) Mon Jan 09 1989 02:42 (12 lines)
Suppose we spin a roulette wheel, and use the table:
    
    	Black -> Answer "Yes"
    	Red   -> Answer "No"
        00    -> Tell the truth

Then with a large enough sample, we should get a significant result,
and knowing an individual's answer doesn't give enough information
to stigmatize him.


P.S.  We could just use secret ballots.  :^)
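
A rough idea of how large "large enough" is for the wheel, using the usual binomial standard error. The table does not say what happens on 0, so this sketch assumes a 0 is simply re-spun, leaving 18 black, 18 red and one truth pocket; that assumption, the 44% rate and the function name are all illustrative.

```python
import math


def required_sample_size(p, forced_yes, truthful, target_se):
    """Subjects needed for the estimate of p to have the given standard error.

    The observed 'yes' rate has binomial variance rate*(1-rate)/n, and the
    estimate divides that rate by `truthful`, which inflates the error.
    """
    yes_rate = forced_yes + truthful * p
    return math.ceil(yes_rate * (1 - yes_rate) / (truthful * target_se) ** 2)


# Assumed wheel: 18 black, 18 red, one truth pocket (00), with 0 re-spun.
n_roulette = required_sample_size(0.44, forced_yes=18 / 37, truthful=1 / 37,
                                  target_se=0.05)
n_direct = required_sample_size(0.44, forced_yes=0.0, truthful=1.0,
                                target_se=0.05)
print(n_direct, n_roulette)
```

Under these assumptions a direct survey needs about a hundred people for a standard error of 5 points, while the wheel needs more than a hundred thousand.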
1009.9. "I guess they could wear a mask :-)" by RDVAX::COOPER (Topher Cooper) Mon Jan 09 1989 18:21 (18 lines)
RE: .8
    
    An excellent example of a device such as I was trying to describe
    (I probably should have included a concrete example such as this
    to clarify what I was saying.  Thanks).
    
    > P.S. We could just use secret ballots.  :^)
    
    Despite the smiley face it may be worthwhile mentioning the context
    which makes this technique a useful one.  Written questionnaires
    tend to be biased in response and accuracy with respect to people
    who are partially or wholly illiterate in English.  A verbal interview
    allows the interviewer to assess "interactively" that the interviewee
    understands what is being asked and to take corrective action if
    not.  Complete anonymity then rests on trust of the interviewer
    and hence the problem.
    
    						Topher
1009.10. "there are three privacy concerns here" by PULSAR::WALLY (Wally Neilsen-Steinhardt) Wed Jan 18 1989 16:51 (32 lines)
    Note that .3 and .7 are raising privacy concerns that the test method,
    correctly described in .2, was not intended to address.
    
    The single concern motivating the test method as described was:
    suppose that I as a subject give an incriminating answer.  Could
    this be traced back to me as an individual and used to incriminate
    me?  The method described in .2 removes this concern, since there
    is no proof that the incriminating answer is true.
    
    .3 raises a second concern: that an incriminating answer may raise
    the subjective probability that the incriminating answer is true
    for the individual.  As discussed elsewhere, particularly in .3
    and .4, this change to subjective probability can be minimized but
    not eliminated.  I personally would argue that this second concern
    is less significant than the first, since this subjective probability
    is never admissible as evidence in a criminal case and seldom in a
    civil case.  But other people have other standards of privacy.
    
    .7 raises a third concern: that survey results may be used to
    stigmatize a particular group.  However, the connection between
    group characteristics is often what is being sought by the survey.
    This is a conflict in goals which no test design can eliminate.
    For (a controversial) example: suppose a public health agency wants
    to test the hypothesis that promiscuous homosexuals have an increased
    risk of having AIDS.  The agency says it needs the information to
    design a prevention campaign.  An advocacy group says it will be used 
    to inflame public opinion against promiscuous homosexuals.  Any
    test design which satisfies the agency will be objectionable to
    the group, and vice versa.  The real issue here is the relative
    merits of the two arguments, not the test design.  There is no general
    answer here, since for most of us, changing the details of the
    situation will change the side we favor.