    Thought this would answer some questions folks have posed about the reliability of polls given cell phone usage. It's from July but the points are still valid for this fall.
    Tuesday, July 22, 2008

    The Cellphone Problem, Revisited

    Let me comment at a bit more length on the so-called "cellphone problem" -- the fact that many voters are unreachable to pollsters whose samples consist of landline numbers only. This may have some relevance in explaining the Rasmussen results today in Ohio which showed John McCain with a fairly large lead.

    The basic issue with cellphone-only households is that their incidence is not distributed evenly throughout the population. Minorities are more likely to be cellphone-only than whites, and men are more likely to be cellphone-only than women. But the most important differences are in terms of the age of the voter.

    The below is data compiled by the Centers for Disease Control on the number of cellphone-only adults by age cohort. Actually, it is not just cellphone-only adults -- the CDC also tracks another category which I call "cellphone-mostly" adults. These are people that have a landline, but also have a mobile phone, and use their mobile phone to receive most or all of their calls. I know, personally, a lot of people who fall into this category: they may use their landlines only to make local calls, only to connect to the Internet, only as an emergency in case their cellphone service is down, and they may have the service only because it came bundled with their cable or wireless package. If their friends and family are in the habit of calling them on their cellphones, they may be very suspicious of calls coming into their landlines -- assuming that they are likely to be from telemarketers -- and not make a practice of answering them.

    Table 1. Cellphone-Only and Cellphone-Mostly Adults by Age Cohort

    As you can see, fully half of all adults under the age of 30 fall into the cellphone-only or cellphone-mostly buckets, and the number is growing every day. About a third of adults aged 30-44 are cellphone-only or cellphone-mostly, and then the numbers trail off once adults pass the midpoint of their lives.

    Obviously, if polling firms did not weight by age, this would be an utter disaster for any election in which preferences vary significantly by age. Suppose for example that the following represented the true distribution of the likely voter population in Big Industrial State:

    Age      % of LV   Obama   McCain
    18-24    10        69      31
    25-29    10        60      40
    30-44    30        50      50
    45-64    35        46      54
    65+      15        40      60
    TOTAL    100       50      50

    These numbers have been 'rigged' such that Obama and McCain each receive exactly 50 percent of the vote. Suppose, however, that we exclude cellphone-only and cellphone-mostly voters from our sample, according to their proportions in the CDC data. What you'd instead wind up with is the following:

    Age      % of LV   Obama   McCain
    18-24    7         69      31
    25-29    6         60      40
    30-44    28        50      50
    45-64    39        46      54
    65+      20        40      60
    TOTAL    100       48.5    51.5

    What ought to have been a tie instead turns into a 3-point lead for John McCain. (And keep in mind that the numbers in this example are hypothetical -- but they probably look something like this.)
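    The arithmetic behind the two tables can be checked with a short sketch. It only re-weights the hypothetical figures printed above; the function and variable names are mine, not the author's. Note that the rounded shares in the second table give roughly 48.4 to 51.6 rather than exactly 48.5 to 51.5, presumably because the printed shares are themselves rounded.

```python
# Hypothetical figures copied from the two tables above.
# Each row: (age group, true % of likely voters,
#            % of a landline-only sample, Obama %, McCain %)
ROWS = [
    ("18-24", 10, 7, 69, 31),
    ("25-29", 10, 6, 60, 40),
    ("30-44", 30, 28, 50, 50),
    ("45-64", 35, 39, 46, 54),
    ("65+", 15, 20, 40, 60),
]


def topline(shares, support):
    """Weighted average of group-level support, using group shares as weights."""
    return sum(s * p for s, p in zip(shares, support)) / sum(shares)


true_shares = [row[1] for row in ROWS]
landline_shares = [row[2] for row in ROWS]
obama = [row[3] for row in ROWS]
mccain = [row[4] for row in ROWS]

true_obama = topline(true_shares, obama)
landline_obama = topline(landline_shares, obama)
landline_mccain = topline(landline_shares, mccain)

print(f"True population:  Obama {true_obama:.1f}, McCain {100 - true_obama:.1f}")
print(f"Landline sample:  Obama {landline_obama:.1f}, McCain {landline_mccain:.1f}")
print(f"Apparent McCain lead: {landline_mccain - landline_obama:.1f} points")
```

    The only thing that changes between the two runs is the age mix; the preferences within each age group are identical.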

    Pollsters can get around this problem by weighting groups that are likely to be cellphone-only more heavily -- in particular younger voters. This is what nearly all smart pollsters do, and it is considerably better than the alternative of not weighting at all. However, it creates a couple of additional problems.

    The first and more commonly-discussed problem is that the cellphone-only voters may not be the same as their landline counterparts, even once we control for age and other variables like race and gender. Urban voters are about 50 percent more likely to be cellphone-only than rural voters, for instance, and while some pollsters weight by geography, others do not. Thus, you may wind up with a biased sample.

    But even if the sample were unbiased -- the pollster is smart enough to figure out how to balance all the weights properly -- what you're still doing in effect is to magnify the importance of sampling error. Suppose that a pollster wants to sample 500 likely voters in a state. Roughly speaking, about 20 percent of these -- 100 of them -- are likely to fall into the 18-29 age range. But, about half of those voters can't be reached because they are cellphone-only or cellphone-mostly. So your effective sample size for this subgroup is 50 voters, which carries a margin of error of +/- 14 points. Sometimes, the luck of the draw will come through for you and you'll wind up with a pretty good sample, but other times you'll be pretty far off.
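    The +/- 14 figure follows from the usual 95 percent margin-of-error formula for a proportion. A minimal sketch, assuming simple random sampling and the worst case of a 50/50 split (the function name is illustrative):

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error, in percentage points, for a proportion p
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n) * 100


# Full sample, the expected young subgroup, and the reachable half of it.
for n in (500, 100, 50):
    print(f"n = {n:3d}: +/- {margin_of_error(n):.1f} points")
```

    Halving the subgroup from 100 to 50 respondents widens its margin of error from about 10 points to about 14.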

    If you are not fortunate enough to wind up with a good sample, what you are going to wind up doing is compounding your problems, because you have to weight all the young voters that you do sample more heavily to make up for the ones that you can't reach because they depend on cellphones.
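    One way to put a number on this compounding is Kish's approximation for the effective sample size of a weighted sample, n_eff = (sum of weights)^2 / (sum of squared weights). The article does not use this formula; it is a standard survey-statistics shortcut that I am borrowing here. The sample composition below (500 completed interviews, only 50 of them with young voters, against a 20 percent target) is an assumption chosen to match the example in the text.

```python
def kish_effective_n(weights):
    """Kish's approximate effective sample size for a weighted sample."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Assumed composition: 500 interviews, 50 with voters aged 18-29.
n_young, n_other = 50, 450
n = n_young + n_other
target_young = 0.20  # share of likely voters aged 18-29, per the text

# Post-stratification weights: target share divided by achieved share.
w_young = target_young / (n_young / n)
w_other = (1 - target_young) / (n_other / n)
weights = [w_young] * n_young + [w_other] * n_other

n_eff = kish_effective_n(weights)
young_share = w_young * n_young / sum(weights)

print(f"Weight on each young respondent: {w_young:.2f}")
print(f"Effective sample size: {n_eff:.0f} of {n}")
print(f"Young voters' share of the weighted topline: {young_share:.0%}")
```

    Under these assumptions each young respondent counts double, the poll behaves like a sample of about 450 rather than 500, and because the 50 young respondents drive 20 percent of the topline, a 14-point miss in that subgroup moves a candidate's overall share by roughly 2.8 points.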

    So what you should get in the habit of doing, where such information is available, is to check the cross-tabs for groups that are known to have problems with non-response bias -- by which I mean check them for younger voters because of the cellphone-only problem. If the pollster was unlucky and wound up with a poorly-representative sample of such voters, it may skew their overall results, as such responses wind up being weighted more heavily.

    Is this an issue with the Rasmussen poll in Ohio? Actually, it may be. The poll has McCain leading 50-39 among voters aged 18-29, and 67-33 among voters aged 30-39. Obama leads 55-36 among voters in their 40s, and then McCain leads by single-digit margins among voters aged 50 and up. Such an age distribution is inconsistent with most other polling that we have seen in this election.

    This does not mean that Rasmussen screwed up. This problem has nothing to do with Rasmussen; it is common to all pollsters that don't include a cellphone supplement, which means all pollsters except Gallup and Selzer. These pollsters are trying to do everything they can to work around a vexing problem -- that about half the young voters they might want to sample can't be reached, and that they are stuck with small sample sizes of such voters as a result. But it does mean that, if there is greater error in their sample of young voters, it will lead to greater error in their poll as a whole.

    Electoral Projections Done Right: The Cellphone Problem, Revisited

    Here's some new info on cellphones being used in polling:
    Estimating the Cellphone Effect: 2.8 Points (originally 2.2)

    Mark Blumenthal has a rundown of the pollsters that are including cellphone numbers in their samples. Apparently, Pew, Gallup, USA Today/Gallup (which I consider a separate survey), CBS/NYT and Time/SRBI have been polling cellphones all year. NBC/WSJ, ABC/Washington Post and the AP/GfK poll have also recently initiated the practice. So too do the Field Poll in California, PPIC, also based in California, and Ann Selzer. There may be some others too but those are the ones that I am aware of. (EDIT: The director of the PPIC survey in California has kindly written to let me know that, while they use a cellphone supplement for some of their public policy surveys, they have not done so thus far this year for their Presidential trial heats. The remainder of this article has been corrected accordingly.)

    Let's look at the house effects for these polls -- that is, how much the polls have tended to lean toward one candidate or another. These are fairly straightforward to calculate, via the process described here. Essentially, we take the average result from the poll and compare it to other polls of that state (treating the US as a 'state') after adjusting the result based on the national trendline.

    Since ABC, NBC/WSJ and AP/GfK all just recently began using cellphones, we will ignore their data for now. We will also throw out the data from three Internet-based pollsters, Zogby Interactive, Economist/YouGov, and Harris Interactive. This leaves us with a control group of 37 (originally 36) pollsters that have conducted at least three general election polls this year, either at the state or national level.

    Pollster           n     Lean
    ================   ===   ======
    Selzer             5     D +7.8
    CBS/NYT            14    D +3.7
    Pew                7     D +3.4
    Field Poll         4     D +2.8
    Time/SRBI          3     D +2.4
    USA Today/Gallup   11    D +0.4

    Gallup             184   R +0.6
    PPIC               4     R +1.3   (dropped from the average per the EDIT above)

    AVERAGE                  D +2.8   (originally D +2.3)

    CONTROL GROUP (37 Pollsters)   D +0.0 (originally D +0.1)

    Six of the seven (originally eight) cellphone-friendly pollsters have had a Democratic (Obama) lean, and in several cases it has been substantial. On average, they had a house effect of Obama +2.8 (originally +2.3). By comparison, the control group had essentially zero house effect (originally Obama +0.1; see the (**) note), so this would imply that including a cellphone sample improves Obama's numbers by 2.8 points. (Or, framed more properly, failing to include cellphones hurts Obama's numbers by approximately 2-3 points; originally "approximately 2 points".)
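    As a sanity check on the AVERAGE row, here is a sketch that takes a simple unweighted mean of the leans in the table, with and without PPIC. This is not the full house-effect calculation referenced earlier (which compares each poll against other polls of the same state and the national trendline); it only averages the published leans, and the dictionary and function names are mine.

```python
# Leans from the table above; positive = Democratic, negative = Republican.
LEANS = {
    "Selzer": 7.8,
    "CBS/NYT": 3.7,
    "Pew": 3.4,
    "Field Poll": 2.8,
    "Time/SRBI": 2.4,
    "USA Today/Gallup": 0.4,
    "Gallup": -0.6,
    "PPIC": -1.3,
}


def mean_lean(names):
    """Unweighted mean lean across the named pollsters."""
    names = list(names)
    return sum(LEANS[name] for name in names) / len(names)


all_eight = mean_lean(LEANS)
without_ppic = mean_lean(name for name in LEANS if name != "PPIC")

# Cellphone effect = cellphone-pollster average minus control-group average.
original_effect = all_eight - 0.1    # control group at D +0.1
revised_effect = without_ppic - 0.0  # control group at D +0.0

print(f"All eight pollsters: D +{all_eight:.1f}, effect {original_effect:.1f}")
print(f"Without PPIC:        D +{without_ppic:.1f}, effect {revised_effect:.1f}")
```

    The two results line up with the 2.2 and 2.8 figures in the post's title, which suggests the AVERAGE row is a plain mean across pollsters rather than one weighted by the number of polls each has conducted.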

    The difference is statistically significant at the 95 percent confidence level. Perhaps not coincidentally, Gallup, Pew and ABC/WaPo have each found a cellphone effect of 1-3 points when they have conducted experiments involving polling with and without a cellphone supplement.

    A difference of 2-3 points may not be a big deal in certain survey applications such as market research, but in polling a tight presidential race it makes a big difference. If I re-run today's numbers but add 2.2 points to Obama's margin in each non-cellphone poll, his win percentage shoots up from 71.5 percent to 78.5 percent, and he goes from 303.1 electoral votes to 318.5 (EDIT: I have not changed this part of the analysis in reflection of the new numbers, as it should still get the general point across). The difference would be more pronounced still if Obama hadn't already moved ahead of McCain by a decent margin on our projections.

    So this is my plea to pollsters: let's get it right. Perhaps the cellphone effect will prove to be a mirage after all, but that's something for the data to determine on its own, rather than the pollster.

    (**) Keen observers will wonder why the average house effect is greater than zero. This is because in determining our house effect coefficients, we weight based on how many polls each pollster has conducted. A couple of pollsters that account for a large proportion of our data, like Rasmussen and ARG, have had slight (very slight, but enough to skew the numbers) GOP leans.

    Another update:
    Selzer & Co: Correction on Cellphones

    In a post on Saturday, I asserted that Ann Selzer's polling company, Selzer & Co., which conducts polling in Iowa, Indiana and Michigan, includes a supplement of cellphone voters in their interviews.

    Selzer contacted me to let me know that this is NOT the case. They have been using random digit dialing of landlines (as almost all pollsters do) for their general election polling, and have not been calling a supplementary sample of cellphone numbers.

    The reason I got the idea that they had been calling cellphones was because of this article at in October, in which Mark Blumenthal wrote that:
    "Selzer also informs us via email that their completed interviews included a small number of voters interviewed on their cell phones."
    There WERE a small percentage of cellphone numbers included in Selzer's Iowa caucus polling, because these polls were based on registered voter lists purchased from the state of Iowa, and some voters had left their cellphone number as their primary point of contact on those lists. So while there was no deliberate attempt to reach cellphone voters, she was able to reach some anyway via this list-based sampling. For their general election polling, however -- in Iowa as well as Indiana and Michigan -- Selzer is using random digit dialing of landlines, and so no cellphones will be included.

    With the Selzer polls taken out of the sample, and refreshing my original analysis based on the most current data, I now show a cellphone effect of 2.3 points rather than 2.8 points. The cellphone variable remains statistically meaningful at the 95 percent significance threshold.

    Yesterday, the Pew Research Center issued a detailed study on cellphones and the 2008 election. They found that Barack Obama performed a net 2-3 points better across three of their recent polls when a cellphone sample was included. They also found that cellphone-only voters were significantly more likely to support Obama than non-cellphone voters of the same age.

    I apologize to Mrs. Selzer as well as our readers for any confusion. We will have another post up momentarily with some additional thoughts based on my conversation with Mrs. Selzer.

