Welcome to Science with Shrike! We continue our discussion on masks with part II, aka the juicy stuff. Today we will discuss how to test masks at the physical, biological and epidemiological levels. We will end by reviewing the clinical trials testing mask efficacy.
Source Control?
Most of these devices are designed to protect the wearer, not the environment. Purifying the air as it leaves a potentially infectious person is often not practical for several reasons. First, it is hard to determine who is infectious and who is not. Second, many of the filtration solutions vent the air back into the environment, so they do not provide protection. Last, in the source control scenario, your safety depends on another person’s compliance instead of your own.
To detoxify the environment, ventilation is a more robust solution. Note that places with particulates as an occupational hazard encourage good ventilation paired with an appropriate respirator.
How do we know if masks work?
Testing mask efficacy against respiratory pathogens is not a simple task. Masks can be tested at several levels. In the lab environment, there are a few different options:
The physicists hook up a mask to a blower and measure how many particles may be forced through the mask and their size distribution. This is a terrible experiment because the fit between the mask and the blower will not be the same as the fit between mask and person. The blower setting may fail to model human breathing, coughing and sneezing. This also measures particles, which are usually non-infectious simulants. This means you cannot tell if the particles getting through are infectious or not, or if the dose is still high enough to infect a person.
The physicists with cooler toys stick fluorescein up people’s noses and then put on a laser light show to visualize all the fluorescent particles coming out. While this looks cool, and can show how larger particles move about, there remain problems. First, the smallest particles are the hardest to detect because they have the least fluorescence. Shrike does not believe the authors’ claims that their systems can detect droplet nuclei. So this tells you more about respiratory droplets than aerosols. This also doesn’t tell you which particles are most infectious, or if the dose is high enough to infect a person. The physics can look cool, but the key biology is missing from these experiments.
Biologists can check the infectious dose of particulates in several ways. The cheapest way is to use a permissive cell line. These cells, often African Green Monkey kidney cells, human embryonic kidney cells, or human cervical cancer cells, are grown in many small wells. The viral sample in question is diluted and added to the cells. Viral replication can be measured by molecular methods (real-time PCR), or by old school plaque assays. In plaque assays, viral spread is limited in the well by adding a gel (agarose) to the cells. This forces new viruses to infect neighboring cells. After a few rounds of infection, each initial virion leaves a small circle of destruction, called a plaque. Counting the plaques gives an idea of how many virions were in the initial sample. This also allows determination of the ID50 in vitro. Since this ID50 may be different than the ID50 in humans, it is often called “tissue culture ID50.” This allows some measurement of the infectiousness of a given sample, and comparison between samples.
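As a rough sketch of the counting step in a plaque assay, the usual back-calculation divides the plaque count by the dilution and the volume plated. The function name and the example numbers (42 plaques, a 1:100,000 dilution, 0.1 mL plated) are hypothetical and only illustrate the arithmetic.

```python
def plaque_titer(plaque_count, dilution, volume_ml):
    """Estimate plaque-forming units (PFU) per mL of the undiluted sample.

    plaque_count: plaques counted in one well
    dilution:     fraction of original sample in that well (e.g. 1e-5)
    volume_ml:    volume of diluted sample added to the well, in mL
    """
    if plaque_count < 0 or dilution <= 0 or volume_ml <= 0:
        raise ValueError("count must be >= 0; dilution and volume must be > 0")
    return plaque_count / (dilution * volume_ml)


# Hypothetical well: 42 plaques at a 1e-5 dilution with 0.1 mL plated.
titer = plaque_titer(42, 1e-5, 0.1)
print(f"Estimated titer: {titer:.2e} PFU/mL")
```

Running the same calculation on two samples (for example, one collected with a mask and one without) gives the kind of sample-to-sample comparison described above.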
Biologists with cooler toys can use an animal model. The advantage of animal models is that you can enclose the animal in a box and control the size and density of the particulates. Intranasal delivery will provide large droplets, but nebulizers can give a consistent droplet size. This enables dose-response testing, determination of the in vivo ID50, and impact of particulate size on ID50. Animals can be sampled to determine the severity of infection. Importantly, this will measure infection. The con is that animal models have physiologic differences from humans, and the findings might not translate to humans. Also, some viruses do not infect animals well, so you need to either make a transgenic mouse, or adapt the virus to the animal. In the case of SARS-CoV2, transgenic mice that express the human protein ACE2 are needed. In the case of influenza, the virus was adapted to mice. In both cases, mice do not sneeze, which is why ferrets are a common animal model for respiratory illness. Good luck testing a mask on an animal.
Biologists in countries like Korea with access to a spare closet capable of good ventilation can leave a plate open near a patient to collect virus while the patient breathes with and without a mask on. However, it is important to vent the room between the masked and unmasked runs, and to try to limit other variables. Analysis of the collected virus needs to be done quickly, too.
Biologists with clinician friends and fancy toys can do some human measurements. Enter the Gesundheit II (GSII). This device allows the collection of everything a human is breathing. Even better, the machine can separate the aerosols from the respiratory droplets and collect both. These materials can be used in downstream analyses, either to measure total virus (by real-time PCR), or infectious virus (by tissue culture ID50). The downside to the GSII is that you need the person being tested to sit there for half an hour to collect enough sample. If you’re trying to test an intervention with the same person, this now costs them an hour. That’s why this respiratory illness study only had 49 of 246 participants provide paired samples.
For SARS-CoV2, this becomes a logistical challenge. Most infected people do not spread SARS-CoV2 well, so you need to collect samples from a large number of people to test masking. If you can process 6-8 people per day on one machine, you can collect from 30-40 people per week. And then run all the analyses on those samples. This is within the capacity of a large multi-center study (especially if NIH were helping to fund this), but most individual labs lack the GSII, the human research approval, and the personnel for human subjects plus analysis.
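To get a feel for the timeline, here is a back-of-the-envelope sketch of the throughput arithmetic. The 6-8 people per day figure is from the text; the five-day week, the target of 30 positive samples, and the 25% positive fraction are assumptions chosen purely for illustration, not measured values.

```python
import math


def collection_plan(target_positive, positive_fraction, per_day, days_per_week=5):
    """Return (participants needed, weeks of collection on one machine).

    target_positive:   number of samples that must contain infectious virus
    positive_fraction: assumed share of participants who yield such a sample
    per_day:           participants one machine can process per day
    """
    if not 0 < positive_fraction <= 1:
        raise ValueError("positive_fraction must be in (0, 1]")
    participants = math.ceil(target_positive / positive_fraction)
    per_week = per_day * days_per_week
    return participants, math.ceil(participants / per_week)


# Hypothetical scenario: 30 positives wanted, 1 in 4 participants positive.
for per_day in (6, 8):
    people, weeks = collection_plan(30, 0.25, per_day)
    print(f"{per_day}/day: {people} participants, about {weeks} weeks")
```

A lower positive fraction stretches the collection period in proportion, which is where multiple machines and multiple centers come in.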
Existing Trials
Prior to 2020, the risk from pathogens carried on respiratory droplets was reduced by covering your mouth when you sneezed or coughed. This caught the majority of the particles expelled, and prevented them from finding other people. Since hands can contaminate other objects if you don’t wash your hands after sneezing into them, sneezing or coughing into your shoulder became the preferred method. “Cover your cough” is a free method to which everyone has access. While signage to improve compliance used to live in every physician’s office, it was quietly retired during the 2020 SARS-CoV2 pandemic. To Shrike’s knowledge, “cover your cough” has never been compared in trials to surgical mask use. “Cover your cough” is the proper ‘standard of care’ that should be the baseline, along with a ‘no intervention’ control.
The use of masks to prevent respiratory infections has been tested. Prior to 2020, influenza was the major respiratory pathogen tested in trials. If masks helped, they would reduce flu infections and deaths. The results were inconclusive. There was a ~2-3 fold reduction in infectious particles with N95s, but no significant reduction in transmission occurred. As a result, masks were NOT recommended for regular use.
SARS-CoV2 changed all of this. For one thing, coronaviruses spread differently from influenza. Initial thoughts were that it was spread primarily by respiratory droplets, and this idea persisted after the Diamond Princess cruise ship transmission falsified that hypothesis. On the Diamond Princess, spread appeared to occur via air ducts to customers who isolated, which means it can be spread via aerosols. However, it’s possible that masks could work for one respiratory virus spread by aerosols, even if they have failed for all of the others.
Rigorously studying infection in humans is challenging due to ethics, and all the variables we cannot control, so we try to cheat and get hints with other methods. These tend to be “case reports”, “case control studies”, and “retrospective studies”. A case report is reporting on one specific case and what was found. It’s an n of 1 study. For new, rare, and weird things, these case reports are valuable and give initial information. However, for testing drug efficacy, they are limited in value.
In a case control study, you select a lot of people who get COVID19 (the “cases”), and compare them to people who do not get COVID (the “controls”). You attempt to select the controls to be as similar to the cases as possible. Then you see which differences you were unable to eliminate. These differences might be important for disease transmission/pathogenesis. One of the most common versions of these is the country chart, where COVID infections/deaths/whatever are broken down by country. These are useless examples because they rarely control for any of the other variables at play. Even when you try to control for variables, it is easy to miss a key variable.
In a retrospective study, you look through medical records of people who had COVID and compare them to people who did not, and try to pull out differences.
All of these study approaches have their place in helping us better understand how diseases work, and factors associated with disease. However, they are exploratory, hypothesis-generating studies, not hypothesis-testing studies. Hypotheses must be tested, not assumed. This is why Shrike is not discussing any of the various case control, retrospective, etc. studies about masks.
The best solution we have is the randomized controlled trial (RCT). These trials are an attempt to limit the variables and test an intervention directly by randomly assigning who gets the treatment and who gets the placebo. This is the best way to test, but there are two big challenges. First is data fakery. The easy rule of thumb to avoid data fakery is to remain agnostic on RCTs performed outside of the West (i.e. outside US, Australia, Canada, UK, western Europe), with a possible exception for Japan. There are ok RCTs done outside of those countries, but the scam rate is much higher.
The second challenge is getting enough patients enrolled. If the infection rate for a study population is ~2%, you will need to enroll thousands of patients for statistically significant data, especially if the differences are small. In 1000 patients per group, a 10% reduction in the attack rate would be the difference between 20 patients and 18. Is that a real difference? Hard to say. This is one reason why the clinical trials are hard to come by. The counterpoint is that the government had the resources to rigorously test mask use, and they did not.
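To put a number on "is that a real difference?", here is a minimal sketch that runs the 20 vs 18 example through a pooled two-proportion z-test. This is one common textbook test, chosen here for illustration; it is not necessarily the analysis any particular trial used, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(cases_a, n_a, cases_b, n_b):
    """Two-sided p-value for a difference in attack rates (pooled z-test)."""
    rate_a, rate_b = cases_a / n_a, cases_b / n_b
    pooled = (cases_a + cases_b) / (n_a + n_b)
    # Standard error of the difference under the null of equal rates.
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Example from the text: 20 infections vs 18, with 1000 people per group.
p = two_proportion_p_value(20, 1000, 18, 1000)
print(f"p-value: {p:.2f}")
```

The p-value comes out around 0.74, nowhere near the conventional 0.05 cutoff, so a trial of this size cannot distinguish a 10% reduction from chance.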
So what did the clinical trials say, and how many were performed? You can see all current clinical trials at ClinicalTrials.gov. This Cochrane review gives the tl;dr on 78 trials (including flu and other respiratory infections): not clear, with a notable paucity of trials.
One of the first reported trials for SARS-CoV2 was the DanMask study (NCT04337541). This looked at mask use in the community. Face masks failed to offer protection, but the error bars were wide. This means they could not rule out small contributions from masks (e.g. the study could not rule out masks cutting infection risk by 40% or increasing infection risk by 20%).
A larger scale study (340,000 people from 600 villages) was published from work in Bangladesh (NCT04630054). They observed an 11% reduction in COVID infections between their intervention and control villages, despite ~40% compliance in face mask use in their intervention villages. A key weakness in this trial was that the intervention was not restricted to face mask use. It was a kitchen sink approach of public propaganda, and included social distancing and other approaches. So it’s hard to attribute the reduction to any specific intervention. The only take-away is that all of the interventions collectively cut infections by 11%. Not a win for masks in Shrike’s book.
Trial NCT04296643 was published, and compared surgical masks to N95s to prevent infection, but failed to compare to ‘no masks’ (not allowed), or ‘cover your cough’. They found no difference in transmission between health care workers wearing surgical masks or N95s. However, they needed Egypt and Pakistan to get the n higher, and there was a potential difference in the Canadians.
Trial NCT04647305 was published and compared the addition of a face shield to a mask. Maybe it helped, but the difference was 1 infected person vs 3. This illustrates the challenge of needing to enroll thousands of people instead of hundreds.
Trials to watch:
One trial in progress to watch is NCT05690516, which is testing in Norway. The downside is that the outcome measure is self-reported illness. Another one that just wrapped up is NCT04979858 at Georgia Tech. This one compared a new mask type to baseline. The downsides to this trial are that subclinical infections are not tested, and the control group was ‘masking practice of choice’ instead of ‘cover your cough’.
Designing a trial
Shrike would like to see two studies performed to test mask efficacy. First would be a multi-center trial using the GSII, similar to this study. Key improvements would be 1) measuring infectious virus, not just PCR, and 2) getting a higher n of patients willing to give both masked and non-masked breath samples, such that at least 30-50 of the non-masked breath samples are positive for infectious SARS-CoV2. Neither of these requests is trivial, and both are likely beyond the ability (and funding) of most PIs to organize and test on their own.
But this is not impossible for the NIH to pull off. This could have been organized at the federal level, and institutions paid to conduct the study in major population centers. Especially if the GSII, money for analysis and IRB approval were all provided, PIs would fall over themselves to get involved.
For the clinical trials, Shrike would like to see ‘cover your cough’ included as a control group, put up against standard face masks, without social distancing, and against a PAPR. The primary endpoints would be weekly tests for spike and nucleocapsid antibody titers, and PCR for virus, too. Spike to check vaccination status/titer by variant, and nucleocapsid to check infection. The PCR would be nice to confirm infection. The secondary endpoint would be hospitalizations. The challenge with this approach is the n needs to be enormous, especially if the time frame for follow-up is short (better participation with short time frame). Even then, it would take months and require testing tens of thousands of people in each group. This would require a public health contribution. But 340,000 people were assessed in Bangladesh, so it is possible.
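As a rough check on the "tens of thousands of people in each group" figure, here is a sketch of the standard normal-approximation sample size formula for comparing two proportions. The ~2% attack rate and 10% reduction come from the earlier example; the 5% significance level and 80% power are conventional defaults assumed here, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(control_rate, relative_reduction, alpha=0.05, power=0.80):
    """Approximate participants per arm to detect a relative risk reduction."""
    treated_rate = control_rate * (1 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = (control_rate * (1 - control_rate)
                + treated_rate * (1 - treated_rate))
    effect = control_rate - treated_rate
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# 2% attack rate in controls, looking for a 10% relative reduction.
print(n_per_group(0.02, 0.10))
# A larger effect (50% reduction) needs far fewer people per arm.
print(n_per_group(0.02, 0.50))
```

Under these assumptions a 10% reduction needs on the order of 70,000 people per arm, while a 50% reduction needs only a few thousand. Dropouts, multiple arms, and a shorter follow-up window (which lowers the attack rate) would all push the numbers higher.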
While the logistics are not trivial, they are doable for the government. If masking is important, it should be tested rigorously, much as hydroxychloroquine and ivermectin were. Count the trials on ClinicalTrials.gov that used either of those for SARS-CoV2 and then compare to the number testing masking. While it is easier to test drugs, no one mandated either of those drugs. Masks were mandated without robust clinical trial data supporting their use in preventing SARS-CoV2 … or influenza, seasonal coronaviruses or other respiratory viruses, for that matter.
Instead of RCTs, we got a dog and pony show explaining why they are hard to do with masks. Yes, they’re hard to do, but not impossible. Quite doable (and ethically necessary) for a government that wants to mandate their use.
This is yet again a triumph of the policy maker over the scientist.