Instead of trying to work through these issues at the national level, the sequencing contracts allow individual public health agencies to request the names and contact information of people who have tested positive for variants of concern. But that just pushes the same problems of data ownership down the chain.
“Some states are very good and want to know a lot about variants that are circulating in their state,” says LabCorp’s Brian Krueger. “The other states are not.”
Public health epidemiologists often have little experience with bioinformatics—the use of software to analyze large data sets like genomic sequences. Only a few agencies have preexisting sequencing programs; and even if they all did, having each jurisdiction analyze just a small slice of the data set undercuts how much knowledge can be gleaned about real-world behavior.
Getting around those issues—making it easier to connect sequences and clinical metadata on a large scale—would require more than just root-and-branch reform of privacy regulations, however. It would mean a reorganization of the entire health care system in the US, where each of the 64 public health agencies operates as a fiefdom, and there is no centralization of information or power.
“Metadata is the single biggest uncracked nut,” says Jonathan Quick, managing director of pandemic response, preparedness, and prevention at the Rockefeller Foundation. (The Rockefeller Foundation helps fund coverage at MIT Technology Review, although it has no editorial oversight.) Because it’s so hard for public health experts to put together big enough data sets to really understand real-world variant behavior, our understanding has to come from vaccine manufacturers and hospitals that add sequencing to their own clinical trials, he says.
It’s frustrating to him that so many huge data sets of useful information already exist in electronic medical records, immunization registries, and other sources but can’t easily be used.
“There’s a whole lot more that could be learned, and learned faster, without the shackles we put on the use of that data,” says Quick. “We can’t just rely on the vaccine companies to do surveillance.”
Boosting state-level bioinformatics
If public health labs are expected to focus more on tracking and understanding variants on their own, they’ll need all the help they can get. Doing something about variants case by case, after all, is a public health job, while doing something about variants on a policy level is a political one.
Public health labs generally use genomics to expose otherwise hidden information about outbreaks, or as part of track-and-trace efforts. In the past, sequencing has been used to connect E. coli outbreaks to specific farms, identify and interrupt chains of HIV transmission, isolate US Ebola cases, and follow annual flu patterns.
Even those with well-established programs tend to use genomics sparingly. The cost of sequencing has dropped dramatically over the last decade, but the process is still not cheap, particularly for cash-strapped state and local health departments. The machines themselves cost hundreds of thousands of dollars to buy, and more to run: Illumina, one of the biggest makers of sequencing equipment, says labs spend an average of $1.2 million annually on supplies for each of its machines.
Health agencies don’t just need money; they also need expertise. Surveillance requires highly trained bioinformaticians to turn a sequence’s long strings of letters into useful information, as well as people to explain the results to officials, and convince them to turn any lessons learned into policy.
Fortunately, the OAMD has been working to support state and local health departments as they try to understand their sequencing data, employing regional bioinformaticians to consult with public health officers and facilitating agencies’ efforts to share their experiences.
It is also pouring hundreds of millions into building and supporting those agencies’ own sequencing programs—not just for covid, but for all pathogens.
But many of those agencies are facing pressure to sequence as many covid genomes as possible. Without a cohesive strategy for collecting and analyzing data, it’s unclear how much utility those programs will have.
“We’ll miss a ton of opportunities if we just give health departments money to set up programs without having a federal strategy so that everyone knows what they’re doing,” says Warmbrod.
Initial visions, usurped
Mark Pandori is director of the Nevada state public health laboratory, one of the programs OAMD supports. He has been a strong proponent of genomic surveillance for years. Before moving to Reno, he ran the public health lab in Alameda County, California, where he helped pioneer a program using sequencing to track how infections were being passed around hospitals.
Turning sequences into usable data is the biggest challenge for public health genomics programs, he says.
“The CDC can say, ‘Go buy a bunch of sequencing equipment, do a whole bunch of sequencing.’ But it doesn’t do anything unless the consumers of that data know how to use it, and know how to apply it,” he says. “I’m talking to you about the robotics we need to get things sequenced every day, but health departments just need a simple way to know if cases are related.”
When it comes to variants, public health labs are under many of the same pressures the CDC faces: everyone wants to know what variants are circulating, whether or not they can do anything with the information.
Pandori launched his covid sequencing program hoping to cut down on the labor needed to investigate potential covid outbreaks, quickly identifying whether cases caught near each other were related or coincidental.
His lab was the first in North America to identify a patient reinfected with covid-19, and later found the B.1.351 variant in a hospitalized man who had just come back from South Africa. With rapid contact tracing, the health department was able to prevent it from spreading.
The pandemic slashed the West Coast’s emissions. Wildfires already reversed it.
That’s far above normal levels for this part of the year and comes on top of the surge of emissions from the massive fires across the American West in 2020. California fires alone produced more than 100 million tons of carbon dioxide last year, which was already enough to more than cancel out the broader region’s annual emissions declines.
“The steady but slow reductions in [greenhouse gases] pale in comparison to those from wildfire,” says Oriana Chegwidden, a climate scientist at CarbonPlan.
Massive wildfires burning across millions of acres in Siberia are also clogging the skies across eastern Russia and releasing tens of millions of tons of emissions, Copernicus reported earlier this month.
Fires and forest emissions are only expected to increase across many regions of the world as climate change accelerates in the coming decades, creating the hot and often dry conditions that turn trees and plants into tinder.
Fire risk—defined as the chance that an area will experience a moderate- to high-severity fire in any given year—could quadruple across the US by 2090, even under scenarios where emissions decline significantly in the coming decades, according to a recent study by researchers at the University of Utah and CarbonPlan. With unchecked emissions, US fire risk could be 14 times higher near the end of the century.
Emissions from fires are “already bad and only going to get worse,” says Chegwidden, one of the study’s lead authors.
Over longer periods, the emissions and climate impacts of increasing wildfires will depend on how rapidly forests grow back and draw carbon back down—or whether they do at all. That, in turn, depends on the dominant trees, the severity of the fires, and how much local climate conditions have changed since that forest took root.
While working toward her doctorate in the early 2010s, Camille Stevens-Rumann spent summer and spring months trekking through alpine forests in Idaho’s Frank Church–River of No Return Wilderness, studying the aftermath of fires.
She noted where and when conifer forests began to return, where they didn’t, and where opportunistic invasive species like cheatgrass took over the landscape.
In a 2018 study in Ecology Letters, she and her coauthors concluded that forests that burned across the Rocky Mountains have had far more trouble growing back this century, as the region has grown hotter and drier, than they did at the end of the last one. Dry conifer forests that had already teetered on the edge of survivable conditions were far more likely to simply convert to grass and shrublands, which generally absorb and store much less carbon.
This can be healthy up to a point, creating fire breaks that reduce the damage of future fires, says Stevens-Rumann, an assistant professor of forest and rangeland stewardship at Colorado State University. It can also help to make up a bit for the US’s history of aggressively putting out fires, which has allowed fuel to build up in many forests, also increasing the odds of major blazes when they do ignite.
But their findings are “very ominous” given the massive fires we’re already seeing and the projections for increasingly hot, dry conditions across the American West, she says.
Other studies have noted that these pressures could begin to fundamentally transform western US forests in the coming decades, damaging or destroying sources of biodiversity, water, wildlife habitat, and carbon storage.
Fires, droughts, insect infestations, and shifting climate conditions will convert major parts of California’s forests into shrublands, according to a modeling study published in AGU Advances last week. Tree losses could be particularly steep in the dense Douglas fir and coastal redwood forests along the Northern California coast and in the foothills of the Sierra Nevada range.
All told, the state will lose around 9% of the carbon stored in trees and plants aboveground by the end of this century under a scenario in which emissions are stabilized, and more than 16% in a future world where they continue to rise.
Among other impacts, that will clearly complicate the state’s reliance on its lands to capture and store carbon through its forestry offsets program and other climate efforts, the study notes. California is striving to become carbon neutral by 2045.
Meanwhile, medium- to high-emissions scenarios create “a real likelihood of Yellowstone’s forests being converted to non-forest vegetation during the mid-21st century,” because increasingly common and large fires would make it more and more difficult for trees to grow back, a 2011 study in Proceedings of the National Academy of Sciences concluded.
The global picture
The net effect of climate change on fires, and fires on climate change, is much more complicated globally.
Fires contribute directly to climate change by releasing emissions from trees as well as the rich carbon stored in soils and peatlands. They can also produce black carbon that may eventually settle on glaciers and ice sheets, where it absorbs heat. That accelerates the loss of ice and the rise of ocean levels.
But fires can drive negative climate feedback as well. The smoke from Western wildfires that reached the East Coast in recent days, while terrible for human health, carries aerosols that reflect some of the sun's heat back into space. Similarly, fires in boreal forests in Canada, Alaska, and Russia can open up space for snow that's far more reflective than the forests it replaced, offsetting the heating effect of the emissions released.
Different parts of the globe are also pushing and pulling in different ways.
Climate change is making wildfires worse in most forested areas of the globe, says James Randerson, a professor of earth system science at the University of California, Irvine, and a coauthor of the AGU paper.
But the total area burned by fires worldwide is actually going down, primarily thanks to decreases across the savannas and grasslands of the tropics. Among other factors, sprawling farms and roads are fragmenting the landscape in developing parts of Africa, Asia, and South America, acting as breaks for these fires. Meanwhile, growing herds of livestock are gobbling up fuels.
Overall, global emissions from fires stand at about a fifth the levels from fossil fuels, though they're not yet rising sharply. But total emissions from forests have clearly been climbing when you include fires, deforestation, and logging. They've grown from less than 5 billion tons in 2001 to more than 10 billion in 2019, according to a Nature Climate Change paper in January.
Less fuel to burn
As warming continues in the decades ahead, climate change itself will affect different areas in different ways. While many regions will become hotter, drier, and more susceptible to wildfires, some cooler parts of the globe will become more hospitable to forest growth, like the high reaches of tall mountains and parts of the Arctic tundra, Randerson says.
Global warming could also reach a point where it actually starts to reduce certain risks as well. If Yellowstone, California’s Sierra Nevada, and other areas lose big portions of their forests, as studies have suggested, fires in those areas could begin to tick back down toward the end of the century. That’s because there’ll simply be less, or less flammable, fuel to burn.
Worldwide fire levels in the future will ultimately depend on both the rate of climate change and human activity, which is the main source of ignitions, says Doug Morton, chief of the biospheric sciences laboratory at NASA's Goddard Space Flight Center.
Meet the people who warn the world about new covid variants
In March 2020, when the WHO declared a pandemic, the public sequence database GISAID held 524 covid sequences. Over the next month scientists uploaded 6,000 more. By the end of May, the total was over 35,000. (In contrast, global scientists added 40,000 flu sequences to GISAID in all of 2019.)
As the number of covid sequences spiraled, researchers trying to study them were forced to create entirely new infrastructure and standards on the fly. A universal naming system has been one of the most important elements of this effort: without it, scientists would struggle to talk to each other about how the virus’s descendants are traveling and changing—either to flag up a question or, even more critically, to sound the alarm.
“Without a name, forget about it—we cannot understand what other people are saying,” says Anderson Brito, a postdoc in genomic epidemiology at the Yale School of Public Health, who contributes to the Pango effort.
Where Pango came from
In April 2020, a handful of prominent virologists in the UK and Australia proposed a system of letters and numbers for naming lineages, or new branches, of the covid family. It had a logic, and a hierarchy, even though the names it generated—like B.1.1.7—were a bit of a mouthful.
One of the authors on the paper was Áine O’Toole, a PhD candidate at the University of Edinburgh. Soon she’d become the primary person actually doing that sorting and classifying, eventually combing through hundreds of thousands of sequences by hand.
She says: “Very early on, it was just who was available to curate the sequences. That ended up being my job for a good bit. I guess I never understood quite the scale we were going to get to.”
She quickly set about building software to assign new genomes to the right lineages. Not long after that, another researcher, postdoc Emily Scher, built a machine-learning algorithm to speed things up even more.
They named the software Pangolin, a tongue-in-cheek reference to a debate about the animal origin of covid. (The whole system is now simply known as Pango.)
The naming system, along with the software to implement it, quickly became a global essential. Although the WHO has recently started using Greek letters for variants that seem especially concerning, like delta, those nicknames are for the public and the media. Delta actually refers to a growing family of variants, which scientists call by their more precise Pango names: B.1.617.2, AY.1, AY.2, and AY.3.
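The hierarchy in those names is mechanical: each dot-separated number designates a child of the lineage to its left, and short prefixes like AY are registered aliases for longer ones (AY.4 is shorthand for B.1.617.2.4). A minimal sketch of that logic in Python, using a tiny illustrative subset of the real alias registry maintained by the Pango team:

```python
from typing import Optional

# Tiny, illustrative subset of the Pango alias registry:
# "AY" abbreviates the delta family prefix B.1.617.2.
ALIASES = {"AY": "B.1.617.2"}

def expand(name: str) -> str:
    """Rewrite an aliased name (AY.4) into its full dotted form."""
    prefix, _, rest = name.partition(".")
    if prefix in ALIASES:
        return ALIASES[prefix] + ("." + rest if rest else "")
    return name

def parent(name: str) -> Optional[str]:
    """The parent lineage is the name with its last number removed."""
    full = expand(name)
    if "." not in full:
        return None  # a root lineage such as "A" or "B"
    return full.rsplit(".", 1)[0]

def is_descendant(child: str, ancestor: str) -> bool:
    """True if `child` sits at or anywhere below `ancestor` in the tree."""
    c, a = expand(child), expand(ancestor)
    return c == a or c.startswith(a + ".")
```

So, for example, `is_descendant("AY.4", "B.1.617.2")` is true, which is why scientists can treat the AY sublineages as part of the delta family, while `is_descendant("B.1.351", "B.1.617.2")` is false.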
“When alpha emerged in the UK, Pango made it very easy for us to look for those mutations in our genomes to see if we had that lineage in our country too,” says Jolly. “Ever since then, Pango has been used as the baseline for reporting and surveillance of variants in India.”
Because Pango offers a rational, orderly approach to what would otherwise be chaos, it may forever change the way scientists name viral strains—allowing experts from all over the world to work together with a shared vocabulary. Brito says: “Most likely, this will be a format we’ll use for tracking any other new virus.”
Many of the foundational tools for tracking covid genomes have been developed and maintained by early-career scientists like O’Toole and Scher over the last year and a half. As the need for worldwide covid collaboration exploded, scientists rushed to support it with ad hoc infrastructure like Pango. Much of that work fell to tech-savvy young researchers in their 20s and 30s. They used informal networks and tools that were open source—meaning they were free to use, and anyone could volunteer to add tweaks and improvements.
“The people on the cutting edge of new technologies tend to be grad students and postdocs,” says Angie Hinrichs, a bioinformatician at UC Santa Cruz who joined the project earlier this year. For example, O’Toole and Scher work in the lab of Andrew Rambaut, a genomic epidemiologist who posted the first public covid sequences online after receiving them from Chinese scientists. “They just happened to be perfectly placed to provide these tools that became absolutely critical,” Hinrichs says.
It hasn’t been easy. For most of 2020, O’Toole took on the bulk of the responsibility for identifying and naming new lineages by herself. The university was shuttered, but she and another of Rambaut’s PhD students, Verity Hill, got permission to come into the office. Her commute, walking 40 minutes to school from the apartment where she lived alone, gave her some sense of normalcy.
Every few weeks, O’Toole would download the entire covid repository from the GISAID database, which had grown dramatically each time. Then she would hunt around for groups of genomes with mutations that looked similar, or things that looked odd and might have been mislabeled.
When she got particularly stuck, Hill, Rambaut, and other members of the lab would pitch in to discuss the designations. But the grunt work fell on her.
Deciding when descendants of the virus deserve a new family name can be as much art as science. It was a painstaking process, sifting through an unheard-of number of genomes and asking time and again: Is this a new variant of covid or not?
“It was pretty tedious,” she says. “But it was always really humbling. Imagine going through 20,000 sequences from 100 different places in the world. I saw sequences from places I’d never even heard of.”
As time went on, O’Toole struggled to keep up with the volume of new genomes to sort and name.
In June 2020, there were over 57,000 sequences stored in the GISAID database, and O’Toole had sorted them into 39 variants. By November 2020, a month after she was supposed to turn in her thesis, O’Toole took her last solo run through the data. It took her 10 days to go through all the sequences, which by then numbered 200,000. (Although covid has overshadowed her research on other viruses, she’s putting a chapter on Pango in her thesis.)
Fortunately, the Pango software is built to be collaborative, and others have stepped up. An online community—the one that Jolly turned to when she noticed the variant sweeping across India—sprouted and grew. This year, O’Toole’s work has been much more hands-off. New lineages are now designated mostly when epidemiologists around the world contact O’Toole and the rest of the team through Twitter, email, or GitHub—her preferred method.
“Now it’s more reactionary,” says O’Toole. “If a group of researchers somewhere in the world is working on some data and they believe they’ve identified a new lineage, they can put in a request.”
The deluge of data has continued. This past spring, the team held a “pangothon,” a sort of hackathon in which they sorted 800,000 sequences into around 1,200 lineages.
“We gave ourselves three solid days,” says O’Toole. “It took two weeks.”
Since then, the Pango team has recruited a few more volunteers, like UCSC researcher Hinrichs and Yale researcher Brito, who both got involved initially by adding their two cents on Twitter and the GitHub page. A postdoc at the University of Cambridge, Chris Ruis, has turned his attention to helping O’Toole clear out the backlog of GitHub requests.
O’Toole recently asked them to formally join the organization as part of the newly created Pango Network Lineage Designation Committee, which discusses and makes decisions about variant names. Another committee, which includes lab leader Rambaut, makes higher-level decisions.
“We’ve got a website, and an email that’s not just my email,” O’Toole says. “It’s become a lot more formalized, and I think that will really help it scale.”
A few cracks around the edges have started to show as the data has grown. As of today, there are nearly 2.5 million covid sequences in GISAID, which the Pango team has split into 1,300 branches. Each branch corresponds to a variant. Of those, eight are ones to watch, according to the WHO.
With so much to process, the software is starting to buckle. Things are getting mislabeled. Many strains look similar, because the virus evolves the most advantageous mutations over and over again.
As a stopgap measure, the team has built new software that uses a different sorting method and can catch things that Pango may miss.
Disability rights advocates are worried about discrimination in AI hiring tools
Making hiring technology accessible means ensuring both that a candidate can use the technology and that the skills it measures don’t unfairly exclude candidates with disabilities, says Alexandra Givens, the CEO of the Center for Democracy and Technology, an organization focused on civil rights in the digital age.
AI-powered hiring tools often fail to include people with disabilities when generating their training data, she says. Such people have long been excluded from the workforce, so algorithms modeled after a company’s previous hires won’t reflect their potential.
Even if the models could account for outliers, the way a disability presents itself varies widely from person to person. Two people with autism, for example, could have very different strengths and challenges.
“As we automate these systems, and employers push to what’s fastest and most efficient, they’re losing the chance for people to actually show their qualifications and their ability to do the job,” Givens says. “And that is a huge loss.”
A hands-off approach
Government regulators are finding it difficult to monitor AI hiring tools. In December 2020, 11 senators wrote a letter to the US Equal Employment Opportunity Commission expressing concerns about the use of hiring technologies after the covid-19 pandemic. The letter inquired about the agency’s authority to investigate whether these tools discriminate, particularly against those with disabilities.
The EEOC responded with a letter in January that was leaked to MIT Technology Review. In the letter, the commission indicated that it cannot investigate AI hiring tools without a specific claim of discrimination. The letter also outlined concerns about the industry’s hesitance to share data and said that variation between different companies’ software would prevent the EEOC from instituting any broad policies.
“I was surprised and disappointed when I saw the response,” says Roland Behm, a lawyer and advocate for people with behavioral health issues. “The whole tenor of that letter seemed to make the EEOC seem like more of a passive bystander rather than an enforcement agency.”
The agency typically starts an investigation once an individual files a claim of discrimination. With AI hiring technology, though, most candidates don’t know why they were rejected for the job. “I believe a reason that we haven’t seen more enforcement action or private litigation in this area is due to the fact that candidates don’t know that they’re being graded or assessed by a computer,” says Keith Sonderling, an EEOC commissioner.
Sonderling says he believes that artificial intelligence will improve the hiring process, and he hopes the agency will issue guidance for employers on how best to implement it. He says he welcomes oversight from Congress.