The company blog post drips with the enthusiasm of a ’90s US infomercial. WellSaid Labs describes what clients can expect from its “eight new digital voice actors!” Tobin is “energetic and insightful.” Paige is “poised and expressive.” Ava is “polished, self-assured, and professional.”
Each one is based on a real voice actor, whose likeness (with consent) has been preserved using AI. Companies can now license these voices to say whatever they need. They simply feed some text into the voice engine, and out will spool a crisp audio clip of a natural-sounding performance.
WellSaid Labs, a Seattle-based startup that spun out of the research nonprofit Allen Institute for Artificial Intelligence, is the latest firm offering AI voices to clients. For now, it specializes in voices for corporate e-learning videos. Other startups make voices for digital assistants, call center operators, and even video-game characters.
Not too long ago, such deepfake voices had something of a lousy reputation for their use in scam calls and internet trickery. But their improving quality has since piqued the interest of a growing number of companies. Recent breakthroughs in deep learning have made it possible to replicate many of the subtleties of human speech. These voices pause and breathe in all the right places. They can change their style or emotion. You can spot the trick if they speak for too long, but in short audio clips, some have become indistinguishable from humans.
AI voices are also cheap, scalable, and easy to work with. Unlike a recording of a human voice actor, synthetic voices can update their script in real time, opening up new opportunities to personalize advertising.
But the rise of hyperrealistic fake voices isn’t consequence-free. Human voice actors, in particular, have been left to wonder what this means for their livelihoods.
How to fake a voice
Synthetic voices have been around for a while. But the old ones, including the voices of the original Siri and Alexa, simply glued together words and sounds to achieve a clunky, robotic effect. Getting them to sound any more natural was a laborious manual task.
Deep learning changed that. Voice developers no longer needed to dictate the exact pacing, pronunciation, or intonation of the generated speech. Instead, they could feed a few hours of audio into an algorithm and have the algorithm learn those patterns on its own.
Over the years, researchers have used this basic idea to build voice engines that are more and more sophisticated. The one WellSaid Labs constructed, for example, uses two primary deep-learning models. The first predicts, from a passage of text, the broad strokes of what a speaker will sound like—including accent, pitch, and timbre. The second fills in the details, including breaths and the way the voice resonates in its environment.
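The two-model split described above can be pictured as a simple two-stage pipeline. The sketch below is illustrative only: WellSaid Labs has not published this code, and every name in it (CoarseFrame, predict_coarse_frames, render_waveform, synthesize) is hypothetical. Both stages are toy stand-ins that use made-up numbers rather than trained models. The point is only the shape of the hand-off, from text to broad strokes to finished audio.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CoarseFrame:
    """Broad-stroke description of one slice of speech (hypothetical fields)."""
    pitch_hz: float
    energy: float


def predict_coarse_frames(text: str) -> List[CoarseFrame]:
    """Stage 1 stand-in: emit one coarse frame per character.

    A real system would use a trained model here; these values are invented.
    """
    frames = []
    for ch in text:
        if ch.isspace():
            # Treat whitespace as a silent pause.
            frames.append(CoarseFrame(pitch_hz=0.0, energy=0.0))
        else:
            frames.append(CoarseFrame(pitch_hz=120.0 + (ord(ch) % 40), energy=1.0))
    return frames


def render_waveform(frames: List[CoarseFrame], samples_per_frame: int = 4) -> List[float]:
    """Stage 2 stand-in: expand each coarse frame into several audio samples.

    A real second model would add fine detail such as breaths and room
    resonance; this toy version just repeats the frame's energy.
    """
    samples: List[float] = []
    for frame in frames:
        samples.extend([frame.energy] * samples_per_frame)
    return samples


def synthesize(text: str) -> List[float]:
    """Chain the two stages: text -> coarse frames -> audio samples."""
    return render_waveform(predict_coarse_frames(text))
```

Keeping the stages separate means each can, in principle, be retrained or swapped without touching the other, which is one plausible reason to split the work this way.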
Making a convincing synthetic voice takes more than just pressing a button, however. Part of what makes a human voice so human is its inconsistency, expressiveness, and ability to deliver the same lines in completely different styles, depending on the context.
Capturing these nuances involves finding the right voice actors to supply the appropriate training data and fine-tune the deep-learning models. WellSaid says the process requires at least an hour or two of audio and a few weeks of labor to develop a realistic-sounding synthetic replica.
AI voices have grown particularly popular among brands looking to maintain a consistent sound in millions of interactions with customers. With the ubiquity of smart speakers today, and the rise of automated customer service agents as well as digital assistants embedded in cars and smart devices, brands may need to produce upwards of a hundred hours of audio a month. But they also no longer want to use the generic voices offered by traditional text-to-speech technology—a trend that accelerated during the pandemic as more and more customers skipped in-store interactions to engage with companies virtually.
“If I’m Pizza Hut, I certainly can’t sound like Domino’s, and I certainly can’t sound like Papa John’s,” says Rupal Patel, a professor at Northeastern University and the founder and CEO of VocaliD, which promises to build custom voices that match a company’s brand identity. “These brands have thought about their colors. They’ve thought about their fonts. Now they’ve got to start thinking about the way their voice sounds as well.”
Whereas companies used to have to hire different voice actors for different markets—the Northeast versus Southern US, or France versus Mexico—some voice AI firms can manipulate the accent or switch the language of a single voice in different ways. This opens up the possibility of adapting ads on streaming platforms depending on who is listening, changing not just the characteristics of the voice but also the words being spoken. A beer ad could tell a listener to stop by a different pub depending on whether it’s playing in New York or Toronto, for example. Resemble.ai, which designs voices for ads and smart assistants, says it’s already working with clients to launch such personalized audio ads on Spotify and Pandora.
The gaming and entertainment industries are also seeing the benefits. Sonantic, a firm that specializes in emotive voices that can laugh and cry or whisper and shout, works with video-game makers and animation studios to supply the voice-overs for their characters. Many of its clients use the synthesized voices only in pre-production and switch to real voice actors for the final production. But Sonantic says a few have started using them throughout the process, perhaps for characters with fewer lines. Resemble.ai and others have also worked with film and TV shows to patch up actors’ performances when words get garbled or mispronounced.
2021 has broken the record for zero-day hacking attacks
“Part of the reason you’re seeing more now is because we’re finding more,” says Microsoft’s Doerr. “We’re better at shining a spotlight. Now you can learn from what’s happening at all your customers, which helps you get smarter faster. In the bad situation where you see something new, that will impact one customer instead of 10,000.”
The reality is a lot messier than the theory, however. Earlier this year, multiple hacking groups launched offensives against Microsoft Exchange email servers. What started as a critical zero-day attack briefly became even worse in the period after a fix became available but before users had actually applied it. That gap is a sweet spot hackers love to hit.
As a rule, however, Doerr is spot on.
Exploits are getting harder—and more valuable
Even if zero-days are being seen more than ever, there is one fact that all the experts agree on: they are getting harder and more expensive to pull off.
Better defenses and more complicated systems mean hackers have to do more work to break into a target than they did a decade ago—attacks are costlier and require more resources. The payoff, however, is that with so many companies operating in the cloud, a vulnerability can open millions of customers up to attack.
“Ten years ago, when everything was on premises, a lot of the attacks only one company would see,” says Doerr, “and few companies were equipped to understand what was going on.”
Faced with improving defenses, hackers often must link together multiple exploits instead of using just one. These “exploit chains” require more zero-days. Success at spotting these chains is also part of the reason for the steep rise in numbers.
Today, says Dowd, attackers are “having to invest more and risk more by having these chains to achieve their goals.”
One important signal comes from the rising cost of the most valuable exploits. The limited data available, such as Zerodium’s public zero-day prices, shows as much as a 1,150% rise in the cost of the highest-end hacks over the last three years.
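To put that percentage in perspective, a rise of 1,150% means the new price is 12.5 times the old one, not 11.5 times. The short sketch below shows the arithmetic. The function name is made up for illustration, and the $200,000 starting price is a hypothetical baseline chosen for the example, not a figure from the article.

```python
def multiplier_from_percent_rise(percent_rise: float) -> float:
    """Convert a percentage rise into a price multiplier.

    A rise of X% means new_price = old_price * (1 + X / 100).
    """
    return 1 + percent_rise / 100


# Hypothetical baseline price, for illustration only.
old_price = 200_000
new_price = old_price * multiplier_from_percent_rise(1150)
print(multiplier_from_percent_rise(1150))  # 12.5
print(new_price)  # 2500000.0
```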
But even if zero-day attacks are harder, the demand has risen, and supply follows. The sky might not be falling—but neither is it a perfectly sunny day.
How these US schools reopened without sparking a covid outbreak
“Cleaning high-touch areas is very important in schools,” Cogan said. But mask-wearing, physical distancing, vaccinations, and other measures are “higher protective factors.”
8. Give agency to parents and teachers in protecting their kids.
Last school year, many districts used temperature checks and symptom screenings in an attempt to catch infected students before they gave the coronavirus to others. But in Austin, Indiana, such formalized screenings proved less useful than teachers’ and parents’ intuition. Instructors could identify when a student wasn’t feeling well and ask them to go see the nurse, even if that student passed a temperature check.
Jetelina said that teachers and parents can both act as a layer of protection, stopping a sick child from entering the classroom. “Parents are pretty good at understanding the symptoms of their kids and the health of their kids,” she said.
In Andrews, Texas, district administrators provided parents with information on covid symptoms and trusted those parents to determine when a child might need to stay home from school. The Texas district may have “gone way overboard with giving parents agency,” though, Cogan said, in allowing students to opt out of quarantines and mask-wearing, a concern also raised by the Andrews County public health department.
9. We need more granular data to drive school policies.
Throughout the pandemic, I’ve consistently called out a lack of detailed public data on covid-19 cases in schools. The federal government still does not provide such data, and most states offer scattered numbers that don’t provide crucial context for cases (such as in-person enrollment or testing figures). Without these numbers, it is difficult to compare school districts and identify success stories.
My research on school reopenings illuminated another data issue: most states are not providing any covid-19 metrics down to the individual district, making it hard for school leaders to know when they must tighten or loosen safety protocols. At the tiny Port Orford–Langlois district in Oregon, for example, administrators had to rely on covid-19 numbers for their overall county. Even though the district had zero cases in fall 2020, it wasn’t able to bring older students back in person until the spring because outbreaks in another part of the county drove up case numbers. Cogan has observed similar issues in New Jersey.
At a local level, school districts may work with their local public health departments to get the data they need for more informed decision-making, Jetelina said. But at a larger, systemic level, getting granular covid-19 data is more difficult—a job for the federal government.
10. Invest in school staff and invite their contributions to safety strategies.
School staff described working long hours, familiarizing themselves with the science of covid-19, and exercising immense determination and creativity to provide their students with a decent school experience. Teaching is typically a challenging job, but in the last 18 months, it has become heroic—even though many people outside school environments take this work for granted, Jetelina said.
Districts can thank their staff by giving them a say in school safety decisions, Cogan recommended. “Educators—they’ve had a God-awful time and had a lot more put on them,” she said. But “every single person that works in a school has as well.” That includes custodians, cafeteria workers, and—crucially—school nurses, who Cogan calls the “chief wellness officers” of the school.
11. Allow students and staff the space to process pandemic hardship.
About 117,000 children in the US have lost one or both parents during the pandemic, according to research from Imperial College London. Thousands more have lost other relatives, mentors, and friends—while millions of children have faced job loss in their families, food and housing insecurity, and other hardships. Even if a school district has all the right safety logistics, school staff cannot truly support students unless they allow time and space to process the trauma that they’ve faced.
P.S. 705 in Brooklyn may serve as a model for this practice. School staff preemptively reached out to families when a student missed class, offering support: “705 is just the kind of place where it is a ‘wrap your arms around the whole family’ kind of a school,” one parent said.
On the first day of school in September 2021—when many students returned in person for the first time since spring 2020—the school held a moment of silence for loved ones that the school community has lost.
New challenges ahead
These lessons are drawn from school communities that were successful in the 2020-2021 school year, before the delta variant hit the US. This highly transmissible strain of the virus poses new challenges for the fall 2021 semester. The data analysis underlying this project led me to profile primarily rural communities, which may have gotten lucky with low covid-19 case numbers in previous phases of the pandemic—but are now unable to escape delta. For example, the Oregon county including Port Orford–Langlois saw its highest case rates yet in August 2021.
The delta challenge is multiplied by increasing polarization over masks, vaccines, and other safety measures. Still, Jetelina pointed out that there are also “a ton of champions out there,” referring to parents, teachers, public health experts, and others who continue to learn from past school reopening experiences—and advocate for their communities to do a better job.
The Solutions Journalism Network supported this project with a reporting grant, as well as trainings and other guidance. Learn more about the five school communities I profiled in this project for the COVID-19 Data Dispatch.
This story is part of the Pandemic Technology Project, supported by The Rockefeller Foundation.
US unfairly targeting Chinese over industrial spying, says report
For years, civil rights groups have accused the US Department of Justice of racial profiling against scientists of Chinese descent. Today, a new report provides data that may quantify some of their claims.
The study, published by the Committee of 100, an association of prominent Chinese-American civic leaders, found that individuals of Chinese heritage were more likely than others to be charged under the Economic Espionage Act—and significantly less likely to be convicted.
“The basic question that this study tries to answer is whether Asian-Americans are treated differently with respect to suspicions of espionage,” said the report’s author, Andrew C. Kim, a lawyer and visiting scholar at the South Texas College of Law Houston. “The answer to that question is yes.”
The study, which looked at data from economic espionage cases brought by the US from 1996 to 2020, found that just under half of all defendants were accused of stealing secrets that would benefit China. This is far lower than the figures laid out by US officials to justify the Department of Justice’s flagship China Initiative.
According to the report, 46% of defendants charged under the Economic Espionage Act were accused of activity that would benefit Chinese people or entities, while 42% of defendants were accused of stealing secrets that would benefit American businesses.
The numbers directly contradict much of the Justice Department’s messaging around the China Initiative, which was launched in 2018 to combat economic espionage. The department has stated publicly (for example, in the first line of its home page for the China Initiative) that 80% of its prosecutions involve conduct that would benefit the Chinese state, reflecting “theft on a scale so massive that it represents one of the largest transfers of wealth in human history,” as FBI director Christopher Wray described it in 2020.
Since 2019, the program has largely targeted academic researchers.
“Strong evidence of charges with less evidence”
The report was based on an analysis of public court filings, as well as Department of Justice press releases, for all Economic Espionage Act prosecutions between 1996 and 2020. It’s an update of an earlier analysis, published in the Cardozo Law Review, which covered the period up to 2016.
Charges for “theft of trade secrets” and “economic espionage” were both included, with the “economic espionage” charge requiring proof of a “nexus to foreign entity” and accompanied by higher penalties. (These two categories make up only a portion of the charges under the China Initiative; Kim briefly mentions “false statements and process crimes,” and people have also been charged with grant fraud and lying on visa applications, among other crimes.)
Because demographic information and citizenship data are not included in court filings, Kim used names as proxies for race, and he used Google searches when names, like Lee and Park, were ethnically ambiguous. For citizenship, Kim noted that press releases often make prominent mention if a defendant is a “foreign national,” so he assumed that defendants were all citizens unless otherwise indicated.