AI firm with ties to U.S. government exposes billions of documents in breach

UpGuard finds PII, biometrics among 550GB of data exposed due to configuration error

New research from data security firm UpGuard shows that a U.S. government AI contractor’s massive database of sensitive documents was exposed on the Internet until the end of last month. In a post on its blog, UpGuard breaks down how Veritone AI exposed 550GB of internal and client data including audio, video and biometric image media, employee PII, police body camera footage, FOIA requests and related documents, employee credentials, system logs with authorization tokens, and more.

The exposed centralized dataset contained sensitive information about Veritone resources and users, including employees’ full names, usernames, and email addresses. But exposure of government personnel data was of particular concern. “Internal credentials also appear in the exposed logs, such as application tokens and, in some cases, plain text passwords. The unauthorized use of these credentials would grant a threat actor whatever level of access the exposed accounts held, possibly exposing additional sensitive data to a malicious third party.”

At least some of the exposed personal data was being used to train AI systems, which has some observers asking whether machine learning companies touting their security bona fides are in fact creating a mother lode of vulnerable data honeypots.

“What we have become accustomed to call ‘artificial intelligence’ relies on concatenating pieces of an enormous dataset with a complex algorithm and detailed data tagging,” says UpGuard. “Because AI technologies often require massive databases full of whatever information they are analyzing, both the likelihood and impact of a data exposure rapidly increase.” It notes that “a significant portion of the services Veritone provides for government and police agencies involves automatically redacting sensitive information from documents, analyzing facial recognition data (referred to as identifying suspects), and processing audio and video surveillance data to find insights, keywords, and types of images.” It also points out that Veritone provides AI services for a wide array of industries, including law, energy, and entertainment – meaning the potential for similar exposures extends well beyond government.

UpGuard discovered Veritone’s first exposed Elasticsearch server hosted on the Microsoft Azure Government Cloud on March 23. It contained 464 million documents. The next day, the second server was discovered, containing 1.2 billion documents. According to the blog, “these servers did not require or ask for any credentials but rather provided anonymous access to anyone on the internet.”

After being made aware of the breach, Veritone secured the Elastic servers on March 30. The data is no longer publicly available.

In this case, the fault does not lie with Elasticsearch. The software, an open source search and analytics engine designed to quickly search large datasets, can be configured to require authentication. However, Veritone’s servers were not configured to do so – an oversight that undercut other security measures and left the government data exposed. Elasticsearch has been transparent about the necessity of configuring the software for authentication. A blog post from 2020 outlines simple steps users can take to secure their data from breaches.
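To illustrate what “anonymous access” means in practice, the following is a minimal sketch, not UpGuard’s method, of how an administrator might check whether a server they operate answers an unauthenticated request. It assumes the server speaks plain HTTP and that a 401 or 403 response indicates authentication is enforced; the function names `classify_status` and `probe` and the verdict strings are hypothetical and chosen for illustration only.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map an HTTP status code to a rough exposure verdict (illustrative labels)."""
    if status in (401, 403):
        # The server rejected the unauthenticated request.
        return "auth-required"
    if 200 <= status < 300:
        # The server answered without asking for credentials.
        return "anonymous-access"
    return "inconclusive"


def probe(url: str, timeout: float = 5.0) -> str:
    """Send one unauthenticated GET to `url` and classify the response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx; the status code is still available.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, TLS error, and so on.
        return "unreachable"
```

A call such as `probe("http://localhost:9200/")` (a hypothetical local address) would return `"anonymous-access"` only if the server replied successfully without credentials, which is the condition UpGuard describes for the exposed servers.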

In an interview with Axios, UpGuard VP of Cyber Research Greg Pollock says Microsoft is likely also off the hook. “Microsoft is providing the government cloud as a service; they’re probably not involved in the administration of this database,” Pollock says.

If the responsibility lies with Veritone in its failure to properly configure the Elasticsearch servers – as UpGuard’s assessment clearly implies in stating that “operational tasks such as spinning up an Elastic server should have controls in place to ensure that the server is not publicly accessible” – it is not the first AI firm to mishandle data. Still, given the volume and sensitivity of Veritone’s information, the breach could have significant implications for how AI training databases are collected, stored and secured.
