IoT Data Provenance, Quality, and Security

I’ve long been concerned with the quality of data generated by field sampling and remote monitoring programs, first in environmental investigations and then in healthcare.

Internet-connected devices need to be designed to ensure that the data they provide is sufficiently reliable for its intended use, such as big data analytics.

“Gartner, Inc. forecasts that 6.4 billion connected things will be in use worldwide in 2016, up 30 percent from 2015, and will reach 20.8 billion by 2020. In 2016, 5.5 million new things will get connected every day.” (http://www.gartner.com/newsroom/id/3165317).

Many of these “Internet of Things” (IoT) devices contain sensors that will collectively generate petabytes of data, and there has been much discussion of so-called “Big Data” analytics for managing and making sense of this data. Most of these big data discussions seem to assume that the data is good. The phrase “garbage in/garbage out” is fairly well known, but there does not seem to be much discussion of its importance for data analytics. Analytics run on poor data may yield wrong results, and so may the decisions based on them.

Key considerations for ensuring that good data is used are the provenance of the data and its quality: that is, reliable knowledge about the source of the data (provenance) and an appropriate level of certainty about the timeliness and correctness of the data (quality). In this context, “correctness” means that the data is sufficiently accurate, precise, and specific. Correctness also means that the data is appropriate for its intended use.
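To make these terms concrete, here is a minimal Python sketch of a sensor reading that carries its own provenance fields and is checked against acceptance criteria before use. The record layout, the device ID, the five-minute age limit, and the temperature range are all hypothetical values chosen for illustration; real criteria depend on the intended use of the data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Reading:
    device_id: str          # provenance: which device produced the value
    sensor_type: str        # provenance: what kind of sensor it was
    value: float
    unit: str
    measured_at: datetime   # time of measurement (UTC)


# Hypothetical acceptance criteria for one intended use.
KNOWN_DEVICES = {"probe-017"}
MAX_AGE = timedelta(minutes=5)
PLAUSIBLE_RANGE = {"water_temp": (-5.0, 60.0)}  # degrees Celsius


def quality_issues(reading, now):
    """Return the reasons a reading is unfit for use (empty list = usable)."""
    issues = []
    if reading.device_id not in KNOWN_DEVICES:
        issues.append("unknown source")
    age = now - reading.measured_at
    if age > MAX_AGE:
        issues.append("stale")
    elif age < timedelta(0):
        issues.append("timestamp in the future")
    bounds = PLAUSIBLE_RANGE.get(reading.sensor_type)
    if bounds is None:
        issues.append("no acceptance range for sensor type")
    elif not (bounds[0] <= reading.value <= bounds[1]):
        issues.append("out of plausible range")
    return issues


now = datetime(2016, 11, 1, 12, 0, tzinfo=timezone.utc)
reading = Reading("probe-017", "water_temp", 18.4, "C", now - timedelta(minutes=1))
print(quality_issues(reading, now))
```

A range check like this catches only gross errors. Accuracy and precision in the fuller sense still depend on calibration and on the sampling method itself.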

Data users depend on an appropriate level of security for the data. This includes data at rest (in storage), data in motion (being transmitted), and data in use (being analyzed). Security is not just about confidentiality, but includes integrity (not corrupted or deleted), availability (able to be accessed when desired), and non-repudiation (the origin of data or of an action, once established, cannot later be denied). Remote patient monitoring devices and environmental sensors are examples where high assurance regarding provenance, quality, and security can be important.

Until recently, more attention has been given to confidentiality than to integrity and availability. Thousands of articles have been written about preventing hackers from stealing information. Additionally, awareness of security for Internet-connected devices has been raised by the role such devices played in recent Distributed Denial of Service (DDoS) attacks on web sites and the Domain Name System (DNS). While these are important issues to address, going forward, industry and government also need to pay more attention to ensuring that these devices produce high-quality data and that the source of the data is adequately identified, so that consumers of such data can have confidence in what they are receiving.

Besides Internet-connected devices that collect and report data, other connected devices are used for executing actions, such as opening and closing valves. Some of these devices operate autonomously or semi-autonomously, using data delivered via local, wide-area, or cloud networks to trigger actions. For example, local or remote sensors can trigger controls that shut down a machine when it overheats or when other conditions make a shutdown necessary.

Similar to remote data sensing devices, remotely-triggered devices need to have confidence regarding data provenance, quality, and security. In other words, confidence that information or commands are coming from an authorized and trusted source, that the information is correct, and that the system has not been compromised.
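As a rough illustration of what “authorized and trusted source” can mean in practice, the following Python sketch has an actuator accept a command only if it comes from a known sender, carries a valid message authentication code, and is recent. The sender name, the shared secret, the 30-second window, and the message layout are assumptions made up for this example. Note that a shared-key MAC shows the command came from someone holding the key, but it does not provide non-repudiation; that would need asymmetric signatures.

```python
import hashlib
import hmac
import json

# Hypothetical shared secrets, keyed by the ID of each authorized controller.
AUTHORIZED_KEYS = {"controller-A": b"demo-secret-do-not-reuse"}
MAX_COMMAND_AGE_S = 30  # hypothetical freshness window, in seconds


def _mac(key, sender, action, issued_at):
    """HMAC-SHA256 over a canonical JSON encoding of the command fields."""
    body = {"sender": sender, "action": action, "issued_at": issued_at}
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def sign_command(sender, action, issued_at, key):
    """Build a command message with its authentication code attached."""
    return {"sender": sender, "action": action, "issued_at": issued_at,
            "mac": _mac(key, sender, action, issued_at)}


def accept_command(cmd, now):
    """Accept only authentic, untampered, recent commands from known senders."""
    key = AUTHORIZED_KEYS.get(cmd.get("sender"))
    if key is None:
        return False  # unknown or unauthorized source
    issued_at = cmd.get("issued_at")
    if not isinstance(issued_at, (int, float)):
        return False  # malformed timestamp
    expected = _mac(key, cmd.get("sender"), cmd.get("action"), issued_at)
    supplied = str(cmd.get("mac", ""))
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode("utf-8"), supplied.encode("utf-8")):
        return False  # content was altered or signed with the wrong key
    return 0 <= now - issued_at <= MAX_COMMAND_AGE_S


cmd = sign_command("controller-A", "close_valve", 1000.0,
                   AUTHORIZED_KEYS["controller-A"])
print(accept_command(cmd, now=1010.0))
```

The freshness window limits how long a captured command stays usable, but it does not stop a replay inside that window. A per-command nonce or counter would be needed for that.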

During the design, development, and implementation of IoT devices and associated structures, a systematic approach is needed to ensure that data provenance, quality, and security are adequately addressed. It is not sufficient to conduct security reviews after products have already been developed. Strategies and tactics are needed to (i) establish appropriate levels for provenance, quality, and security, (ii) ensure they are implemented and maintained, and then (iii) monitor compliance. Ideally, these will be expressed in a set of best practices for the design, manufacture, and implementation of Internet-connected devices.

Additional notes:

Blockchain technology, which is the underlying technology supporting Bitcoin and other cryptocurrencies, is starting to be used to address the issues of provenance. Distributed ledgers built with blockchain technology provide a trusted, immutable, time-stamped record of transactions or assets.

Here are just a few examples:

  • “Chronicled leverages blockchain and smart labels to create an open system of authenticity, ownership, provenance & connectivity for assets” and “We sign sensor data and log to blockchain to secure data provenance.” See http://chronicled.com
  • “Health Linkages is the Data Provenance Company. We use blockchain and big data technologies to enable healthcare institutions to trust, protect, and share their data.” See http://healthlinkages.com
  • “Maureen Downey and Everledger have joined forces to launch the Chai Wine Vault, an unprecedented solution for securing the authenticity and provenance of fine wine.” See https://www.winefraud.com/chai-wine-vault/. Also, “Everledger provides an immutable ledger for diamond ownership and related transaction history verification for insurance companies, owners, claimants, and law enforcement agencies. It was founded on April 10, 2015, and is based in London, United Kingdom.” See https://www.everledger.io
  • “Catenis Enterprise thwarts hacking attacks by always ensuring that every single communication sent to and from all devices uses cryptographic signature verification. This ensures that devices only accept commands and signals that are verified by military grade cryptography. Creating peace of mind for your security team and your company.” See http://blockchainofthings.com/downloads/CatenisDataSheet.pdf
  • “The blockchain is becoming the new standard for trust and verification of data. Tierion turns the blockchain into a global platform for verifying any data, file, or process. Use Tierion’s API and tools to anchor a permanent, timestamped proof of your data in the blockchain.” See https://tierion.com
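The property these services have in common, a time-stamped record that is hard to alter unnoticed, rests on hash-linking: each entry includes a hash of the entry before it. The Python sketch below shows only that linking step, under a record layout I made up for illustration. It is not how any of the products above are implemented, and it omits the replication and consensus that make a real blockchain trustworthy when no single party controls it.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, timestamp, record):
    """SHA-256 over the previous hash, the timestamp, and the record."""
    payload = json.dumps(
        {"prev": prev_hash, "ts": timestamp, "record": record}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain, timestamp, record):
    """Add a record, linking it to the hash of the entry before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "prev": prev_hash,
        "ts": timestamp,
        "record": record,
        "hash": entry_hash(prev_hash, timestamp, record),
    })


def verify(chain):
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        recomputed = entry_hash(entry["prev"], entry["ts"], entry["record"])
        if entry["prev"] != prev_hash or entry["hash"] != recomputed:
            return False
        prev_hash = entry["hash"]
    return True


log = []
append(log, 1478001600, {"device": "probe-017", "value": 18.4})
append(log, 1478001660, {"device": "probe-017", "value": 18.6})
print(verify(log))
```

On its own, a chain like this can be rebuilt from scratch by whoever holds it. Anchoring the latest hash somewhere the holder cannot rewrite, such as a public ledger, is what turns it into evidence of provenance.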

* * *

See https://42tek.com/meetingsreferences-html/ for links to other presentations regarding blockchain technology.

Contact [email protected] to participate in discussions of best practices for IoT data provenance, quality, and security.