A Road Map to IoT Research at LISHA
1 - Background
LISHA has been doing research on methodologies, tools and components for Embedded Systems since 1999. To demonstrate the effectiveness of our research ideas, we have been building real systems for a variety of domains, from agriculture to digital TV. Wireless communication has always been a key technological aspect in this scenario, and it eventually turned into a major research topic of its own. Since our seminal work on Wi-Fi and Bluetooth in 2001, we have designed resource-efficient protocols for a huge variety of scenarios. C-MAC, a framework for building custom protocols, was a breakthrough in 2006 that enabled us to move quickly through the succession of technological buzzwords in the field: MANETs, ubiquitous computing, personal area networks, wireless sensor networks and, finally, the Internet of Things. By 2009, we had reached a point at which we could no longer ignore the pressure to include security in our protocols. Yet we knew that refocusing our research from system design to security would disrupt our 30-year-long research profile. It was our first real opportunity for something we now call black-box, cross-domain research. Simply put, we interacted with security researchers, mainly through the literature, to identify promising algorithms that we could efficiently incorporate into our protocols, components and systems. Since 2013, we have felt mounting pressure to advance towards two additional frontiers: integration with the ordinary Internet and Big Data Analytics. Unfortunately, after almost two years of interacting with the scientific community in these areas, we found only noise and confusion, with no clear algorithms or strategies we could follow to advance our IoT scenario. We then invested a huge effort in a European Horizon 2020 (H2020) project proposal to bring together some of the most renowned researchers in the field and look for new candidates to fill the gaps.
Unfortunately, the project was rejected, and what remains of it is not solid enough for us to build on. Therefore, we now write this manifesto to realign existing research resources toward a minimally acceptable infrastructure for the IoT.
2 - Envisioned Scenario
1 - TSTP Bootstrapping
IoT devices synchronize position (HECOPS) and time (PTP) and then start a distributed symmetric key generation protocol with an IoT gateway (Elliptic Curve Diffie-Hellman from a UUID and a time-stamp within a narrow time window).
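A minimal Python sketch of this bootstrapping step follows, using textbook Elliptic Curve Diffie-Hellman. Several details are assumptions not stated above: the curve (secp256k1), the hypothetical `derive_key` function that hashes the shared x-coordinate together with the UUID and time-stamp, and the 5-second acceptance window. The arithmetic is neither constant-time nor optimized, and this is not the actual TSTP implementation.

```python
import hashlib
import secrets
import uuid

# secp256k1 domain parameters: y^2 = x^3 + 7 over GF(P). The curve is an
# assumption made for illustration; the text does not name one.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
WINDOW_S = 5  # hypothetical acceptance window for time-stamps, in seconds


def on_curve(pt):
    x, y = pt
    return (y * y - x * x * x - 7) % P == 0


def point_add(p1, p2):
    """Affine point addition; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                                       # p2 == -p1
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P        # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P   # chord slope
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def scalar_mult(k, pt):
    """Double-and-add scalar multiplication (NOT constant-time)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, pt)
        pt = point_add(pt, pt)
        k >>= 1
    return result


def keypair():
    priv = secrets.randbelow(N - 1) + 1                   # private scalar in [1, N-1]
    return priv, scalar_mult(priv, G)


def derive_key(priv, peer_pub, dev_uuid, timestamp, now):
    """Hypothetical KDF: reject stale time-stamps and off-curve points, then hash
    the shared x-coordinate with the device UUID and time-stamp into 128 bits."""
    if abs(now - timestamp) > WINDOW_S:
        raise ValueError("time-stamp outside the acceptance window")
    if not on_curve(peer_pub):
        raise ValueError("peer public key is not on the curve")
    shared_x = scalar_mult(priv, peer_pub)[0]
    material = (shared_x.to_bytes(32, "big") + dev_uuid.bytes
                + timestamp.to_bytes(8, "big"))
    return hashlib.sha256(material).digest()[:16]        # e.g. an AES-128 key


# Usage: only the public points would travel over the radio.
dev_uuid = uuid.uuid4()
t = 1_700_000_000                                         # hypothetical PTP-synchronized time, s
d_priv, d_pub = keypair()                                 # IoT device
g_priv, g_pub = keypair()                                 # IoT gateway
k_dev = derive_key(d_priv, g_pub, dev_uuid, t, now=t + 1)
k_gw = derive_key(g_priv, d_pub, dev_uuid, t, now=t + 2)
```

Each side combines its own private scalar with the peer's public point, so both reach the same symmetric key without the key itself ever being transmitted; a time-stamp outside the window is simply rejected.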
2 - TSTP Communication
IoT devices use the built-in AES engine to implement the Poly1305-AES MAC, which verifies both the integrity and the authenticity of messages. Keys are renewed periodically but never transmitted over the network. Byzantine algorithms are subsequently deployed to confirm authenticity based on the cyber-physical properties of the system.
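Poly1305 itself is a short polynomial evaluation modulo 2^130 - 5, sketched below in Python after the definition in RFC 8439. Because Python's standard library has no AES, the sketch takes s as the second half of a 32-byte one-time key; with Poly1305-AES, s would instead be AES_k(nonce), produced by the mote's AES engine for every message, while r could be reused. The key and message in the usage lines are hypothetical, the arithmetic is not constant-time, and this is not TSTP's actual code.

```python
import hmac

P1305 = (1 << 130) - 5
CLAMP = 0x0FFFFFFC0FFFFFFC0FFFFFFC0FFFFFFF


def poly1305(key: bytes, msg: bytes) -> bytes:
    """One-time Poly1305 tag over msg, following RFC 8439, section 2.5."""
    if len(key) != 32:
        raise ValueError("Poly1305 needs a 32-byte one-time key (r || s)")
    r = int.from_bytes(key[:16], "little") & CLAMP     # clamp r as the spec requires
    s = int.from_bytes(key[16:], "little")             # in Poly1305-AES: s = AES_k(nonce)
    acc = 0
    for i in range(0, len(msg), 16):
        # Each 16-byte block (the last may be shorter) is read with a 0x01 byte appended.
        n = int.from_bytes(msg[i:i + 16] + b"\x01", "little")
        acc = (acc + n) * r % P1305
    return ((acc + s) % (1 << 128)).to_bytes(16, "little")


def verify(key: bytes, msg: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(poly1305(key, msg), tag)


one_time_key = bytes(range(32))   # hypothetical key, for illustration only
message = b"temperature=296.15 K"
tag = poly1305(one_time_key, message)
```

A receiver holding the same keys recomputes the tag, so any change to the reading makes `verify` return `False`.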
3 - Cloud Storage Bootstrapping
IoT gateways and the Cloud Storage authenticate each other using a PKI, after which the gateways obtain a mapping function (possibly random) to distribute data across multiple clouds.
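One way to realize a (possibly random) mapping function is a keyed pseudo-random function: after mutual authentication, the Cloud Storage hands each gateway a secret seed and a list of clouds, and every record identifier is mapped to a cloud with HMAC-SHA-256. This is a sketch under assumptions; the seed, the cloud names and the record-identifier format are hypothetical, and the PKI handshake itself is not shown.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingFunction:
    """Hypothetical mapping handed to a gateway after PKI authentication."""
    seed: bytes     # secret known only to authenticated gateways and the Cloud Storage
    clouds: tuple   # e.g. ("cloud-a", "cloud-b", "cloud-c")

    def cloud_for(self, record_id: str) -> str:
        # Deterministic for a given seed, yet the placement looks random to
        # anyone who does not know the seed.
        digest = hmac.new(self.seed, record_id.encode(), hashlib.sha256).digest()
        return self.clouds[int.from_bytes(digest[:8], "big") % len(self.clouds)]


mapping = MappingFunction(seed=b"\x01" * 32, clouds=("cloud-a", "cloud-b", "cloud-c"))
placement = {rid: mapping.cloud_for(rid) for rid in ("gw1/0001", "gw1/0002", "gw1/0003")}
```

Because the mapping is deterministic, a processing engine that receives the same seed can later locate every record without a central index.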
4 - IoT Data Logging
IoT gateways log IoT data while forwarding it across IoT devices. TSTP data is tagged with S.I. semantics, time-stamped and geo-referenced. This data is subsequently encrypted with the Cloud Storage public key, partitioned, and transmitted directly to one of the clouds selected by the mapping function.
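A logged sample can be represented as a small self-describing record. The sketch below tags a reading with an S.I. unit, a time-stamp and a geo-reference, serializes it, splits it into fixed-size partitions and assigns each partition to a cloud. The field names, the 64-byte partition size, the hash-based placement and the `encrypt` hook are all hypothetical; in particular, `no_encryption` is only a placeholder for encryption under the Cloud Storage public key, which is not implemented here.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Sample:
    value: float
    unit: str          # S.I. unit symbol, e.g. "K" for temperature
    timestamp_us: int  # time-stamp from the synchronized clock, in microseconds
    x: float           # geo-reference; a local Cartesian frame in metres is assumed
    y: float
    z: float


def partition(record: bytes, size: int = 64) -> List[bytes]:
    """Split a serialized record into fixed-size partitions (the last may be shorter)."""
    return [record[i:i + size] for i in range(0, len(record), size)]


def log_sample(sample: Sample, clouds: List[str], encrypt: Callable[[bytes], bytes]
               ) -> Tuple[str, Dict[str, List[Tuple[int, bytes]]]]:
    """Serialize, pass through the encryption hook, partition and place one sample."""
    record = encrypt(json.dumps(asdict(sample), sort_keys=True).encode())
    record_id = hashlib.sha256(record).hexdigest()[:16]
    outbox: Dict[str, List[Tuple[int, bytes]]] = {c: [] for c in clouds}
    for idx, chunk in enumerate(partition(record)):
        # Stand-in for the mapping function: hash of (record id, partition index).
        h = hashlib.sha256(f"{record_id}:{idx}".encode()).digest()
        outbox[clouds[int.from_bytes(h[:4], "big") % len(clouds)]].append((idx, chunk))
    return record_id, outbox


def no_encryption(data: bytes) -> bytes:
    """Placeholder ONLY: stands in for public-key encryption, which is not implemented."""
    return data


sample = Sample(value=296.15, unit="K", timestamp_us=1_700_000_000_000_000, x=12.0, y=3.5, z=1.2)
record_id, outbox = log_sample(sample, ["cloud-a", "cloud-b", "cloud-c"], no_encryption)
```

Keeping the partition index next to each chunk lets the recovering side put the record back together regardless of which cloud returned which piece first.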
5 - Cloud Management
Mapping functions and public keys are dynamically managed by the Cloud Storage in order to optimize QoS.
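Dynamic, QoS-driven management of the mapping function could, for example, use weighted rendezvous (highest-random-weight) hashing: the Cloud Storage publishes one weight per cloud derived from measured QoS, and each record goes to the cloud with the highest score. This technique is our illustrative choice rather than something the scenario prescribes, and the cloud names, record identifiers and weights are hypothetical.

```python
import hashlib
import math
from typing import Dict


def unit_hash(cloud: str, key: str) -> float:
    """Deterministically map (cloud, key) to a float strictly inside (0, 1)."""
    digest = hashlib.sha256(f"{cloud}|{key}".encode()).digest()
    n = int.from_bytes(digest[:8], "big") >> 12      # keep 52 bits
    return (n + 0.5) / 2**52


def pick_cloud(key: str, weights: Dict[str, float]) -> str:
    """Weighted rendezvous hashing: the cloud with the highest -w / ln(h) wins."""
    return max(weights, key=lambda c: -weights[c] / math.log(unit_hash(c, key)))


records = [f"gw1/{i:04d}" for i in range(1000)]
before = {r: pick_cloud(r, {"cloud-a": 1.0, "cloud-b": 1.0, "cloud-c": 1.0}) for r in records}
# Hypothetical update: cloud-c's QoS degrades, so the Cloud Storage halves its weight.
after = {r: pick_cloud(r, {"cloud-a": 1.0, "cloud-b": 1.0, "cloud-c": 0.5}) for r in records}
moved = [r for r in records if before[r] != after[r]]
```

The appeal of this design is that lowering one cloud's weight only moves records away from that cloud; the scores of the other clouds are unchanged, so their placements stay put.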
6 - Parallel Cloud Processing Engine
A parallel engine recovers partitioned data from multiple clouds, decoding and aggregating it according to application needs and thereby supporting big data analytics.
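The recovery side can be sketched with a thread pool that fetches each cloud's data concurrently and then aggregates it. Here the clouds are simulated with in-memory lists of synthetic samples, `fetch` is a hypothetical stand-in for a real storage and decoding API, and the hourly mean per S.I. unit is just one example of an application-specific aggregation; a real engine could instead run on a cluster framework.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Dict, List, Tuple

# Simulated clouds holding (timestamp_s, unit, value) samples; synthetic data.
Sample = Tuple[int, str, float]
CLOUDS: Dict[str, List[Sample]] = {
    "cloud-a": [(0, "K", 295.0), (3700, "K", 297.0)],
    "cloud-b": [(1800, "K", 296.0), (4000, "W", 120.0)],
    "cloud-c": [(3600, "K", 299.0), (5000, "W", 80.0)],
}


def fetch(cloud: str) -> List[Sample]:
    """Hypothetical stand-in for downloading and decoding one cloud's partitions."""
    return CLOUDS[cloud]


def hourly_means(clouds: List[str]) -> Dict[Tuple[int, str], float]:
    with ThreadPoolExecutor(max_workers=len(clouds)) as pool:
        parts = list(pool.map(fetch, clouds))          # one concurrent fetch per cloud
    buckets: Dict[Tuple[int, str], List[float]] = defaultdict(list)
    for part in parts:
        for ts, unit, value in part:
            buckets[(ts // 3600, unit)].append(value)  # group by (hour, S.I. unit)
    return {k: mean(v) for k, v in sorted(buckets.items())}


result = hourly_means(list(CLOUDS))
```

Grouping by unit as well as by time relies on the S.I. tagging done at logging time, so temperatures and powers are never averaged together.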
3 - Deployment Scenario
4 - LISHA's Current Achievements
2 - EPOSMote
3 - TSTP
4 - CoAP Cloud Bridge
5 - Eminent Achievements through the LISHA/LabSEC Partnership
1 - Byzantine CPS Security
6 - Major Missing Blocks
1 - Internet Secure Gateway
Our IoT protocol, TSTP, was designed from scratch to support real-time, control and automation applications such as our Solar Smart Building and our Hydrostations. It relies on symmetric cryptography (i.e., AES and Poly1305) and distributed key generation (i.e., Diffie-Hellman) to deliver a scenario in which secret keys are never moved across the network, nor previously stored in a high-security environment: two of the strongest requirements we have collected from both industry and academia. Sending encrypted data to the Cloud for subsequent data mining would not make sense, since it would violate the first of these requirements (the IoT devices' secret keys would have to be sent to the Cloud processing application). Decrypting IoT data at the gateway is also a bad option, since it would turn the gateway into the Achilles' heel of the system: gateways are remotely accessible from the Internet and are therefore exposed to the same sort of vulnerabilities as any other Internet host. With this in mind, we identified two initially promising alternatives: homomorphic encryption (which enables encrypted data to be operated on without first being decrypted) and mixed symmetric/asymmetric encryption (in which a Cloud public key would be involved in our symmetric algorithms as a secondary key, enabling the Cloud to decrypt the data without knowing the IoT device's secret key). We are now convinced that neither approach is mature enough to be deployed, and the missing steps seem to be far off.
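To make the homomorphic alternative concrete: with an additively homomorphic scheme such as Paillier, a cloud could add encrypted readings together without decrypting them. Paillier is our illustrative choice, since the text does not name a scheme, and the toy sketch below uses tiny, insecure primes so that it runs instantly. It does not address the performance and maturity concerns raised above, especially on constrained motes.

```python
import math
import secrets

# Toy Paillier parameters: the 10,000th and 100,000th primes (far too small to be secure).
p, q = 104729, 1299709
n = p * q
n2 = n * n
g = n + 1                        # standard simplified generator choice
lam = math.lcm(p - 1, q - 1)     # Carmichael function of n
mu = pow(lam, -1, n)             # L(g^lam mod n^2)^-1 mod n reduces to lam^-1 mod n for g = n + 1


def encrypt(m: int) -> int:
    while True:
        r = secrets.randbelow(n - 1) + 1   # fresh randomness, must be coprime to n
        if math.gcd(r, n) == 1:
            break
    return pow(g, m, n2) * pow(r, n, n2) % n2


def decrypt(c: int) -> int:
    l_value = (pow(c, lam, n2) - 1) // n   # L(x) = (x - 1) / n
    return l_value * mu % n


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return c1 * c2 % n2


readings = [296, 297, 301]   # hypothetical integer-scaled sensor readings
total = 1                    # 1 encrypts 0 (with r = 1), the neutral element
for c in map(encrypt, readings):
    total = add_encrypted(total, c)
```

Only the holder of `lam` and `mu` can read `total`, yet whoever combined the ciphertexts never saw an individual reading.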
2 - Big Data Mining
Our IoT protocol indirectly enforces some semantics on data (based on S.I.), and it also causes data to be space-time referenced. Mining time series enriched with such metadata should be easier than mining fully unstructured data. Yet we have so far failed to find a ready-to-use infrastructure that would enable us to automatically derive relationships that we already know to exist in the series from the physical properties of the related cyber-physical systems (e.g., the thermodynamics of our smart building). We have considered Time Series Databases with support for statistical analysis tools, such as OpenTSDB and KairosDB, but found them lacking a set of ready-to-use machine learning algorithms. We subsequently considered cluster computing frameworks, such as Apache Spark, and machine learning libraries, such as MLlib, but again the knowledge gap is holding us back. Nevertheless, we firmly believe that "if we can make a machine learn a pattern and deduce a relationship that we already know to exist for the collected data, then we will be able to make it learn things we do not yet know".
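The closing conjecture suggests a simple sanity test for any mining pipeline: feed it data governed by a law we already know and check that it recovers the law. The sketch below does this for Newton's law of cooling, T(t) = T_env + (T0 - T_env) e^(-kt), using synthetic, slightly noisy samples (not measurements from the smart building) and ordinary least squares from the Python standard library. The constants are hypothetical, and the ambient temperature is assumed to be known.

```python
import math
import random

# Hypothetical ground truth for a room cooling towards ambient temperature (S.I. units).
T_ENV = 293.15      # ambient temperature, K
T0 = 308.15         # initial temperature, K
K_TRUE = 1 / 3600   # cooling constant, 1/s
DT = 60             # sampling period, s


def synthesize(n: int = 120, noise_k: float = 0.01, seed: int = 42) -> list:
    """Synthetic, noisy samples of T(t) = T_ENV + (T0 - T_ENV) * exp(-k t)."""
    rng = random.Random(seed)
    return [T_ENV + (T0 - T_ENV) * math.exp(-K_TRUE * i * DT) + rng.gauss(0, noise_k)
            for i in range(n)]


def learn_cooling_constant(series: list, t_env: float, dt: float) -> float:
    """Fit T[i+1] - t_env = a * (T[i] - t_env) by least squares, then k = -ln(a) / dt."""
    xs = [t - t_env for t in series[:-1]]
    ys = [t - t_env for t in series[1:]]
    a = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return -math.log(a) / dt


k_learned = learn_cooling_constant(synthesize(), T_ENV, DT)
```

If `k_learned` matches the constant the data was generated from, the pipeline has "deduced a relationship we already know"; the same check could later be applied to real building data and richer models.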