Aggregating IOT data
Data runs the world. Each day, 2.5 million terabytes of data are created by humans (source). The Internet of Things (IOT) accounts for a huge portion of this statistic. The IOT is the collection of devices with embedded sensors that collect and exchange data over the internet or similar networks.
Making an IOT device is trivial. Using a Raspberry Pi or an Arduino and some sensors, anyone can build a device that checks temperature, humidity, movement… These types of projects are cheap and are a great way for people to learn about programming and networking. But the power of IOT doesn’t stop there. IOT devices are used everywhere: boats, smart cars, farms, labs, oil rigs… With so much data being produced, we need a way to aggregate and analyze it.
Some solutions exist to aggregate such data, the most notable of which is ThingSpeak. I set out to build a simplified clone of this service to learn about servers and data aggregation. I chose to only keep a subset of features which I found most useful.
My project (which I called ThingTotal) has a single goal: to receive and store data efficiently, in a way that is easy to analyse. It does so by providing users with “streams”, which are endpoints users can send their data to. “Streams” can be customized to verify that the data is in an acceptable format using JSON Schema, a tool for annotating and validating JSON data. Each “stream” can have multiple “entries”, which are the individual pieces of data sent to the endpoint.
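To give a feel for what validating an entry against a stream's schema means, here is a minimal sketch. It is not the project's code: it hand-rolls only three JSON Schema keywords ("required", "properties" and "type") instead of using a full JSON Schema library, and the schema, the field names and the validate_entry helper are all made up for illustration.

```python
# Hypothetical schema for a stream; only "required", "properties" and "type"
# are understood by this sketch.
TEMPERATURE_SCHEMA = {
    "type": "object",
    "required": ["temperature", "humidity"],
    "properties": {
        "temperature": {"type": "number"},
        "humidity": {"type": "number"},
        "location": {"type": "string"},
    },
}

# Python types accepted for each JSON Schema type name handled here.
JSON_TYPES = {
    "number": (int, float),
    "string": (str,),
    "boolean": (bool,),
    "object": (dict,),
}


def validate_entry(entry, schema):
    """Return a list of error messages; an empty list means the entry is valid."""
    if not isinstance(entry, dict):
        return ["entry must be a JSON object"]
    errors = []
    # Every required key must be present.
    for key in schema.get("required", []):
        if key not in entry:
            errors.append(f"missing required field: {key}")
    # Every key that is present must have the declared type.
    for key, rules in schema.get("properties", {}).items():
        if key not in entry:
            continue
        value = entry[key]
        expected = JSON_TYPES[rules["type"]]
        # bool is a subclass of int in Python, so reject it for "number".
        wrong_bool = isinstance(value, bool) and rules["type"] != "boolean"
        if wrong_bool or not isinstance(value, expected):
            errors.append(f"{key} must be of type {rules['type']}")
    return errors
```

A server built this way would store an entry only when the returned list is empty, and would otherwise send the error messages back to the device.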
How I built it
To build this project I used many amazing open source technologies. The server is built using Django. Django provides many features, the most notable of which is its ORM (Object-Relational Mapper). Django’s ORM provides a layer of abstraction over the database, allowing us to use Python classes and functions to interact with the database without having to worry about the queries at a database level. When used properly, ORMs allow for greater confidence in the code and remove many security risks, the most notable of which is SQL injection attacks.
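To make the SQL injection risk concrete, here is a small sketch using Python's built-in sqlite3 module rather than Django or the project's database. The table, the rows and both lookup functions are made up for illustration. The second function passes user input as a query parameter, which is the kind of separation between query and data that an ORM gives you by default.

```python
import sqlite3

# In-memory database with two hypothetical streams owned by different users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE streams (name TEXT, owner TEXT)")
conn.executemany(
    "INSERT INTO streams VALUES (?, ?)",
    [("greenhouse", "alice"), ("boat", "bob")],
)

# Input crafted so that the WHERE clause is always true.
malicious = "alice' OR '1'='1"


def unsafe_lookup(owner):
    # String interpolation lets the input rewrite the query itself.
    query = f"SELECT name FROM streams WHERE owner = '{owner}'"
    return [row[0] for row in conn.execute(query)]


def safe_lookup(owner):
    # A placeholder keeps the input as plain data, never as SQL.
    query = "SELECT name FROM streams WHERE owner = ?"
    return [row[0] for row in conn.execute(query, (owner,))]
```

With the malicious input, unsafe_lookup returns every stream in the table, while safe_lookup simply finds no owner with that odd name.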
On top of Django, I used the Django REST framework (DRF), a framework that helps turn your Django server into a REST API. In a REST (Representational State Transfer) API (Application Programming Interface), to put it simply, each entry in the database (called a resource) has its own unique URL (e.g. https://example.com/articles/12), and operations on these resources are done using very specific types of requests. DRF makes it very easy to keep track of what's going on and to keep clean URLs:
GET <https://example.com/api/streams> -> List all the streams
POST <https://example.com/api/streams> -> Create a new stream
GET <https://example.com/api/streams/619b> -> Retrieve the stream with id 619b
PUT <https://example.com/api/streams/619b> -> Edit the stream with id 619b
GET <https://example.com/api/streams/619b/entries> -> List all the entries for the stream with id 619b
POST <https://example.com/api/streams/619b/entries> -> Create a new entry for the stream with id 619b
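From a device's point of view, using these routes could look like the sketch below, which only relies on Python's standard library. The host is the placeholder from the list above, the two helper functions are hypothetical, and sending the readings as a bare JSON object in the request body is an assumption about the API rather than something confirmed here.

```python
import json
from urllib.request import Request

BASE_URL = "https://example.com/api"  # placeholder host, not a real server


def list_entries_request(stream_id):
    """Build the GET request that lists all entries of a stream."""
    return Request(f"{BASE_URL}/streams/{stream_id}/entries", method="GET")


def create_entry_request(stream_id, data):
    """Build the POST request that creates a new entry in a stream."""
    # Assumption: the API accepts the sensor readings as a JSON body.
    body = json.dumps(data).encode("utf-8")
    return Request(
        f"{BASE_URL}/streams/{stream_id}/entries",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Nothing is sent until the request object is passed to urllib.request.urlopen, so the same helpers can be reused for any stream id.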
Another useful feature of DRF is that it can automatically generate an OpenAPI document, OpenAPI being a standard for documenting APIs. This makes it much easier for teams to work together: one person can work on the server (backend) while someone else works on the web page (frontend), and the frontend developer can see exactly how they should interact with the backend to get a specific resource or use a certain feature.
Finally, for the database (the thing that stores all of the data) I decided to use MongoDB. MongoDB is a NoSQL database, meaning the data isn’t stored in rows and columns (think Excel spreadsheet). Instead, MongoDB stores data in “documents” made of key-value pairs, which closely matches the way IOT information is sent and received.
Here are the most important pieces of code. I won't go into all the details, as they get somewhat complex; all of my code is available on GitHub.
Without question, the most important pieces of code are my models. A model is the class that defines how information is stored in the database.
I have 2 models. The first one is the model that stores my streams.
It has several fields. The _id field is used to uniquely identify each stream. The name and description fields are self-explanatory; they each allow strings of up to 100 characters to be stored. The fields property is where users can define the data that they will be sending to the server. This allows the program to validate the data sent. Finally, the created_at field stores the date and time the stream was created.
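As a rough illustration of these fields, here is a plain-Python sketch that uses a dataclass rather than a Django model, so it is not the project's actual code. The field names and the 100-character limit come from the description above; the Python types, the hex string used as an id and the two helper functions are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4

MAX_LENGTH = 100  # name and description hold at most 100 characters


def _new_id():
    # Assumption: any unique string is good enough as an id in this sketch.
    return uuid4().hex


def _now():
    return datetime.now(timezone.utc)


@dataclass
class Stream:
    name: str
    description: str = ""
    # Schema describing the data that devices will send to this stream.
    fields: dict = field(default_factory=dict)
    _id: str = field(default_factory=_new_id)
    created_at: datetime = field(default_factory=_now)

    def __post_init__(self):
        # Enforce the length limit on both text fields.
        for label in ("name", "description"):
            if len(getattr(self, label)) > MAX_LENGTH:
                raise ValueError(f"{label} is limited to {MAX_LENGTH} characters")
```

Creating a stream then only requires a name; the id and the creation time are filled in automatically.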
And I have a model for entries:
This model is a bit more complex. The created_at field has the same purpose as in the Stream model. The stream field is what links each entry to one Stream. It also allows us to search the database quickly and easily for all entries that are linked to a specific Stream. The data field is where we store the data sent to the server from IOT devices. We also added a little piece of information telling the program that when it needs to use the plural of Entry (for example in documentation), it should use "entries" instead of the naive "entrys".
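The link between an entry and its stream can be sketched in plain Python as follows. This is an in-memory stand-in, not the Django model or the MongoDB collection used by the project: the Entry dataclass mirrors the three fields described above, while the EntryStore class and its methods are hypothetical and only show why keeping the stream id on every entry makes lookups by stream easy.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now():
    return datetime.now(timezone.utc)


@dataclass
class Entry:
    stream: str  # id of the Stream this entry belongs to
    data: dict   # readings sent by the device
    created_at: datetime = field(default_factory=_now)


class EntryStore:
    """In-memory stand-in for the database, indexed by stream id."""

    def __init__(self):
        self._by_stream = defaultdict(list)

    def add(self, entry):
        # Group entries by the stream they are linked to.
        self._by_stream[entry.stream].append(entry)

    def for_stream(self, stream_id):
        # Return a copy so callers cannot modify the index.
        return list(self._by_stream.get(stream_id, []))
```

Fetching all the entries of one stream is then a single dictionary lookup, which is roughly the job a database index on the stream field would do.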
Through this project, I learnt about the power of NoSQL databases when used correctly. I also discovered the true power of many tools I have used in the past, especially MongoDB and Django REST framework. This is a (very) simplified clone of ThingSpeak, a great tool that I have used many times in the past. Here’s a video I made about this project.