When the device is turned on, it checks whether it is registered in a home. If it is not, it becomes visible over Bluetooth to our mobile app. I will write more about the technical details of the Bluetooth communication in other posts.
Once plugged into an electrical socket, the device tries to connect to the backend. Devices can be either passive or active. You can read more about this at www.kinetid.com and in my previous post about the project.
Passive devices have no ESP module (the module used for communication over WiFi), because the goal is to keep their price low. We have added a PLC module to the devices; passive devices use it to send signals to the active ones over the power line. This means that a home needs at least one active device, which uses WiFi to talk to the backend.
Each device has a unique identifier, which we use to recognise it.
How do we talk to the backend? Every device has a built-in electricity meter, and we've programmed the device to report its consumption every minute. In the other direction, the backend sends commands to switch on and off the appliance plugged into the device's socket.
To make this happen we use WebSocket. Of course, we ran into a lot of issues while programming the ESP module, and this post is about how we solved them.
WebSocket uses HTTP for the initial handshake. This is what the Chrome inspector shows when a WebSocket client tries to connect:
GET ws://localhost:9000/socket HTTP/1.1
Host: localhost:9000
Connection: Upgrade
Pragma: no-cache
Cache-Control: no-cache
Upgrade: websocket
Origin: http://localhost:8080
Sec-WebSocket-Version: 13
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36
Accept-Encoding: gzip, deflate, br
Accept-Language: en-GB,en-US;q=0.9,en;q=0.8,bg;q=0.7,ru;q=0.6
Cookie: _ga=GA1.1.1230152116.1522842268
Sec-WebSocket-Key: pCqArSp9LxSw7BHQSj9i/g==
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
Our backend expects an HTTP request because it is a web server: it serves the website as well as the API for the mobile apps (REST over HTTP with JSON).
At this point the TCP connection is already open; the handshake upgrades it from HTTP to WebSocket. To accept the upgrade, the server takes the Sec-WebSocket-Key string and appends a special, fixed GUID:
258EAFA5-E914-47DA-95CA-C5AB0DC85B11
It then calculates the SHA-1 hash of the result, Base64-encodes it and sends it back in the Sec-WebSocket-Accept header. From then on the socket stays open and is used for two-way communication.
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: YfeGvxJVa7iJA3g5rdHWaWZFsHA=
We've got the backend, but we still have to write the client code. It will run on the ESP module and communicate with the server.
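Since the ESP acts as the client, it has to do the client half of the handshake: send a random Sec-WebSocket-Key and check that the server's Sec-WebSocket-Accept matches. The Java sketch below illustrates that half under our own assumptions; it is not the firmware running on the ESP, the names ClientHandshake, newKey, buildRequest, expectedAccept and verifyResponse are hypothetical, and the request carries only a minimal set of headers rather than everything Chrome sends.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

public class ClientHandshake {
    // Fixed GUID from RFC 6455, the same one the server appends.
    static final String GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    // A fresh random 16-byte nonce, Base64-encoded, for every connection.
    public static String newKey() {
        byte[] nonce = new byte[16];
        new SecureRandom().nextBytes(nonce);
        return Base64.getEncoder().encodeToString(nonce);
    }

    // Minimal upgrade request; host and path are whatever the backend uses.
    public static String buildRequest(String host, String path, String key) {
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Upgrade: websocket\r\n"
                + "Connection: Upgrade\r\n"
                + "Sec-WebSocket-Key: " + key + "\r\n"
                + "Sec-WebSocket-Version: 13\r\n"
                + "\r\n";
    }

    // The Sec-WebSocket-Accept value a correct server must send back for this key.
    public static String expectedAccept(String key) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest((key + GUID).getBytes(StandardCharsets.US_ASCII));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // every Java platform is required to provide SHA-1
            throw new IllegalStateException(e);
        }
    }

    // Simplified check: status 101 and a matching accept header.
    // Assumes the exact header spelling, although HTTP header names are case-insensitive.
    public static boolean verifyResponse(String response, String key) {
        return response.startsWith("HTTP/1.1 101")
                && response.contains("Sec-WebSocket-Accept: " + expectedAccept(key) + "\r\n");
    }

    public static void main(String[] args) {
        String key = newKey();
        System.out.print(buildRequest("localhost:9000", "/socket", key));
        System.out.println("expected accept: " + expectedAccept(key));
    }
}
```

With the key from the RFC 6455 example, dGhlIHNhbXBsZSBub25jZQ==, expectedAccept returns s3pPLMBiTxaQ9kYGzzhZRbK+xOo=, which is a handy test vector for any implementation, including the server side.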
We prepared a simple WebSocket client that runs in the browser. It helps us while we implement the full WebSocket spec.
<html>
<head>
  <script src="jquery.min.js"></script>
</head>
<body>
  <script>
    var W = {
      s : null,
      id : "39628657580236",
      init : function() {
        W.s = new WebSocket("ws://localhost:9000/socket");
        W.s.onmessage = W.onMessage;
        W.s.onopen = W.onOpen;
        $("#btn").click(W.addConsumptions);
      },
      onMessage : function(event) {
        console.log(event.data);
      },
      send : function(s) {
        console.log("Will send", s);
        W.s.send(JSON.stringify(s));
      },
      onOpen : function() {
        console.log("Connected and sending hello");
        W.send({ "cmd" : "hello", "deviceId" : W.id });
        W.setHb();
      },
      setHb : function() {
        if (W.hbHandler) {
          clearInterval(W.hbHandler);
        }
        W.hbHandler = setInterval(function() {
          W.send({ "cmd" : "hb", "deviceId" : W.id });
        }, 4000);
      },
      addConsumptions : function() {
        W.send({
          "cmd" : "report",
          "deviceId" : W.id,
          "consumptions" : [ { "interval" : 60, "consumption" : Math.floor(Math.random() * 100) + 1 } ]
        });
      }
    };
    $(document).ready(function() {
      W.init();
    });
  </script>
  <button id="btn">Add consumptions</button>
</body>
</html>
This, of course, is not enough. We also had to create a simple TCP server to make sure we implement the RFC describing the protocol (RFC 6455) correctly. What's more, once the handshake is complete the protocol switches: what travels over TCP is no longer just the message, but frames that wrap it. Here is the frame layout:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+
We started a simple Java ServerSocket to simulate the web server, so that we could inspect every packet the browser client sends.
public void server() throws Exception {
    Socket client = new ServerSocket(9000).accept();
    InputStream stream = client.getInputStream();
    StringBuilder b = new StringBuilder();
    boolean http = true; // the first read is the HTTP handshake, later reads are WebSocket frames
    while (true) {
        byte[] bytes = new byte[10000];
        int r = stream.read(bytes);
        if (r == -1) break; // the client closed the connection
        if (!http) {
            // dump the raw frame, then decode it
            System.out.println(bytes2Hex(bytes));
            System.out.println(parseMessage(bytes));
        } else {
            // only the r bytes actually read belong to the request
            String request = new String(bytes, 0, r, "utf8");
            System.out.print(request);
            b.append(request);
            client.getOutputStream().write((handshakeResponse + getResponse(getKey(b.toString())) + "\r\n\r\n").getBytes());
            http = false;
        }
    }
}
So what's going on here? After the client connects, we start reading what it sends. The first thing it sends is the HTTP handshake request, and 10K bytes is more than enough for it, so we assume the whole request arrives with the first read (TCP does not guarantee this, but it is good enough for a test tool). After answering it we set the http flag to false, because every following read will contain WebSocket frames.
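If the handshake ever arrives split across several TCP segments, a single read() is not enough. One way around this, sketched here as an alternative rather than what our test server does, is to read byte by byte until the blank line (CR LF CR LF) that ends the HTTP headers. The class name HandshakeReader is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class HandshakeReader {
    private static final byte[] END = {'\r', '\n', '\r', '\n'};

    // Reads the HTTP request headers up to and including the terminating blank line.
    public static String readHeaders(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int matched = 0; // how many bytes of CR LF CR LF have been seen in a row
        while (matched < END.length) {
            int c = in.read();
            if (c == -1) {
                throw new IOException("connection closed during handshake");
            }
            buf.write(c);
            if (c == END[matched]) {
                matched++;
            } else {
                // a CR after a mismatch can still start a new CR LF CR LF sequence
                matched = (c == '\r') ? 1 : 0;
            }
        }
        return new String(buf.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        byte[] request = "GET /socket HTTP/1.1\r\nHost: localhost:9000\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.print(readHeaders(new ByteArrayInputStream(request)));
    }
}
```

Because it never reads past the blank line, any frame bytes that follow stay in the stream for the frame parser. On a real socket you would probably wrap the stream in a BufferedInputStream to avoid one system call per byte, and then keep reading frames from that same buffered stream.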
The browser sends the request shown above, and our server has to respond properly. The fixed part of the response is in the handshakeResponse variable:
String handshakeResponse = "HTTP/1.1 101 Switching Protocols\r\n"
        + "Upgrade: websocket\r\n"
        + "Connection: Upgrade\r\n"
        + "Sec-WebSocket-Accept: ";
To it we append the Base64-encoded SHA-1 hash of the Sec-WebSocket-Key that the browser sent, concatenated with the special GUID:
private String getResponse(String key) throws Exception {
    MessageDigest crypt = MessageDigest.getInstance("SHA-1");
    crypt.reset();
    // SHA-1 over the client's key followed by the fixed GUID
    crypt.update((key + "258EAFA5-E914-47DA-95CA-C5AB0DC85B11").getBytes("utf8"));
    // the Base64-encoded digest is the Sec-WebSocket-Accept value
    return Base64.getEncoder().encodeToString(crypt.digest());
}
We send the response back to the browser, and if it is correct, the socket is open. The browser then calls the onOpen handler of the JavaScript client above:
onOpen : function() {
  console.log("Connected and sending hello");
  W.send({ "cmd" : "hello", "deviceId" : W.id });
  W.setHb();
},
Right after connecting, the client sends the hello command. The command protocol is our own, but it is still carried inside WebSocket frames.
What are these frames?
As the figure above shows, the higher-level protocol is wrapped in frames, and it is the frames that are sent over the socket.
In subsequent reads (after the http flag is set to false), our Java server first prints the bytes that arrive and then tries to parse them. Here is what Chrome sent. The string is
{"cmd":"hello","deviceId":"39628657580236"}
and the bytes are:
81 AB A4 1B 7B 95 DF 39 18 F8 C0 39 41 B7 CC 7E 17 F9 CB 39 57 B7 C0 7E 0D FC C7 7E 32 F1 86 21 59 A6 9D 2D 49 AD 92 2E 4C A0 9C 2B 49 A6 92 39 06 00 00 00 00 00
The row continues with zeros up to the 10,000th byte, because we print the whole read buffer.
What do these bytes mean? Let's look at the frame layout again:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+
The first byte is 0x81, which in bits is 1000 0001. According to the figure above and the spec, the first bit (FIN) is 1, so this frame is the complete message rather than one fragment of a longer one. The next three bits (RSV1-3) are 0, and the last four bits (the opcode) are 0001, which marks a text frame.
The second byte tells us whether the payload is masked with the masking key and how long it is. Here it is 0xAB, or 1010 1011. The first (leftmost) bit is the MASK bit; it is 1, so we know the data is XOR-encoded. The remaining seven bits, 010 1011, give the payload length: 43 bytes. Indeed, the length of
{"cmd":"hello","deviceId":"39628657580236"}
is exactly 43 bytes.
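The bit arithmetic for these two bytes can be sketched in Java as follows. FrameHeaderBytes and describe are hypothetical names chosen for this illustration; the opcode names are the ones defined in RFC 6455. Note that only client-to-server frames are masked, so frames the ESP receives from the backend will have the MASK bit cleared.

```java
public class FrameHeaderBytes {
    // Opcode names as defined in RFC 6455.
    public static String opcodeName(int opcode) {
        switch (opcode) {
            case 0x0: return "continuation";
            case 0x1: return "text";
            case 0x2: return "binary";
            case 0x8: return "close";
            case 0x9: return "ping";
            case 0xA: return "pong";
            default:  return "reserved";
        }
    }

    // Decodes the two fixed header bytes of a frame.
    public static String describe(byte b0, byte b1) {
        boolean fin = (b0 & 0x80) != 0;    // 1 = last (or only) fragment of the message
        int rsv = (b0 >> 4) & 0x07;        // RSV1-3, 0 unless an extension was negotiated
        int opcode = b0 & 0x0F;            // low four bits
        boolean masked = (b1 & 0x80) != 0; // set on every client-to-server frame
        int len7 = b1 & 0x7F;              // 0-125 = length, 126/127 = extended length follows
        return String.format("fin=%b rsv=%d opcode=%s masked=%b len7=%d",
                fin, rsv, opcodeName(opcode), masked, len7);
    }

    public static void main(String[] args) {
        // first two bytes of the hello frame captured from Chrome
        System.out.println(describe((byte) 0x81, (byte) 0xAB));
    }
}
```

For the captured frame this prints fin=true rsv=0 opcode=text masked=true len7=43, matching the reading above.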
Seven bits can hold numbers only up to 127, and frames can of course be longer. That is why the spec reserves the values 126 and 127: values from 0 to 125 are the length itself; if the value is 126, the next two bytes hold the length as a 16-bit unsigned integer; if it is 127, the next eight bytes hold it as a 64-bit unsigned integer.
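// the low seven bits of the second byte hold the payload length
int size = bytes[1] & 0b0111_1111;
ofs = 2;
if (size == 126) {
    // 16-bit length in bytes 2-3; "& 0xFF" stops Java from sign-extending the bytes
    size = ((bytes[2] & 0xFF) << 8) | (bytes[3] & 0xFF);
    ofs += 2;
} else if (size == 127) {
    // 64-bit length in bytes 2-9; we assume it fits in an int
    long longSize = 0;
    for (int i = 2; i < 10; i++) {
        longSize = (longSize << 8) | (bytes[i] & 0xFF);
    }
    size = (int) longSize;
    ofs += 8;
}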
Our message is shorter than 126 bytes, so its length fits in the seven bits and we continue parsing.
The next 4 bytes are the masking key. Every payload byte is XORed with one of them, cycling through the key: payload byte i uses key byte i mod 4.
byte[] maskingKey = new byte[4];
System.arraycopy(bytes, ofs, maskingKey, 0, 4);
ofs += 4;
byte[] decoded = new byte[size];
System.out.println("Size is " + size);
for (int i = 0; i < size; i++) {
    byte b = maskingKey[i % 4];
    byte vb = bytes[i + ofs];
    decoded[i] = (byte) (vb ^ b);
}
This is the whole method we used for parsing the message. At the end we have the decoded string.
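private String parseMessage(byte[] bytes) throws Exception {
    // the low seven bits of the second byte hold the payload length
    int size = bytes[1] & 0b0111_1111;
    int ofs = 2;
    if (size == 126) {
        // 16-bit length in bytes 2-3; "& 0xFF" stops Java from sign-extending the bytes
        size = ((bytes[2] & 0xFF) << 8) | (bytes[3] & 0xFF);
        ofs += 2;
    } else if (size == 127) {
        // 64-bit length in bytes 2-9; we assume it fits in an int
        long longSize = 0;
        for (int i = 2; i < 10; i++) {
            longSize = (longSize << 8) | (bytes[i] & 0xFF);
        }
        size = (int) longSize;
        ofs += 8;
    }
    // the 4-byte masking key follows the length
    byte[] maskingKey = new byte[4];
    System.arraycopy(bytes, ofs, maskingKey, 0, 4);
    ofs += 4;
    byte[] decoded = new byte[size];
    System.out.println("Size is " + size);
    for (int i = 0; i < size; i++) {
        byte b = maskingKey[i % 4];
        byte vb = bytes[i + ofs];
        decoded[i] = (byte) (vb ^ b);
    }
    return new String(decoded, "utf8");
}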
So now we know how a message is framed and which bytes are needed, and we can implement the same algorithm on the ESP. The simple Java server gave us the chance to test thoroughly how messages are built.
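On the ESP the direction is reversed: the client has to build masked frames. Below is a minimal Java sketch of an encoder for one unfragmented text frame; it is an illustration, not our firmware, and FrameEncoder, encodeText and randomMask are hypothetical names. Given Chrome's masking key A4 1B 7B 95, it reproduces byte for byte the hello frame captured above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

public class FrameEncoder {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Encodes one unfragmented text frame with the MASK bit set, as a client must send it.
    public static byte[] encodeText(String text, byte[] maskKey) {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        int n = payload.length;
        ByteArrayOutputStream out = new ByteArrayOutputStream(n + 14);
        out.write(0x81);                        // FIN = 1, opcode = 1 (text)
        if (n <= 125) {
            out.write(0x80 | n);                // MASK bit + 7-bit length
        } else if (n <= 0xFFFF) {
            out.write(0x80 | 126);              // 16-bit length follows
            out.write(n >>> 8);
            out.write(n & 0xFF);
        } else {
            out.write(0x80 | 127);              // 64-bit length follows
            for (int shift = 56; shift >= 0; shift -= 8) {
                out.write((int) (((long) n >>> shift) & 0xFF));
            }
        }
        out.write(maskKey, 0, 4);               // the masking key itself
        for (int i = 0; i < n; i++) {
            out.write(payload[i] ^ maskKey[i % 4]); // the same XOR the server undoes
        }
        return out.toByteArray();
    }

    // A fresh, unpredictable key for every frame.
    public static byte[] randomMask() {
        byte[] key = new byte[4];
        RANDOM.nextBytes(key);
        return key;
    }

    public static void main(String[] args) {
        byte[] frame = encodeText("{\"cmd\":\"hb\",\"deviceId\":\"39628657580236\"}", randomMask());
        StringBuilder hex = new StringBuilder();
        for (byte b : frame) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex);
    }
}
```

In production the key should come from randomMask(), since RFC 6455 requires a fresh, unpredictable key for every frame; passing a fixed key is only useful for reproducing captured traffic in tests.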
And this is the whole server:
package com.infinno.websocket;

import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.security.MessageDigest;
import java.util.Base64;

public class SimpleWebsocket {

    public static void main(String... args) throws Exception {
        new SimpleWebsocket().server();
    }

    // extracts the value of the Sec-WebSocket-Key header from the raw request
    private String getKey(String request) {
        String key = request.split("Sec-WebSocket-Key: ")[1].split("\r\n")[0];
        return key;
    }

    private String bytes2Hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString();
    }

    public void server() throws Exception {
        Socket client = new ServerSocket(9000).accept();
        InputStream stream = client.getInputStream();
        StringBuilder b = new StringBuilder();
        boolean http = true; // the first read is the HTTP handshake, later reads are WebSocket frames
        while (true) {
            byte[] bytes = new byte[10000];
            int r = stream.read(bytes);
            if (r == -1) break; // the client closed the connection
            if (!http) {
                System.out.println(bytes2Hex(bytes));
                System.out.println(parseMessage(bytes));
            } else {
                String request = new String(bytes, 0, r, "utf8");
                System.out.print(request);
                b.append(request);
                client.getOutputStream().write((handshakeResponse + getResponse(getKey(b.toString())) + "\r\n\r\n").getBytes());
                http = false;
            }
        }
    }

    private String parseMessage(byte[] bytes) throws Exception {
        int size = bytes[1] & 0b0111_1111;
        int ofs = 2;
        if (size == 126) {
            // 16-bit length; "& 0xFF" stops Java from sign-extending the bytes
            size = ((bytes[2] & 0xFF) << 8) | (bytes[3] & 0xFF);
            ofs += 2;
        } else if (size == 127) {
            // 64-bit length; we assume it fits in an int
            long longSize = 0;
            for (int i = 2; i < 10; i++) {
                longSize = (longSize << 8) | (bytes[i] & 0xFF);
            }
            size = (int) longSize;
            ofs += 8;
        }
        byte[] maskingKey = new byte[4];
        System.arraycopy(bytes, ofs, maskingKey, 0, 4);
        ofs += 4;
        byte[] decoded = new byte[size];
        System.out.println("Size is " + size);
        for (int i = 0; i < size; i++) {
            byte b = maskingKey[i % 4];
            byte vb = bytes[i + ofs];
            decoded[i] = (byte) (vb ^ b);
        }
        return new String(decoded, "utf8");
    }

    private String getResponse(String key) throws Exception {
        MessageDigest crypt = MessageDigest.getInstance("SHA-1");
        crypt.reset();
        crypt.update((key + "258EAFA5-E914-47DA-95CA-C5AB0DC85B11").getBytes("utf8"));
        return Base64.getEncoder().encodeToString(crypt.digest());
    }

    String handshakeResponse = "HTTP/1.1 101 Switching Protocols\r\n"
            + "Upgrade: websocket\r\n"
            + "Connection: Upgrade\r\n"
            + "Sec-WebSocket-Accept: ";
}
At Infinno, Java developers work closely with their embedded colleagues. We share knowledge so that our projects are completed successfully.