Start reading through text to fix errors

2018-12-03 00:24:06 +01:00
parent 137fde57ca
commit 2206844a06
7 changed files with 209 additions and 130 deletions
--- a/content/architecture.tex
+++ b/content/architecture.tex
@ -6,29 +6,35 @@ two Docker images:
 \begin{itemize}
    \item the \code{solvable} image is responsible for running the framework and the client
          code depending on it
-    \item the \code{controller} image is responsible for solution checking (to figure out
-          whether the user completed the tutorial or not)
+    \item the \code{controller} image is responsible for solution checking: to figure out
+          whether the user has successfully completed the tutorial or not
 \end{itemize}
-During most of this capter I am going to be discussing the \code{solvable} Docker image,
+During most of this chapter I am going to be discussing the \code{solvable} Docker image,
 with the exception of Section~\ref{solutioncheck}, where I will dive into how the
 \code{controller} image is implemented.

 The most important feature of the framework is its messaging system.
 Basically what we need is a system where processes running inside a Docker container
 would be allowed to communicate with each other.
-This is easy with lots of possible solutions (named pipes, sockets or shared memory to name a few).
-The hard part is that frontend components running inside a web browser --- which could be
-potentially on the other side of the planet --- would also need to partake in said communication.
+This task is very easy to solve, with lots of possible solutions
+(named pipes, sockets or shared memory to name a few).
+The hard part is that frontend components running inside a web browser --- which could
+potentially be located on the other side of the planet%
+\footnote{Potentially introducing all sorts of issues regarding latency} --- would
+also need to partake in said communication.
 So what we need to create is something of a hybrid between an IPC system and something
 that can communicate with JavaScript running in a browser connected to it.
 The solution the framework uses is a proxy server, which connects to frontend components
 on one side and handles interprocess communication on the other side. 
 This way the server is capable of proxying messages between the two sides, enabling
 communication between them.
-Notice that this way what we have is essentially an IPC system in which a web application
+Notice that this way what we have is essentially an IPC%
+\footnote{Interprocess communication} system in which a web application
 can ``act like'' it was running on the backend in a sense: it is easily able to
-communicate with processes on the backend, while in reality the web application
-runs in the browser of the user, on a completely different machine.
+communicate with processes running there, while in reality the web application
+is running in the browser of the user, on a completely different machine, and it uses
+some means of communication that is routed through the public internet to achieve this
+effect.

 \begin{note}
 The core idea and initial implementation of this server comes from Bálint Bokros,
@ -38,54 +44,65 @@ message authentication, restoration of frontend state, a complete overhaul of th
 state tracking system and the possibility for solution checking among other things).
 If you are explicitly interested in the differences between the original POC implementation
 (which is out of scope for this thesis due to length constraints) and the current
-framework please consult Bálint's excellent paper and Bachelor's Thesis on it\cite{BokaThesis}.
+framework please consult Bálint's excellent paper and Bachelor's thesis on it\cite{BokaThesis}.
 \end{note}

-Now let us take a closer look:
+Now let us take a closer look at the technology used to implement such a server and
+some of the design decisions behind this:

 \subsection{Connecting to the Frontend}

-The old way of creating dynamic webpages was AJAX polling, which is basically sending
+The old way of creating dynamic webpages was AJAX%
+\footnote{AJAX stands for Asynchronous JavaScript And XML, despite usually not having
+anything to do with XML in practice}
+polling, which is basically sending
 HTTP requests to a server at regular intervals from JavaScript to update the contents
 of your website (and as such requiring to go over the whole TCP handshake and the
 HTTP request-response on each update).
 This was superseded by WebSockets around 2011, which provide a full-duplex
 communication channel over TCP between your browser and the server.
-This is done by initiation a protocol handshake using the \code{Connection: Upgrade}
+This is done by initiating a protocol handshake using the \code{Connection: Upgrade}
 HTTP header, which establishes a permanent socket connection between the browser
 and the server.
 This allows for communication with lower overhead and latency facilitating efficient
-real-time applications.
+real-time applications, which were not always possible to create before due to
+the overheads%
+\footnote{In some applications this overhead could be bigger than the actual data sent,
+such as signaling} introduced by AJAX polling.

 The Tutorial Framework uses WebSockets to connect to its web frontend.
-The framework proxy server is capable to connecting to an arbirary number of websockets,
-which allows opening different components in separate browser windows and tabs, or even
-in different browsers at once (such as opening a terminal in Chrome and an IDE in Firefox).
+The TFW proxy server is capable of connecting to an arbitrary number of WebSockets,
+which allows the framework to simultaneously connect to components running in
+separate browser windows and tabs, or even
+in different browsers altogether (such as opening a terminal in Chrome and an IDE in Firefox).

 \subsection{Interprocess Communication}

 To handle communication with processes running inside the container TFW utilizes
-the asynchronous distributed messaging library ZeroMQ%
+the asynchronous distributed messaging library called ZeroMQ%
 \footnote{\href{http://zeromq.org}{http://zeromq.org}} or ZMQ for short.
 The rationale behind this is that unlike other messaging systems such as
 RabbitMQ%
 \footnote{\href{https://www.rabbitmq.com}{https://www.rabbitmq.com}} or Redis%
 \footnote{\href{https://redis.io}{https://redis.io}},
-ZMQ does not require a daemon (message broker process) and as such
-has a much lower memory footprint while still providing various messaging
+ZMQ does not require a message broker daemon to be running in the background at all times
+and as such has a much lower memory footprint while still providing various messaging
 patterns and bindings for almost any widely used programming language.
 Another --- as yet unutilized --- capability of this solution is that since ZMQ is capable
 of using simple TCP sockets, we could even communicate with processes running on remote
-hosts using the framework.
+hosts using the current architecture of the framework.

 There are various lower level and higher level alternatives for IPC other than
-ZMQ which were also considered during the desing process of the framework at some point.
+ZMQ which were also considered during the design process of the framework at some point.
 A few examples of top contenders and reasons for not using them in the end:
 \begin{itemize}
    \item The handling of raw TCP sockets would involve lots of boilerplate logic that
    already has quality implementations in messaging libraries: e.g.\ making sure that
-    all bytes are sent or received both require checking the return values of the
-    libc \code{send()} and \code{recv()} system calls, while ZMQ takes care of this
+    all bytes are sent or received both require constantly checking the return values of the
+    libc \code{send()} and \code{recv()} system calls%
+\footnote{Developers forget this very often, resulting in almost untraceable bugs
+that seem to occur randomly},
+    while ZMQ takes care of this
    extra logic involved and even provides higher level messaging patterns such as
    subscribe-publish, which would need to be implemented on top of raw sockets again.
    \item Using something like gRPC\footnote{\href{https://grpc.io}{https://grpc.io}}
@ -95,11 +112,15 @@ A few examples of top contenders and reasons for not using them in the end:
    which would make the framework
    less lightweight and flexible: socket communication with or without ZMQ does not
    force you to write synchronous or asynchronous code, whereas common HTTP servers
-    are either async or pre-fork in nature, which extort certain design choices on code
+    are either async%
+\footnote{Async servers use the \code{select} or \code{epoll} system calls among others
+to avoid blocking on IO} or pre-fork%
+\footnote{Pre-fork servers spawn multiple processes and threads to handle requests
+simultaneously} in nature, which imposes certain design choices on code
    built on them.
 \end{itemize}
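
The boilerplate mentioned above for raw TCP sockets can be sketched in a few lines of Python.
This is only an illustration of the extra logic that ZMQ hides, not code taken from the framework:
the helper names \code{send\_all} and \code{recv\_exactly} are hypothetical.

```python
import socket


def send_all(sock: socket.socket, payload: bytes) -> int:
    """Call send() repeatedly until every byte of the payload is written."""
    total_sent = 0
    while total_sent < len(payload):
        # send() may write fewer bytes than requested, so check its return value
        sent = sock.send(payload[total_sent:])
        if sent == 0:
            raise ConnectionError("socket connection broken while sending")
        total_sent += sent
    return total_sent


def recv_exactly(sock: socket.socket, length: int) -> bytes:
    """Call recv() repeatedly until exactly `length` bytes have arrived."""
    chunks = []
    remaining = length
    while remaining > 0:
        # recv() may return fewer bytes than requested, or b"" on disconnect
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket connection broken while receiving")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

Python's standard library already offers \code{socket.sendall()} for the sending half; the loop is spelled out here only to show what has to be checked when working at this level.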

-\section{High Level Overview}
+\section{Architectural Overview}

 Being familiar with the technological basis of the framework, we can now
 discuss it in more detail.
@ -116,11 +137,11 @@ Architecturally TFW consists of four main components:
          that is implemented as an event handler called \code{FSMManagingEventHandler}
 \end{itemize}
 Note that it is important to keep in mind that as I've mentioned previously,
-the TFW Server and event handlers reside in the \code{solvable} Docker container.
-They all run in separate processes and only communicate using ZeroMQ sockets.
+the TFW server and event handlers reside in the \code{solvable} Docker container.
+They all run in separate processes and only communicate with each other using ZeroMQ sockets.

 In the following sections I am going to explain each of the main components in
-greater detail, as well as how they interact with each other,
+greater detail, as well as how they interact with each other,
 their respective responsibilities,
 some of the design choices behind them and more.

@ -149,7 +170,10 @@ Let's inspect further what a valid TFW message might look like:

 All valid messages \emph{must} include a \code{key} field as this is used by the
 framework for addressing: event handlers and frontend components subscribe to one
-or more \code{key}s and only receive messages with \code{key}s they have
+or more of these \code{key}s and only receive%
+\footnote{In reality they do receive them, just like how network interfaces receive all
+Ethernet frames, they just choose to ignore the ones not concerning them}
+messages with \code{key}s that they have
 subscribed to.
 It is possible to send a message with an empty key, however these messages will not
 be forwarded by the TFW server (but will reach it, so in case the target of a message
@ -165,12 +189,12 @@ at a later point in this paper.
 The default behaviour of the TFW server is that it forwards all messages coming from
 the frontend to the event handlers and vice versa.
 So messages coming from the WebSockets of the frontend are forwarded to event handlers
-via ZMQ and messages received through ZMQ from event handlers are forwarded to
+via ZMQ and messages received on ZMQ from event handlers are forwarded to
 the frontend via WebSockets.

 The TFW server is also capable of ``reflecting'' messages back to the side they were
-received on (to faciliate event handler to event handler for instance), or broadcast
-messages to all components.
+received from (to facilitate event handler to event handler communication for instance),
+or broadcast messages to all components.
 This is possible by embedding a whole TFW message in the \code{data} field of
 an outer wrapper message with a special \code{key} that signals to the TFW server that
 this message requires special attention.
@ -181,7 +205,7 @@ An example of this would be:
    "data":
    {
        ...
-        The message you want to broadcast or mirror
+        The whole message you want to broadcast or mirror
        (with its own "key" and "data" fields)
        ...
    }
@ -198,7 +222,7 @@ As discussed earlier, using ZeroMQ allows developers to implement event handlers
 in a wide variety of programming languages.
 This is very important for the framework, as content creators often create
 challenges that are very specific to a language, for example the showcasing
-of a security vulnerability in an older version of Java.
+of a security vulnerability in an older version of the Java standard library.

 These event handlers are used to write most of the code developers wish to
 integrate with the framework.
@ -210,11 +234,20 @@ based on this knowledge.
 An event handler such as this could be invoked by sending a message to it
 at any time when the running of the tests would be required.

+An interesting thing to mention is that there \emph{could} be event handlers which
+broadcast messages with a \code{key} that they are also subscribed to.
+This can disrupt their behaviour in weird ways if they are not prepared to
+deal with their own ``echoes''.
+The framework offers a solution for this by providing a special
+event handler type, which is capable of filtering out its own broadcasts.
+Such an event handler does this by caching the checksum of every message it broadcasts,
+and ignoring the first message that comes back with the same checksum.
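
The checksum-based echo filtering described above could look roughly like the following Python sketch.
It is written under assumptions: the class name \code{EchoFilter}, the choice of SHA-256 and the
JSON serialization are all hypothetical and are not taken from the framework's code.

```python
import hashlib
import json
from collections import Counter


class EchoFilter:
    """Remembers checksums of broadcast messages to recognise their echoes."""

    def __init__(self):
        # checksum -> number of broadcasts whose echo has not come back yet
        self._pending = Counter()

    @staticmethod
    def checksum(message: dict) -> str:
        # Serialize with sorted keys so equal messages always hash equally
        serialized = json.dumps(message, sort_keys=True).encode()
        return hashlib.sha256(serialized).hexdigest()

    def register_broadcast(self, message: dict) -> None:
        """Call this right before broadcasting a message."""
        self._pending[self.checksum(message)] += 1

    def is_echo(self, message: dict) -> bool:
        """Return True only for the first returning copy of each broadcast."""
        digest = self.checksum(message)
        if self._pending[digest] > 0:
            self._pending[digest] -= 1
            return True
        return False
```

An event handler built this way would call \code{register\_broadcast()} before sending and drop any incoming message for which \code{is\_echo()} returns true, while identical messages sent later by other components would still be processed.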
+
 \subsection{Frontend}

 This is a web application that runs in the browser of the user and uses
-multiple WebSocket connections to connect to the TFW server.
-Due to rapidly increasing complexity the original implementation (written in
+multiple WebSockets to connect to the TFW server.
+Due to rapidly increasing complexity, the original implementation (written in
 plain JavaScript with jQuery%
 \footnote{\href{https://jquery.com}{https://jquery.com}} and Bootstrap%
 \footnote{\href{https://getbootstrap.com}{https://getbootstrap.com}}) was becoming
@ -234,7 +267,7 @@ Other reasons included that the frontend of the Avatao platform is also written
 in Angular (bonus points for experienced team members in the company).
 Another good thing going for it is that Angular forces you to use TypeScript%
 \footnote{\href{https://www.typescriptlang.org}{https://www.typescriptlang.org}}
-which tries to remedy the issues\cite{JavaScript}
+which tries to remedy some of the issues\cite{JavaScript}
 with JavaScript by being a language that transpiles to JavaScript while
 strongly encouraging things like static typing or Object Oriented Principles.

@ -244,11 +277,11 @@ strongly encouraging things like static typing or Object Oriented Principles.

 A good chunk of the framework codebase is a bunch of pre-made, built-in components
 that implement commonly required functionality for developers to use.
-These components usually involve an event handler and an Angular component which
-communicates with it to realize some functionality.
+These components usually involve an event handler and an Angular component
+communicating with each other to realize some sort of functionality.
 An example would be the built-in code editor of the framework
-(visible on the left side of Figure~\ref{figures/tfw_frontend.png}).
-This code editor is essentially a Monaco editor%
+(visible on the right side of Figure~\ref{figures/tfw_frontend.png}).
+This code editor is essentially a Monaco editor%
 \footnote{\href{https://microsoft.github.io/monaco-editor/}
 {https://microsoft.github.io/monaco-editor/}}
 instance integrated into Angular and upgraded with the capability to
@ -256,21 +289,23 @@ exchanges messages with an event handler to save, read and edit files
 that reside in the writeable file system of the \code{solvable}
 Docker container.

-All of the built-ins come with full API documentation explaining what they do
-on receiving specific messages, and what messages they emit on different events.
+All of the built-ins come with full API documentation explaining what they do
+on receiving specific messages, and what kind of messages they may emit on different events.
 This greatly expands the capabilities of the framework, since it allows
 developers to do things including, but not limited to:
 \begin{itemize}
    \item making the code editor automatically appear in sections
-          of the tutorial where the user needs to use it
+          of the tutorial where the user needs to use it, then disappear
+          when it is no longer needed to conserve space
    \item inject commands into the user's terminal
-    \item hook into messages emitted from components to detect events, such as
+    \item hook callbacks to run code on messages emitted from components to
+          detect events, such as
          to detect if the user has clicked a button or executed a command
          in the terminal
-    \item monitor the logs (stdout or stderr) of a given process
+    \item monitor the logs (stdout or stderr) of a given process in real time
 \end{itemize}
 Every pre-made component is designed with the mindset to allow flexible
-and creative usage by developers, with the possibility of future extensions.
+and creative usage by developers, with the added possibility of future extensions.
 Often when developers require certain new features, they open an issue on
 the git repository of the framework for me to review and possibly implement
 later.
@ -279,18 +314,22 @@ One example would be when a developer wanted to automatically advance the tutori
 when the user has entered a specific string into a file.
 This one didn't even require a new feature: I recommended him to implement an event
 handler listening to the messages of the built-in file editor, filter the messages
-which contain file content that is going to be written to disk, and simply
+which contain file content that is being sent to be written to disk, and simply
 search these messages for the given string.

 The exact capabilities of these built-in components will be explained in greater
-detail in a later chapter.
+detail in Chapter~\ref{atouroftfw}.
+Developers who are well aware of these capabilities are able to use the framework in extremely
+creative ways allowing for very interesting functionality, such as the above example.
+The components of TFW can often be combined to work together in unexpected, yet useful
+ways, similarly to how command-line utilities on UNIX-like systems do.

 \subsection{TFW Finite State Machine}

 An important requirement we have specified during~\ref{requirements} was that
 the framework must be capable of tracking user progress.
 TFW allows developers to define a \emph{finite state machine}
-which is capable of describing the desired ``story'' of a tutorial.
+which is capable of describing the desired ``story'' of a learning exercise.
 The states of the machine could be certain points in time during the completion of the
 tutorial envisioned and transitions could be events that influence the
 state, such as the editing of files, execution of commands and so on.
@ -301,23 +340,25 @@ Take the fixing of a SQL Injection%
 vulnerability as an example.
 Let's assume that the source code is vulnerable to a SQL injection attack
 because it tries to compose a query with string concatenation instead of
-using a parameterized query provided by the database library.
+using a prepared statement provided by the database library.
 A challenge developer could implement an FSM in the framework that looks like this:

-\pic[width=.6\textwidth]{figures/tfw_fsm.png}{An Example for a Finite State Machine in TFW}
+\pic[width=.6\textwidth]{figures/tfw_fsm.png}{An example for a finite state machine in TFW}

 In case the source file has been edited, the unit test cases designed to detect
 whether the code is vulnerable or not are invoked.
 Depending on the results three cases are possible:

 \begin{description}
-    \item[All test cases have succeeded:] If all the tests succeeded then the user has managed
+    \item[All test cases have succeeded:] If all the test cases have run successfully,
+    then the user has managed
    to fix the code properly and we can display a congratulating message accordingly.
-    \item[All test cases have failed:] In this case the solution is incorrect
-    and we can offer some hints.
+    \item[All test cases have failed:] In this case the submitted solution is incorrect
+    and we should offer some hints, so that the user can try again more effectively,
+    optionally displaying more and more hints with each successive failure.
    \item[Some test cases have succeeded:] It is possible that, based on the test cases
-    that have succeeded and failed we can determine that the user tried to blacklist
-    certain SQL keywords. This is a common, but incorrect solution of fixing a SQL
+    that have succeeded and failed we can determine that the user has tried to blacklist
+    certain SQL keywords. This is a common, but incorrect ``solution'' of fixing a SQL
    injection vulnerability. Now we can explain to users why their solution is wrong,
    and give them helpful tips.
 \end{description}
@ -330,10 +371,11 @@ This is a very engaging feature that offers an immersive learning experience for
 users, which many solutions for distance education lack so often.

 Developers can use a YAML file or write Python code to implement finite
-state machines.
-In state machine implementations it is possbile to subscribe callbacks to be
+state machines in TFW\@. This is going to be further detailed in
+Chapter~\ref{usingtfw}.
+In the implementation of state machines it is also possible to subscribe callbacks to be
 invoked on certain events regarding the machine, such as before and after
-state transitions, or onentering and exiting a state.
+state transitions, or on entering and exiting a state.
 It is \emph{very} important to be aware of these callbacks, as much of the
 actual tutorial logic is often going to be implemented in these.

@ -351,22 +393,28 @@ The \code{trigger} field of a message can be used to step the framework FSM
 if all preconditions are met.
 The way this works is if the TFW server encounters a message with a
 \code{trigger} defined, it notifies the event handler managing
-the state machine.
+the state machine so it can attempt activating said \code{trigger}.

-Since messages can come from unauthenticated sources, it is possible to
+Since messages in the system can come from unauthenticated sources (such as the frontend),
+it is possible to
 enforce the authentication of privileged messages, such as messages containing a \code{trigger}.
-The framework allows trusted code to access a cryptographic key on the file system, which
+The framework allows trusted code to access a cryptographic key stored on the file system
+with proper permissions, which
 can be used to digitally sign messages (this is what the \code{signature} message
-field is designed for).
-In this case the TFW server will only forward privileged messages that
-have a valid signature.
+field is designed for) using HMAC%
+\footnote{Hash-based message authentication code}.
+In this case the TFW server will only forward the privileged messages that
+have a valid signature, and the event handler managing the state machine
+will also validate the signature of messages it receives
+(and sign the updates it broadcasts as well, so that other components can verify that
+they come from a trusted source).
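
A minimal sketch of HMAC-based message authentication in Python is given below.
Only the \code{signature} field name comes from the text above; the JSON serialization,
the SHA-256 digest, the Base64 encoding and the function names are assumptions made for
illustration and may differ from what the framework actually does.

```python
import base64
import hashlib
import hmac
import json


def _digest(message: dict, key: bytes) -> bytes:
    # The signature field itself is excluded from the authenticated content
    unsigned = {k: v for k, v in message.items() if k != "signature"}
    serialized = json.dumps(unsigned, sort_keys=True).encode()
    return hmac.new(key, serialized, hashlib.sha256).digest()


def sign_message(message: dict, key: bytes) -> dict:
    """Return a copy of the message with a Base64 encoded HMAC attached."""
    signed = dict(message)
    signed["signature"] = base64.b64encode(_digest(message, key)).decode()
    return signed


def verify_message(message: dict, key: bytes) -> bool:
    """Check that the signature matches the rest of the message."""
    signature = message.get("signature")
    if not isinstance(signature, str):
        return False
    try:
        received = base64.b64decode(signature, validate=True)
    except ValueError:
        return False
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(received, _digest(message, key))
```

In such a scheme both the signer and the verifier need access to the same secret key, which is why the key has to be readable only by trusted code.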

 \subsection{Solution checking}\label{solutioncheck}

 Traditionally most challenges on the Avatao platform implement a Docker image called
 \code{controller}, which is responsible for detecting the successful
 solution of a challenge.
-When using the Tutorial Framework a pre-implemented \code{controller}
+When using the Tutorial Framework, a pre-implemented \code{controller}
 image is available, which listens to messages emitted by the
 framework FSM, and detects if the final state defined by developers is reached.
 This means that if content creators implement a proper FSM, the solution checking
@ -378,4 +426,5 @@ traditional hacking challenges, such as exercises developed for CTF%
 \footnote{A ``capture the flag'' game is a competition designed for professionals
 --- or just people interested in the field ---  to sharpen their skills in IT security.
 Avatao often organises similar events.}
-events.
+events, as the controller image is also capable of verifying the authenticity of
+FSM update messages by inspecting their signatures.