Continue writing thesis with focus on architecture

2018-11-30 19:35:49 +01:00
parent 65e6426fdc
commit 1ef4feb146
7 changed files with 266 additions and 19 deletions
--- a/bibliography.bib
+++ b/bibliography.bib
@ -80,7 +80,42 @@
    title={Education as a key factor in the process of building cybersecurity},
    url={https://2017.cybersecforum.eu/files/2016/12/ecj_vol2_issue1_i.albrycht_education_as_a_key_in_the_process_of_building_cybersecurity.pdf},
    language={english},
-    author={IZABELA ALBRYCHT},
+    author={Izabela Albrycht},
    year={2016},
 }

+@online{EBayGit,
+    title={Pwning eBay - How I Dumped eBay Japan's Website Source Code},
+    url={https://slashcrypto.org/2018/11/28/eBay-source-code-leak/},
+    language={english},
+    author={David Wind},
+    year={2018},
+    month=nov,
+}
+
+@online{CloudFlareLeak,
+    title={Incident report on memory leak caused by Cloudflare parser bug},
+    url={https://blog.cloudflare.com/incident-report-on-memory-leak-caused-by-cloudflare-parser-bug/},
+    language={english},
+    author={John Graham-Cumming},
+    year={2017},
+    month=feb,
+}
+
+@online{NoPerfectSecurity,
+    title={The Illusion Of Perfect Cybersecurity},
+    url={https://www.forbes.com/sites/forbestechcouncil/2018/03/27/the-illusion-of-perfect-cybersecurity/},
+    language={english},
+    author={George Finney},
+    year={2018},
+    month=mar,
+}
+
+@online{JavaScript,
+    title={JavaScript is a Dysfunctional Programming Language},
+    url={https://medium.com/javascript-non-grata/javascript-is-a-dysfunctional-programming-language-a1f4866e186f},
+    language={english},
+    author={Richard Kenneth Eng},
+    year={2016},
+    month=mar,
+}
--- a/content/architecture.tex
+++ b/content/architecture.tex
@ -0,0 +1,157 @@
+\chapter{Framework Architecture}
+\section{Core Technology}
+
+It is important to understand that the Tutorial Framework is currently implemented as
+two Docker images:
+\begin{itemize}
+    \item the \texttt{solvable} image is responsible for running the framework and the client
+          code depending on it
+    \item the \texttt{controller} image is responsible for solution checking (to figure out
+          whether the user completed the tutorial or not)
+\end{itemize}
+For most of this chapter I am going to discuss the \texttt{solvable} Docker image,
+with the exception of section \ref{solutioncheck}, where I will dive into how the
+\texttt{controller} image is implemented.
+
+The most important feature of the framework is its messaging system.
+What we need is a system in which processes running inside a Docker container
+can communicate with each other.
+On its own this is easy, with plenty of possible solutions (named pipes, sockets or shared memory to name a few).
+The hard part is that frontend components running inside a web browser -- which could
+potentially be on the other side of the planet -- also need to take part in this communication.
+So what we need is a hybrid between an IPC (interprocess communication) system and something
+that can communicate with JavaScript running in a browser connected to it.
+The framework solves this with a proxy server, which connects to frontend components
+on one side and handles interprocess communication on the other.
+This way the server can proxy messages between the two sides, enabling
+communication between them.
+Notice that what we end up with is essentially an IPC system in which a web application
+can act as if it were running on the backend: it can easily
+communicate with processes on the backend, while in reality the web application
+runs in the browser of the user, on a completely different machine.
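+
+To make the idea of proxying more concrete, below is a minimal in-memory sketch in Python
+of how such a server could route messages. It is \emph{not} the implementation of the framework:
+the \texttt{ProxyServer} class, the \texttt{key} field used for routing and the plain callbacks
+(standing in for WebSocket connections on one side and ZeroMQ sockets on the other)
+are all assumptions made for illustration.
```python
from collections import defaultdict
from typing import Callable, Dict, List

Message = dict
Callback = Callable[[Message], None]


class ProxyServer:
    """Hypothetical sketch of a proxy between browser frontends and backend
    processes. Real connections (WebSockets, ZeroMQ sockets) are replaced
    by plain callbacks that deliver a message to the other end."""

    def __init__(self) -> None:
        # Any number of frontend connections (e.g. several browser tabs).
        self.frontends: List[Callback] = []
        # Backend processes subscribed to messages by a routing key.
        self.handlers: Dict[str, List[Callback]] = defaultdict(list)

    def connect_frontend(self, send: Callback) -> None:
        self.frontends.append(send)

    def subscribe_handler(self, key: str, send: Callback) -> None:
        self.handlers[key].append(send)

    def from_frontend(self, message: Message) -> None:
        # Forward a frontend message to every process subscribed to its key.
        for send in self.handlers.get(message["key"], []):
            send(message)

    def from_handler(self, message: Message) -> None:
        # Broadcast a backend message to every connected frontend.
        for send in self.frontends:
            send(message)


if __name__ == "__main__":
    received = []
    server = ProxyServer()
    server.connect_frontend(received.append)  # e.g. a terminal in Chrome
    server.connect_frontend(received.append)  # e.g. an IDE in Firefox
    # A toy "process" that answers terminal messages through the proxy.
    server.subscribe_handler(
        "terminal",
        lambda m: server.from_handler({"key": "terminal", "data": m["data"].upper()}),
    )
    server.from_frontend({"key": "terminal", "data": "ls"})
    print(received)
```
+Keeping the routing decision in a single process means neither side has to know
+where the other one runs, which is the property the framework relies on.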
+
+\begin{note}
+The core idea and initial implementation of this server come from Bálint Bokros.
+It was later redesigned and fully rewritten by me to allow for greater flexibility
+(such as connecting to more than one browser at a time, different messaging modes,
+message authentication, restoration of frontend state, a complete overhaul of the
+state tracking system and the possibility of solution checking, among other things).
+If you are interested in the differences between the original POC implementation
+(which is out of scope for this thesis due to length constraints) and the current
+framework, please consult Bálint's excellent paper and Bachelor's Thesis on it\cite{BokaThesis}.
+\end{note}
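+
+The note above mentions message authentication. One common way to achieve it is a
+message authentication code computed with a shared secret. The Python sketch below
+signs and verifies JSON messages with HMAC-SHA256 from the standard library.
+The message layout (a \texttt{signature} field next to the payload), the secret and the
+function names are assumptions for illustration only, not a description of how TFW does it.
```python
import hashlib
import hmac
import json

# Hypothetical shared secret; in practice it would be generated randomly
# and known only to the trusted processes.
SECRET = b"change-me"


def _digest(payload: dict, secret: bytes) -> str:
    # Canonical serialization so that sender and receiver hash identical bytes.
    data = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(secret, data, hashlib.sha256).hexdigest()


def sign(payload: dict, secret: bytes = SECRET) -> dict:
    """Wrap a payload together with its HMAC-SHA256 signature."""
    return {"payload": payload, "signature": _digest(payload, secret)}


def verify(message: dict, secret: bytes = SECRET) -> bool:
    """Return True only if the signature matches the payload."""
    expected = _digest(message["payload"], secret)
    # Constant-time comparison to avoid leaking information through timing.
    return hmac.compare_digest(expected, message["signature"])
```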
+
+Let us now take a closer look at both sides of the proxy.
+
+\subsection{Connecting to the Frontend}
+
+The traditional way of creating dynamic webpages was AJAX polling: JavaScript
+code sends HTTP requests to a server at regular intervals to update the contents
+of the page (potentially going through a new TCP handshake, and always through a
+full HTTP request-response cycle, on each update).
+Around 2011 this approach was largely superseded by WebSockets (standardized in RFC 6455),
+which provide a full-duplex communication channel over TCP between the browser and the server.
+A WebSocket connection is established by a protocol handshake using the \texttt{Connection: Upgrade}
+and \texttt{Upgrade: websocket} HTTP headers, after which the connection stays open as a
+persistent socket between the browser and the server.
+This allows for communication with lower overhead and latency, facilitating efficient
+real-time applications.
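+
+To illustrate what the handshake involves: according to RFC 6455 the client sends a random
+\texttt{Sec-WebSocket-Key} header along with the upgrade request, and the server proves that it
+understands the protocol by answering with \texttt{Sec-WebSocket-Accept}, the Base64-encoded
+SHA-1 hash of the key concatenated with a fixed GUID. A small Python sketch of the server side
+of this computation follows (the function names are illustrative, not taken from any library).
```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(sec_websocket_key: str) -> str:
    """Build the HTTP 101 response that switches the connection to WebSocket."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(sec_websocket_key)}\r\n"
        "\r\n"
    )


if __name__ == "__main__":
    # Example key from RFC 6455.
    print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```
+After this response both sides exchange WebSocket frames over the same TCP connection,
+which is what removes the per-update request overhead of polling.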
+
+The Tutorial Framework uses WebSockets to connect to its web frontend.
+The framework proxy server is capable of handling an arbitrary number of WebSocket connections,
+which allows opening different components in separate browser windows and tabs, or even
+in different browsers at once (such as opening a terminal in Chrome and an IDE in Firefox).
+
+\subsection{Interprocess Communication}
+
+To handle communication with processes running inside the container, TFW utilizes
+the asynchronous distributed messaging library ZeroMQ%
+\footnote{\href{http://zeromq.org}{http://zeromq.org}}, or ZMQ for short.
+The rationale behind this is that unlike other messaging systems such as
+RabbitMQ%
+\footnote{\href{https://www.rabbitmq.com}{https://www.rabbitmq.com}} or Redis%
+\footnote{\href{https://redis.io}{https://redis.io}},
+ZMQ does not require a daemon (message broker process), and as such
+has a much lower memory footprint while still providing various messaging
+patterns and bindings for almost any widely used programming language.
+Another, as yet unutilized, capability of this solution is that since ZMQ can also
+use plain TCP sockets, we could even communicate with processes running on remote
+hosts using the framework.
+
+There are various lower and higher level alternatives to ZMQ for IPC,
+some of which were also considered at some point during the design process of the framework.
+A few examples of top contenders and the reasons for not using them in the end:
+\begin{itemize}
+    \item Handling raw TCP sockets would involve lots of boilerplate logic that
+    already has quality implementations in messaging libraries: e.g. making sure that
+    all bytes are sent or received requires checking the return values of the
+    libc \texttt{send()} and \texttt{recv()} system calls and retrying on partial transfers,
+    while ZMQ takes care of this extra logic and even provides higher level messaging
+    patterns such as publish-subscribe, which would otherwise need to be implemented
+    on top of raw sockets.
+    \item Using something like gRPC%
+    \footnote{\href{https://grpc.io}{https://grpc.io}} or plain HTTP (both of which
+    are considered to be higher level than ZMQ sockets) would require
+    all processes taking part in the communication to be HTTP servers themselves,
+    which would make the framework
+    less lightweight and flexible: socket communication with or without ZMQ does not
+    force you into either a synchronous or an asynchronous programming model, whereas
+    common HTTP servers are either asynchronous or pre-forking in nature, which imposes
+    certain design choices on code built on them.
+\end{itemize}
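+
+To show the kind of boilerplate meant in the first point, here is a Python sketch of
+what sending and receiving whole messages over a raw stream socket requires. Python's
+\texttt{socket.send()} and \texttt{socket.recv()}, just like their libc counterparts, may transfer
+fewer bytes than requested, so both directions need a loop, and message boundaries need
+some framing (here an assumed 4-byte big-endian length prefix; the function names are illustrative).
+ZMQ handles all of this internally.
```python
import socket
import struct


def send_message(sock: socket.socket, payload: bytes) -> None:
    # Prefix the payload with its length so the receiver knows where it ends.
    data = struct.pack("!I", len(payload)) + payload
    sent = 0
    while sent < len(data):
        # send() may write only part of the buffer; keep going until done.
        n = sock.send(data[sent:])
        if n == 0:
            raise ConnectionError("socket connection broken")
        sent += n


def recv_exactly(sock: socket.socket, size: int) -> bytes:
    chunks = []
    remaining = size
    while remaining > 0:
        # recv() may return fewer bytes than asked for; b"" means the peer closed.
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before message was complete")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket) -> bytes:
    # Read the 4-byte length header first, then exactly that many bytes.
    (length,) = struct.unpack("!I", recv_exactly(sock, 4))
    return recv_exactly(sock, length)


if __name__ == "__main__":
    a, b = socket.socketpair()
    send_message(a, b"hello")
    print(recv_message(b))  # b'hello'
```
+Even this sketch covers only point-to-point messaging; a publish-subscribe pattern
+would still have to be built on top of it.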
+
+\section{High Level Overview}
+
+Now that we are familiar with the technological basis of the framework, we can
+discuss its architecture in more detail.
+
+\pic{figures/tfw_architecture.png}{An overview of the Tutorial Framework}
+
+Architecturally TFW consists of four main components:
+\begin{itemize}
+    \item \textbf{Event handlers}: processes running in a Docker container
+    \item \textbf{Frontend}: web application running in the browser of the user
+    \item \textbf{TFW (proxy) server}: responsible for message routing/proxying
+          between the frontend and event handlers
+    \item \textbf{TFW FSM}: a finite state machine responsible for tracking user progress,
+          that is implemented as an event handler called \texttt{FSMManagingEventHandler}
+\end{itemize}
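+
+The TFW FSM tracks the progress of the user through a tutorial. The sketch below
+shows the general idea of such a finite state machine in Python: a set of states, and
+transitions that fire in response to events. The \texttt{TutorialFSM} class, the state names and
+the trigger names are hypothetical and do not reproduce the API of \texttt{FSMManagingEventHandler}.
```python
from typing import Dict, List, Tuple


class TutorialFSM:
    """Hypothetical finite state machine tracking tutorial progress."""

    def __init__(self, initial: str, transitions: List[Tuple[str, str, str]]) -> None:
        self.state = initial
        # Map (current state, trigger) -> next state.
        self._table: Dict[Tuple[str, str], str] = {
            (source, trigger): dest for trigger, source, dest in transitions
        }

    def trigger(self, event: str) -> bool:
        """Attempt a transition; return True if the state changed."""
        dest = self._table.get((self.state, event))
        if dest is None:
            return False  # event is not valid in the current state: ignore it
        self.state = dest
        return True


if __name__ == "__main__":
    fsm = TutorialFSM(
        initial="start",
        transitions=[
            # (trigger, source state, destination state)
            ("file_edited", "start", "fixed_code"),
            ("tests_passed", "fixed_code", "solved"),
        ],
    )
    fsm.trigger("tests_passed")  # ignored: the code has not been fixed yet
    fsm.trigger("file_edited")
    fsm.trigger("tests_passed")
    print(fsm.state)  # solved
```
+Ignoring events that are invalid in the current state keeps the user from skipping
+steps of the tutorial.
+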
+As mentioned previously, the TFW server and the event handlers reside in the
+\texttt{solvable} Docker container.
+They all run in separate processes and communicate only using ZeroMQ sockets.
+
+In the following sections I am going to explain each of the main components in
+greater detail, as well as how they interact with each other,
+their respective responsibilities,
+some of the design choices behind them and more.
+
+\subsection{Frontend}
+
+This is a web application that runs in the browser of the user and uses
+multiple WebSocket connections to connect to the TFW server.
+Due to rapidly increasing complexity, the original implementation (written in
+plain JavaScript with jQuery%
+\footnote{\href{https://jquery.com}{https://jquery.com}} and Bootstrap%
+\footnote{\href{https://getbootstrap.com}{https://getbootstrap.com}}) was becoming
+unmaintainable, which justified the adoption of a frontend framework.
+
+Several choices were considered, with the main contenders being:
+\begin{itemize}
+    \item Angular\footnote{\href{https://angular.io}{https://angular.io}}
+    \item React\footnote{\href{https://reactjs.org}{https://reactjs.org}}
+    \item Vue.js\footnote{\href{https://vuejs.org}{https://vuejs.org}}
+\end{itemize}
+After comparing the above frameworks we decided to work with Angular for
+several reasons.
+One is that Angular is essentially a complete platform that is well
+suited for building complex architecture into a single page application.
+Another is that the frontend of the Avatao platform is also written
+in Angular, so team members experienced with it were already available in the company.
+A further advantage is that Angular effectively requires the use of TypeScript%
+\footnote{\href{https://www.typescriptlang.org}{https://www.typescriptlang.org}},
+a typed superset of JavaScript that transpiles to plain JavaScript and tries to remedy
+some of the issues with the language\cite{JavaScript} by strongly encouraging
+static typing and object-oriented design.
+
+\subsection{Messaging}
+\subsection{TFW Finite State Machine}
+\subsection{Solution Checking}\label{solutioncheck}
--- a/content/introduction.tex
+++ b/content/introduction.tex
@ -21,9 +21,16 @@ a new age of digital wild west, which could involve us running around in vulnera
 driving cars\cite{SelfDriving} with power over life and death, while exposing all our
 sensitive data through our ill-protected smart phones\cite{Android} and IoT devices\cite{IoTDDoS}.
 What a time to be alive.
-Unless we want to disconnect all our devices from all networks and ban USB sticks, the best
-lines of defense are going to be people -- a new generation of \emph{security conscious}
-users and developers.
+It is important to stress that IT security is \emph{really hard} to
+get right,
+even if ``right'' often only means ``better than your neighbour'', as perfect security is a utopia
+that does not seem to exist\cite{NoPerfectSecurity}.
+People often only start to grasp how difficult it actually is when large and
+reputable companies in the industry such as
+Cloudflare\cite{CloudFlareLeak} or eBay\cite{EBayGit} fail to get it right.
+This is why, unless we want to disconnect all our devices from all networks and ban USB
+sticks, the best lines of defense are going to be people -- a new generation
+of \emph{security conscious} users and developers.

 Among many other things outside IT, this is only possible with education\cite{ITSecEdu}.
 We need to come up with engaging, addictive and fun ways to learn (and teach), so that
@ -35,10 +42,10 @@ The only thing we can hope and work for is to become better and better as time
 and generations pass.
 We \emph{must} do better, and education is the way forward.

-The short term goal of this project -- and thesis -- is to provide a new angle
-in the education of software engineering, especially secure software engineering
-based on the aspirations above, with the long term goal of bringing something new
-to the table in the matter of IT education as a whole
+The short-term goal of this project -- and the goal of this thesis -- is to provide
+a new angle in the education of software engineering, especially secure software
+engineering, based on the aspirations above, with the long-term goal of bringing
+something new to the table in IT education as a whole
 (not just developers, but users as well).

 \section{A Short Introduction to Avatao}
@ -46,7 +53,7 @@ to the table in the matter of IT education as a whole
 The goal of Avatao as a company is to help software developers in building a \emph{culture} of
 security amongst themselves, with the vision that if the world is going to be taken over by
 software no matter what, that software might as well be \emph{secure software}.
-To achieve this goal we have been working on an online e-learning platform with hundreds\
+To achieve this goal we have been working on an online e-learning platform with hundreds%
 \footnote{654 exercises as of today, to be exact}
 of hands-on learning exercises to help students and professionals
 master IT security, collaborating with
@ -69,6 +76,8 @@ added authenticity and relevance \cite{AkosFacebook}.
 Our challenges usually involve some sort of website acting as frontend for the vulnerable
 application, or require the user to connect using SSH.

+\pic{figures/avatao_challenge.png}{An offensive challenge on the Avatao platform}
+
 The Avatao platform relies heavily on Docker containers to spawn challenges,
 which makes it extremely flexible in terms of what is possible to do when creating
 content.
@ -87,7 +96,7 @@ things like exercises involving the use of Docker or Windows based challenges.
 \section{Emergence}

 While working as a content creator I have stumbled into the idea of automating the completion
-of challenges for QA\footnote{Quality Assurrance} and demo purposes\
+of challenges for QA\footnote{Quality Assurance} and demo purposes%
 \footnote{I used to record short videos or GIFs to showcase my content to management}.
 In a certain scenario I was required to integrate a web based terminal emulator in a
 frontend application to improve user experience by making it possible to use a shell
@ -96,18 +105,19 @@ After I got this working I was looking into writing hacky bash scripts to automa
 required to complete the challenge in order to make it easier for me to record the solution,
 as I have often found myself recording over and over again for a demo without any mistakes.
 During the time I was playing around with this idea, researching possible solutions have led me
-to a hidden gem of a project on GitHub called \texttt{demo-magic}\
+to a hidden gem of a project on GitHub called \texttt{demo-magic}%
 \footnote{\href{https://github.com/paxtonhare/demo-magic}{https://github.com/paxtonhare/demo-magic}},
 which is esentially a bash script that simulates someone typing into a terminal and executing
 commands.
-I have created a fork\
-\footnote{The source code is available at
-\href{https://git.strongds.hu/mrtoth/demo.sh/src/master/demo.sh}{https://git.strongds.hu/mrtoth/demo.sh/src/master/demo.sh}}
+I have created a fork%
+\footnote{
+\href{https://git.strongds.hu/mrtoth/demo.sh/src/master/demo.sh}
+{https://git.strongds.hu/mrtoth/demo.sh/src/master/demo.sh}}
 of the project and integrated it into my challenge.
 Soon after recording demo videos was not even necessary anymore, as I have started to distribute
 the solution script with the challenge code itself, making it toggleable using build-time
 variables.
-Should the solution script be enabled, the challenge would automatically start\
+Should the solution script be enabled, the challenge would automatically start%
 \footnote{I did this by injecting the solution script into the user's \texttt{.bashrc} file}
 completing itself in the terminal integrated into it's frontend, often even explaining the
 commands executed during the solution process.
@ -123,7 +133,7 @@ but what I did not know was that I have accidentally
 did something far more than a hacky bash script solving challenges, as this little script
 would help formulate the idea of the project \emph{Tutorial Framework} or just \emph{TFW}.

-\section{Introducing the Tutorial Framework}
+\section{Vision of the Tutorial Framework}

 The whole ''challenges that solve themselves'' thing seemed like an idea that has great
 potential if developed further.
@ -141,7 +151,7 @@ your newfound skills in action immediately.

 For example a chatbot would show you how to encrypt a file using GnuGP,
 then it would ask you to encrypt an other file similarly.
-After this the bot could show you how to a configure a database server and then
+After this the bot could teach you how to configure a database server and then
 ask you to write a configuration file yourself and then encrypt it because it might
 contain sensitive data such as open ports, usernames and such.

@ -157,6 +167,28 @@ a web based frontend with a file editor, terminal, chat window and stuff like th
 Turns out that today all this can be done by writing a few hundred lines of Python
 code which uses the Tutorial Framework.

+\subsection{Project Requirements}\label{requirements}
+
+Based on this vision it is now more or less possible to define the requirements of the project.
+The ``more or less'' is there because all of this is bleeding edge,
+and the requirements could shift over time.
+For this reason I am going to keep them as general as possible, to the point that some of
+them might even sound vague.
+To achieve our goals we need:
+
+\begin{itemize}
+    \item a way to keep track of user progress
+    \item a way to handle various events (e.g. to react when
+          the user edits a file or executes a command in the terminal)
+    \item a highly flexible messaging system, in which processes and
+          frontend components (running in a web browser) can communicate with each other
+    \item a web based frontend with lots of built-in options (terminal, file editor, chat
+          window, etc.) that use said messaging system
+    \item stable APIs that can be exposed to content creators to work with (so that
+          framework updates won't break client code)
+    \item tooling for development (distributing, building and running)
+\end{itemize}
+
 \section{Early Development}

 Around a year ago a good friend and collage of mine Bálint Bokros, the CTO of our company
@ -174,9 +206,27 @@ Bachelor's Thesis\cite{BokaThesis}.
 Although not much of the original code base has remained due to intense refactoring
 and all around changes, the result would serve as a solid foundation for further development,
 and the architecture is mostly the same to this day.
-The resulting code would be the first working POC\
+The resulting code would be the first working POC%
 \footnote{Proof of Concept} of the framework showcasing the fixing of an SQL Injection
 attack.
+This initial version included the foundations of the framework:
+a working messaging system, event handling and state tracking.
+These provided a great basis,
+even though the core codebase of the framework was later almost
+completely rewritten due to an increased focus on code quality,
+extensibility and the API stability required by new features.
+
+It is worth noting that I had good reason to keep the project requirements
+general on purpose (see section \ref{requirements}).
+Many of the requirements listed in Bálint's thesis are completely obsolete by now.
+But since the project has followed an agile methodology%
+\footnote{Manifesto for Agile Software Development:
+\href{https://agilemanifesto.org}{https://agilemanifesto.org}}
+from the start, we were able to adapt to these changes without losing
+the progress he made in said thesis. Quoting the Agile Manifesto:
+``Responding to change over following a plan''.
+This is a really important takeaway.

 After becoming a full time employee at Avatao I was tasked with developing the project
 with Bálint, who was later reassigned to work on the GDPR compliance of the platform.
--- a/figures/avatao_challenge.png
+++ b/figures/avatao_challenge.png
--- a/figures/tfw_architecture.png
+++ b/figures/tfw_architecture.png
--- a/latexplate.sty
+++ b/latexplate.sty
@ -10,7 +10,8 @@
    sectsty,
    xcolor,
    microtype,
-    tabto
+    tabto,
+    amsthm
 }
 \RequirePackage[bottom,hang,flushmargin]{footmisc}

@ -18,6 +19,8 @@
 \sethlcolor{andigray}
 \newcommand{\code}[1]{\hl{\mbox{#1}}}

+\newtheorem*{note}{Note}
+
 \newcommand{\pic}[3][width=\textwidth]
 {
 \begin{figure}[H]
--- a/thesis.tex
+++ b/thesis.tex
@ -41,7 +41,9 @@
 \include{content/declaration}
 \include{content/abstract}
 \include{content/introduction}
+\include{content/architecture}

+\listoffigures
 \lstlistoflistings

 \renewcommand\bibname{References}