Continue writing thesis with focus on architecture

Kristóf Tóth 2018-11-30 19:35:49 +01:00
parent 65e6426fdc
commit 1ef4feb146
7 changed files with 266 additions and 19 deletions


@ -80,7 +80,42 @@
title={Education as a key factor in the process of building cybersecurity},
url={https://2017.cybersecforum.eu/files/2016/12/ecj_vol2_issue1_i.albrycht_education_as_a_key_in_the_process_of_building_cybersecurity.pdf},
language={english},
author={IZABELA ALBRYCHT},
author={Izabela Albrycht},
year={2016},
}
@online{EBayGit,
title={Pwning eBay - How I Dumped eBay Japan's Website Source Code},
url={https://slashcrypto.org/2018/11/28/eBay-source-code-leak/},
language={english},
author={David Wind},
year={2018},
month=nov,
}
@online{CloudFlareLeak,
title={Incident report on memory leak caused by Cloudflare parser bug},
url={https://blog.cloudflare.com/incident-report-on-memory-leak-caused-by-cloudflare-parser-bug/},
language={english},
author={John Graham-Cumming},
year={2017},
month=feb,
}
@online{NoPerfectSecurity,
title={The Illusion Of Perfect Cybersecurity},
url={https://www.forbes.com/sites/forbestechcouncil/2018/03/27/the-illusion-of-perfect-cybersecurity/},
language={english},
author={George Finney},
year={2018},
month=mar,
}
@online{JavaScript,
title={JavaScript is a Dysfunctional Programming Language},
url={https://medium.com/javascript-non-grata/javascript-is-a-dysfunctional-programming-language-a1f4866e186f},
language={english},
author={Richard Kenneth Eng},
year={2016},
month=mar,
}

157
content/architecture.tex Normal file

@ -0,0 +1,157 @@
\chapter{Framework Architecture}
\section{Core Technology}
It is important to understand that the Tutorial Framework is currently implemented as
two Docker images:
\begin{itemize}
\item the \texttt{solvable} image is responsible for running the framework and the client
code depending on it
\item the \texttt{controller} image is responsible for solution checking (to figure out
whether the user completed the tutorial or not)
\end{itemize}
During most of this chapter I discuss the \texttt{solvable} Docker image,
with the exception of section \ref{solutioncheck}, where I dive into how the
\texttt{controller} image is implemented.
The most important feature of the framework is its messaging system.
What we need is a system in which processes running inside a Docker container
can communicate with each other.
On its own this is easy, with many possible solutions (named pipes, sockets or shared memory, to name a few).
The hard part is that frontend components running inside a web browser -- which could be
potentially on the other side of the planet -- would also need to partake in said communication.
So what we need to create is something of a hybrid between an IPC system and something
that can communicate with JavaScript running in a browser connected to it.
The solution the framework uses is a proxy server, which connects to frontend components
on one side and handles interprocess communication on the other side.
This way the server is capable of proxying messages between the two sides, enabling
communication between them.
Notice that this way what we have is essentially an IPC system in which a web application
can ``act like'' it was running on the backend in a sense: it is easily able to
communicate with processes on the backend, while in reality the web application
runs in the browser of the user, on a completely different machine.
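The proxying idea described above can be sketched in a few lines of Python.
This is only a toy model under stated assumptions, not the framework's actual code: the class
\texttt{ProxyServer}, its method names and the \texttt{key} field of a message are all hypothetical,
and plain callbacks stand in for the real transports on both sides.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
Deliver = Callable[[Message], None]


class ProxyServer:
    """Toy stand-in for a proxy that routes messages between two sides.

    Assumption: every message is a dict with a "key" field that event
    handlers subscribe to. Real transports are replaced by callbacks.
    """

    def __init__(self) -> None:
        self._frontends: List[Deliver] = []
        self._handlers: Dict[str, List[Deliver]] = {}

    def connect_frontend(self, deliver: Deliver) -> None:
        # Any number of frontend connections may be registered.
        self._frontends.append(deliver)

    def subscribe_handler(self, key: str, deliver: Deliver) -> None:
        # A backend process registers interest in messages with this key.
        self._handlers.setdefault(key, []).append(deliver)

    def from_frontend(self, message: Message) -> None:
        # Frontend -> backend: forward only to handlers subscribed to the key.
        for deliver in self._handlers.get(message["key"], []):
            deliver(message)

    def from_handler(self, message: Message) -> None:
        # Backend -> frontend: forward to every connected frontend.
        for deliver in self._frontends:
            deliver(message)


if __name__ == "__main__":
    server = ProxyServer()
    server.connect_frontend(lambda m: print("browser got:", m))
    server.subscribe_handler("terminal", lambda m: print("handler got:", m))
    server.from_frontend({"key": "terminal", "data": "ls"})
    server.from_handler({"key": "terminal", "data": "file.txt"})
```

The point of the sketch is that neither side needs to know how the other is connected:
both only talk to the proxy, which decides where a message goes.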
\begin{note}
The core idea and initial implementation of this server come from Bálint Bokros.
I later redesigned and fully rewrote it to allow for greater flexibility
(such as connecting to more than a single browser at a time, different messaging modes,
message authentication, restoration of frontend state, a complete overhaul of the
state tracking system and the possibility for solution checking, among other things).
If you are interested in the differences between the original POC implementation
(which is out of scope for this thesis due to length constraints) and the current
framework, please consult Bálint's excellent paper and Bachelor's Thesis on it\cite{BokaThesis}.
\end{note}
Now let us take a closer look:
\subsection{Connecting to the Frontend}
The old way of creating dynamic webpages was AJAX polling, which basically means sending
HTTP requests to a server at regular intervals from JavaScript to update the contents
of your website (which requires going through an HTTP request-response cycle, and
potentially a whole TCP handshake, on each update).
This was superseded by WebSockets around 2011, which provide a full-duplex
communication channel over TCP between your browser and the server.
This is done by initiating a protocol handshake using the \texttt{Connection: Upgrade}
HTTP header, which establishes a persistent socket connection between the browser
and the server.
This allows for communication with lower overhead and latency, facilitating efficient
real-time applications.
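To make the handshake less abstract, the following sketch shows the server side of the upgrade
as specified in RFC 6455: the server proves it understood the request by hashing the client's
\texttt{Sec-WebSocket-Key} together with a fixed GUID.
The function names are my own and serve illustration only. In practice a WebSocket library performs this step.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept_key(client_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_response(client_key: str) -> str:
    """Build the HTTP response that switches the connection to WebSocket."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {websocket_accept_key(client_key)}\r\n"
        "\r\n"
    )


if __name__ == "__main__":
    # Sample key taken from RFC 6455, section 1.3.
    print(build_upgrade_response("dGhlIHNhbXBsZSBub25jZQ=="))
```

After this single exchange the same TCP connection stays open and carries WebSocket frames in both directions.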
The Tutorial Framework uses WebSockets to connect to its web frontend.
The framework proxy server is capable of handling an arbitrary number of WebSocket connections,
which allows opening different components in separate browser windows and tabs, or even
in different browsers at once (such as opening a terminal in Chrome and an IDE in Firefox).
\subsection{Interprocess Communication}
To handle communication with processes running inside the container TFW utilizes
the asynchronous distributed messaging library ZeroMQ%
\footnote{\href{http://zeromq.org}{http://zeromq.org}} or ZMQ for short.
The rationale behind this is that unlike other messaging systems such as
RabbitMQ%
\footnote{\href{https://www.rabbitmq.com}{https://www.rabbitmq.com}} or Redis%
\footnote{\href{https://redis.io}{https://redis.io}},
ZMQ does not require a daemon (message broker process) and as such
has a much lower memory footprint while still providing various messaging
patterns and bindings for almost any widely used programming language.
Another -- as yet unutilized -- capability of this solution is that since ZMQ is capable
of using simple TCP sockets, we could even communicate with processes running on remote
hosts using the framework.
There are various lower level and higher level alternatives to ZMQ for IPC,
which were also considered during the design process of the framework.
A few examples of top contenders and the reasons for not using them in the end:
\begin{itemize}
\item Handling raw TCP sockets would involve lots of boilerplate logic that
already has quality implementations in messaging libraries: e.g. making sure that
all bytes are sent or received requires checking the return values of the
libc \texttt{send()} and \texttt{recv()} calls, while ZMQ takes care of this
extra logic and even provides higher level messaging patterns such as
publish-subscribe, which would otherwise need to be implemented on top of raw sockets.
\item Using something like gRPC%
\footnote{\href{https://grpc.io}{https://grpc.io}} or plain HTTP (both of which
are considered to be higher level than ZMQ sockets) would require
all processes partaking in the communication to be HTTP servers themselves,
which would make the framework
less lightweight and flexible: socket communication with or without ZMQ does not
force you to write synchronous or asynchronous code, whereas common HTTP servers
are either async or pre-fork in nature, which imposes certain design choices on code
built on them.
\end{itemize}
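The boilerplate mentioned in the first item can be illustrated with a short sketch.
It assumes a simple framing scheme of my own choosing (a 4 byte length prefix), and the helper names
are hypothetical. Python does offer \texttt{socket.sendall()}, but the loop is written out here to
show what checking the return value of \texttt{send()} involves.

```python
import socket
import struct


def send_all(sock: socket.socket, data: bytes) -> None:
    """Keep calling send() until every byte has been handed to the kernel."""
    sent_total = 0
    while sent_total < len(data):
        sent = sock.send(data[sent_total:])  # may send fewer bytes than asked
        if sent == 0:
            raise ConnectionError("socket connection broken")
        sent_total += sent


def recv_exactly(sock: socket.socket, length: int) -> bytes:
    """Keep calling recv() until exactly `length` bytes have arrived."""
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = sock.recv(remaining)  # may return fewer bytes than asked
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_message(sock: socket.socket, payload: bytes) -> None:
    # Assumed framing: 4 byte big-endian length prefix, then the payload.
    send_all(sock, struct.pack("!I", len(payload)) + payload)


def recv_message(sock: socket.socket) -> bytes:
    (length,) = struct.unpack("!I", recv_exactly(sock, 4))
    return recv_exactly(sock, length)


if __name__ == "__main__":
    left, right = socket.socketpair()
    send_message(left, b"hello from the other process")
    print(recv_message(right))
    left.close()
    right.close()
```

All of this only delivers one complete message between two peers. Messaging patterns
would still have to be built on top of it, which is the work a messaging library saves.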
\section{High Level Overview}
Now that we are familiar with the technological basis of the framework, we can
discuss it in more detail.
\pic{figures/tfw_architecture.png}{An overview of the Tutorial Framework}
Architecturally TFW consists of four main components:
\begin{itemize}
\item \textbf{Event handlers}: processes running in a Docker container
\item \textbf{Frontend}: web application running in the browser of the user
\item \textbf{TFW (proxy) server}: responsible for message routing/proxying
between the frontend and event handlers
\item \textbf{TFW FSM}: a finite state machine responsible for tracking user progress,
implemented as an event handler called \texttt{FSMManagingEventHandler}
\end{itemize}
Keep in mind that, as mentioned previously,
the TFW server and the event handlers reside in the \texttt{solvable} Docker container.
They all run in separate processes and communicate only through ZeroMQ sockets.
In the following sections I am going to explain each of the main components in
greater detail, as well as how they interact with each other,
their respective responsibilities,
some of the design choices behind them and more.
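As a first intuition for how a finite state machine can track user progress, consider the
following minimal sketch. It is not the \texttt{FSMManagingEventHandler} itself: the class
\texttt{ProgressFSM}, the state names and the trigger names are all hypothetical and only
illustrate the general technique.

```python
from typing import Dict, Set, Tuple


class ProgressFSM:
    """Minimal finite state machine for tracking progress in a tutorial.

    Assumption: transitions are given as {(state, trigger): next_state}.
    """

    def __init__(
        self,
        initial: str,
        transitions: Dict[Tuple[str, str], str],
        final_states: Set[str],
    ) -> None:
        self.state = initial
        self._transitions = transitions
        self._final_states = final_states

    def step(self, trigger: str) -> bool:
        """Apply a trigger; return False if it is not valid in this state."""
        next_state = self._transitions.get((self.state, trigger))
        if next_state is None:
            return False
        self.state = next_state
        return True

    @property
    def completed(self) -> bool:
        return self.state in self._final_states


if __name__ == "__main__":
    fsm = ProgressFSM(
        initial="start",
        transitions={
            ("start", "file_edited"): "edited",
            ("edited", "tests_passed"): "done",
        },
        final_states={"done"},
    )
    print(fsm.step("tests_passed"))  # not valid yet in the "start" state
    print(fsm.step("file_edited"), fsm.step("tests_passed"), fsm.completed)
```

Triggers that are not valid in the current state are rejected, so the user cannot skip ahead
in the tutorial by causing events out of order.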
\subsection{Frontend}
This is a web application that runs in the browser of the user and uses
multiple WebSocket connections to connect to the TFW server.
Due to rapidly increasing complexity the original implementation (written in
plain JavaScript with jQuery%
\footnote{\href{https://jquery.com}{https://jquery.com}} and Bootstrap%
\footnote{\href{https://getbootstrap.com}{https://getbootstrap.com}}) was becoming
unmaintainable and the use of a frontend framework became justified.
Several choices were considered, with the main contenders being:
\begin{itemize}
\item Angular\footnote{\href{https://angular.io}{https://angular.io}}
\item React\footnote{\href{https://reactjs.org}{https://reactjs.org}}
\item Vue.js\footnote{\href{https://vuejs.org}{https://vuejs.org}}
\end{itemize}
After comparing the above frameworks we decided to work with Angular for
several reasons.
One is that Angular is essentially a complete platform that is well
suited to building a complex architecture into a single page application.
Another is that the frontend of the Avatao platform is also written
in Angular (so the company already has team members experienced with it).
A further point in its favour is that Angular forces you to use TypeScript%
\footnote{\href{https://www.typescriptlang.org}{https://www.typescriptlang.org}}
which tries to remedy the issues\cite{JavaScript}
with JavaScript by being a language that transpiles to JavaScript while
strongly encouraging things like static typing and object-oriented principles.
\subsection{Messaging}
\subsection{TFW Finite State Machine}
\subsection{Solution Checking}\label{solutioncheck}


@ -21,9 +21,16 @@ a new age of digital wild west, which could involve us running around in vulnera
driving cars\cite{SelfDriving} with power over life and death, while exposing all our
sensitive data through our ill-protected smart phones\cite{Android} and IoT devices\cite{IoTDDoS}.
What a time to be alive.
Unless we want to disconnect all our devices from all networks and ban USB sticks, the best
lines of defense are going to be people -- a new generation of \emph{security conscious}
users and developers.
It is important to express that IT security is something that is \emph{really hard} to
get right,
even if ``right'' often only means better than your neighbour, as perfect security is a utopia
that doesn't seem to exist\cite{NoPerfectSecurity}.
It is often only when large and reputable companies in the industry such as
CloudFlare\cite{CloudFlareLeak} or eBay\cite{EBayGit} fail to get it right
that people start to grasp how difficult it actually is.
This is why unless we want to disconnect all our devices from all networks and ban USB
sticks, the best lines of defense are going to be people -- a new generation
of \emph{security conscious} users and developers.
Among many other things outside IT, this is only possible with education\cite{ITSecEdu}.
We need to come up with engaging, addictive and fun ways to learn (and teach), so that
@ -35,10 +42,10 @@ The only thing we can hope and work for is to become better and better as time
and generations pass.
We \emph{must} do better, and education is the way forward.
The short term goal of this project -- and thesis -- is to provide a new angle
in the education of software engineering, especially secure software engineering
based on the aspirations above, with the long term goal of bringing something new
to the table in the matter of IT education as a whole
The short term goal of this project -- and the goal of this thesis -- is to provide
a new angle in the education of software engineering, especially secure software
engineering based on the aspirations above, with the long term goal of bringing
something new to the table in the matter of IT education as a whole
(not just developers, but users as well).
\section{A Short Introduction to Avatao}
@ -46,7 +53,7 @@ to the table in the matter of IT education as a whole
The goal of Avatao as a company is to help software developers in building a \emph{culture} of
security amongst themselves, with the vision that if the world is going to be taken over by
software no matter what, that software might as well be \emph{secure software}.
To achieve this goal we have been working on an online e-learning platform with hundreds\
To achieve this goal we have been working on an online e-learning platform with hundreds%
\footnote{654 exercises as of today, to be exact}
of hands-on learning exercises to help students and professionals
master IT security, collaborating with
@ -69,6 +76,8 @@ added authenticity and relevance \cite{AkosFacebook}.
Our challenges usually involve some sort of website acting as frontend for the vulnerable
application, or require the user to connect using SSH.
\pic{figures/avatao_challenge.png}{An offensive challenge on the Avatao platform}
The Avatao platform relies heavily on Docker containers to spawn challenges,
which makes it extremely flexible in terms of what is possible to do when creating
content.
@ -87,7 +96,7 @@ things like exercises involving the use of Docker or Windows based challenges.
\section{Emergence}
While working as a content creator I have stumbled into the idea of automating the completion
of challenges for QA\footnote{Quality Assurance} and demo purposes\
of challenges for QA\footnote{Quality Assurance} and demo purposes%
\footnote{I used to record short videos or GIFs to showcase my content to management}.
In a certain scenario I was required to integrate a web based terminal emulator in a
frontend application to improve user experience by making it possible to use a shell
@ -96,18 +105,19 @@ After I got this working I was looking into writing hacky bash scripts to automa
required to complete the challenge in order to make it easier for me to record the solution,
as I have often found myself recording over and over again for a demo without any mistakes.
During the time I was playing around with this idea, researching possible solutions led me
to a hidden gem of a project on GitHub called \texttt{demo-magic}\
to a hidden gem of a project on GitHub called \texttt{demo-magic}%
\footnote{\href{https://github.com/paxtonhare/demo-magic}{https://github.com/paxtonhare/demo-magic}},
which is essentially a bash script that simulates someone typing into a terminal and executing
commands.
I have created a fork\
\footnote{The source code is available at
\href{https://git.strongds.hu/mrtoth/demo.sh/src/master/demo.sh}{https://git.strongds.hu/mrtoth/demo.sh/src/master/demo.sh}}
I have created a fork%
\footnote{
\href{https://git.strongds.hu/mrtoth/demo.sh/src/master/demo.sh}
{https://git.strongds.hu/mrtoth/demo.sh/src/master/demo.sh}}
of the project and integrated it into my challenge.
Soon after recording demo videos was not even necessary anymore, as I have started to distribute
the solution script with the challenge code itself, making it toggleable using build-time
variables.
Should the solution script be enabled, the challenge would automatically start\
Should the solution script be enabled, the challenge would automatically start%
\footnote{I did this by injecting the solution script into the user's \texttt{.bashrc} file}
completing itself in the terminal integrated into its frontend, often even explaining the
commands executed during the solution process.
@ -123,7 +133,7 @@ but what I did not know was that I had accidentally
done something far more than a hacky bash script solving challenges, as this little script
would help formulate the idea of the project \emph{Tutorial Framework} or just \emph{TFW}.
\section{Introducing the Tutorial Framework}
\section{Vision of the Tutorial Framework}
The whole ``challenges that solve themselves'' thing seemed like an idea that has great
potential if developed further.
@ -141,7 +151,7 @@ your newfound skills in action immediately.
For example a chatbot would show you how to encrypt a file using GnuPG,
then it would ask you to encrypt another file similarly.
After this the bot could show you how to configure a database server and then
After this the bot could teach you how to configure a database server and then
ask you to write a configuration file yourself and then encrypt it because it might
contain sensitive data such as open ports, usernames and such.
@ -157,6 +167,28 @@ a web based frontend with a file editor, terminal, chat window and stuff like th
Turns out that today all this can be done by writing a few hundred lines of Python
code which uses the Tutorial Framework.
\subsection{Project Requirements}\label{requirements}
Based on this it is now more or less possible to define requirements for the project.
The reason for the ``more or less'' part is that all of this is pretty much bleeding edge,
where the requirements could shift dynamically with time.
For this reason I am going to be as general as possible, to the point that some of
this might even sound vague.
To achieve our goals we would need:
\begin{itemize}
\item a way to keep track of user progress
\item a way to handle various events (e.g. so that we can react when
the user has edited a file, or has executed a command in the terminal)
\item a highly flexible messaging system, in which processes and
frontend components (running in a web browser) could communicate with each other
\item a web based frontend with lots of built-in options (terminal, file editor, chat
window, etc.) that use said messaging system
\item stable APIs that can be exposed to content creators to work with (so that
framework updates won't break client code)
\item tooling for development (distributing, building and running)
\end{itemize}
\section{Early Development}
Around a year ago a good friend and colleague of mine Bálint Bokros, the CTO of our company
@ -174,9 +206,27 @@ Bachelor's Thesis\cite{BokaThesis}.
Although not much of the original code base has remained due to intense refactoring
and all around changes, the result would serve as a solid foundation for further development,
and the architecture is mostly the same to this day.
The resulting code would be the first working POC\
The resulting code would be the first working POC%
\footnote{Proof of Concept} of the framework showcasing the fixing of an SQL Injection
attack.
This initial version included the foundations of the framework:
a working messaging system, event handling and state tracking.
These provided a great basis
despite the fact that the core codebase of the framework was almost
completely rewritten due to an increased focus on code quality,
extensibility and API stability required by new features.
It is interesting to note that when I mentioned that the project requirements
were kept general on purpose (\ref{requirements}), I had good reason to do so.
Looking at the requirements in Bálint's Thesis, many of them
are completely obsolete by now.
But since the project has followed Agile Methodology%
\footnote{Manifesto for Agile Software Development:
\href{https://agilemanifesto.org}{https://agilemanifesto.org}}
from the start, we were able to adapt to these changes without losing
the progress he made in said Thesis. Quoting from the Agile Manifesto:
``Responding to change over following a plan''.
This is a really important takeaway.
After becoming a full time employee at Avatao I was tasked with developing the project
with Bálint, who was later reassigned to work on the GDPR compliance of the platform.

Binary file not shown.


Binary file not shown.



@ -10,7 +10,8 @@
sectsty,
xcolor,
microtype,
tabto
tabto,
amsthm
}
\RequirePackage[bottom,hang,flushmargin]{footmisc}
@ -18,6 +19,8 @@
\sethlcolor{andigray}
\newcommand{\code}[1]{\hl{\mbox{#1}}}
\newtheorem*{note}{Note}
\newcommand{\pic}[3][width=\textwidth]
{
\begin{figure}[H]


@ -41,7 +41,9 @@
\include{content/declaration}
\include{content/abstract}
\include{content/introduction}
\include{content/architecture}
\listoffigures
\lstlistoflistings
\renewcommand\bibname{References}