
\chapter{Introduction}

\section{Project justification}

As the world is being completely engulfed by software, the need for accessible yet
high-quality learning materials covering software engineering, and especially secure software
engineering, is on the rise.
While we are enjoying the comfort that information technology provides us, we often forget
about the risks involved in relying so much on software in our everyday lives.
Looking at recent events, such as the cyber arms race taking place between leading
powers\cite{CyberArmsRace}, 50 million Facebook accounts being breached
due to the incorrect handling of access tokens\cite{FacebookBreach},
the very recent Marriott hack where sensitive data on 500 million customers
was stolen\cite{MarriottBreach},
or how China is building an Orwellian state of total digital surveillance%
\cite{ChinaSurv}\cite{ChinaCredit},
it becomes clear that security and privacy in the IT sector
are more important now than ever.

With all of our data slowly crawling towards the cloud and an IoT revolution breathing down our necks,
we as an industry must face the music and start actually doing something before we enter
a new age of the digital Wild West, which could involve us running around in vulnerable
self-driving cars\cite{SelfDriving} with power over life and death, while exposing all our
sensitive data through our ill-protected smartphones\cite{Android} and IoT devices\cite{IoTDDoS}.
What a time to be alive.
It is important to stress that IT security is something that is \emph{really hard} to
get right, even if ``right'' often only means better than your neighbour, as perfect security
is a utopia that does not seem to exist\cite{NoPerfectSecurity}.
Often it is only when large and reputable companies in the industry such as
CloudFlare\cite{CloudFlareLeak} or eBay\cite{EBayGit} fail to get it right
that people start to grasp how difficult it actually is.
This is why, unless we want to disconnect all our devices from all networks and ban USB
sticks, the best lines of defense are going to be people --- a new generation
of \emph{security conscious} users and developers.

Among many other things outside IT, this is only possible with education\cite{ITSecEdu}.
We need to come up with engaging, addictive and fun ways to learn (and teach), so that
more and more people will be motivated to do so and the drive to acquire and share
knowledge is something that comes naturally, rather than something we have to struggle for.
I believe that this is something that \emph{can} and \emph{should} be applied to
everything we do as a society.
The only thing we can hope and work for is to become better and better as time
and generations pass by.
We \emph{must} do better, and education is the way forward.

The short term goal of this project --- and the goal of this thesis --- is to provide
a new angle in the education of software engineering, especially secure software
engineering based on the aspirations above, with the long term goal of bringing
something new to the table in the matter of IT education as a whole
(not just for developers, but for users as well).

\section{A Short Introduction to Avatao}

The goal of Avatao as a company is to help software developers in building a \emph{culture} of
security amongst themselves, with the vision that if the world is going to be taken over by
software no matter what, that software might as well be \emph{secure software}.
To achieve this goal we have been working on an online e-learning platform with hundreds%
\footnote{654 exercises as of today, to be exact}
of hands-on learning exercises to help students and professionals
master IT security, collaborating with
universities around the world and providing a solution for companies in building
\emph{security consciousness} amongst their developer teams.

Since starting out we have amassed some experience in building fun challenges
that showcase the exploitation and fixing of relevant security vulnerabilities in code or
configuration.
Traditionally these exercises revolved around offensive and defensive tasks, with challenges
often being split into two or more parts.
For example, users would have to hack a website by exploiting a buffer overflow vulnerability,
then in the second challenge they would fix the code they have just exploited in a web-based
code editor.
These kinds of exercises offer great flexibility to reflect real world security issues, as in
more complex challenges users might be required to exploit multiple vulnerabilities to succeed,
and to understand the ways they augment each other.
We often recreate real world scenarios based on incident reports released by companies for
added authenticity and relevance\cite{AkosFacebook}.
Our challenges usually involve some sort of website acting as frontend for the vulnerable
application, or require the user to connect using SSH\@.

\pic{figures/avatao_challenge.png}{An offensive challenge on the Avatao platform}

The Avatao platform relies heavily on Docker containers to spawn challenges,
which makes it extremely flexible in terms of what is possible to do when creating
content.
Essentially anything that you can do inside a Docker container can be done on
the Avatao platform as well.
Currently each challenge is implemented as a set of Docker images residing inside a
Git repository dedicated to that specific challenge.
Our content creation workflow enables developers to create such repositories on GitHub,
which are automatically set up with the proper webhooks, so that when their content gets
reviewed (and their feature branches merged), their changes will go live on the
platform as well.
In the future we also plan on supporting the use of virtual machines to implement
challenges, which could further increase this flexibility by adding the possibility of
exercises that involve using Docker, or of Windows-based challenges.

\section{Emergence}\label{intro:emergence}

While working as a content creator I stumbled upon the idea of automating the completion
of challenges for QA\footnote{Quality Assurance} and demo purposes.
I used to record short videos or GIFs to showcase my content to management.
In one such scenario I was required to integrate a web-based terminal emulator into a
frontend application to improve user experience by making it possible to use a shell
right on the website rather than having to connect through SSH\@.

After I got this working, I looked into writing hacky bash scripts to automate the steps
required to complete the challenge in order to make recording the solution easier,
as I often found myself recording over and over again to get a demo without any mistakes.
While I was playing around with this idea, researching possible solutions led me
to a hidden gem of a project on GitHub called \code{demo-magic}%
\footnote{\href{https://github.com/paxtonhare/demo-magic}{https://github.com/paxtonhare/demo-magic}},
which is essentially a bash script that simulates someone typing into a terminal and executing
commands.

I created a fork%
\footnote{
\href{https://git.strongds.hu/mrtoth/demo.sh/src/master/demo.sh}
{https://git.strongds.hu/mrtoth/demo.sh/src/master/demo.sh}}
of the project and integrated it into my challenge.
Soon after, recording demo videos was not even necessary anymore, as I started to distribute
the solution script with the challenge code itself, making it toggleable using build-time
variables.
Should the solution script be enabled, the challenge would automatically start%
\footnote{I did this by injecting the solution script into the user's \code{.bashrc} file}
completing itself in the terminal integrated into its frontend, often even explaining the
commands executed during the solution process.

\lstinputlisting[
    language=bash,
    caption={Example for a solution script},
    captionpos=b
]{listings/demosh.example}

I was quite pleased with myself, no longer having to do the busywork of recording videos,
but what I did not know was that I had accidentally
created something far more than a hacky bash script solving challenges, as this little script
would help formulate the idea of the \emph{Tutorial Framework} or just \emph{TFW}.

\section{Vision of the Tutorial Framework}

The whole ``challenges that solve themselves'' thing seemed like an idea that had great
potential if developed further.
We envisioned something that resembles a learning video, but is real, actual
software running and interacting with itself to showcase different topics to the user.
Something that would allow the users to stop at any given time, take a breath, interact
with the environment on their own (e.g.\ take a look at the directory structure or a file,
try what happens if a command is executed somewhat differently, etc.) and then
continue on with the tutorial.
We wanted to create something that would feel as if an actual teacher were standing
next to you, explaining topics to you at your own pace, while showing you how to solve
a related task.
This teacher scenario would allow you to take the helm sometimes and try applying
your newfound skills in action immediately.

For example, a chatbot would show you how to encrypt a file using GnuPG%
\footnote{\href{https://www.gnupg.org}{https://www.gnupg.org}},
then it would ask you to encrypt another file similarly.
After this the bot could teach you how to configure a database server and then
ask you to write a configuration file yourself and then encrypt it because it might
contain sensitive data such as open ports, usernames and such.

Technically, however, this is far from trivial: we would have to keep track of the user's
progress at all times, be able to actually check if the user has successfully encrypted
the file by decrypting it and then checking if the configuration file is valid or not
(this would practically require trying to start a database server with it).
After all this we would still have to offer \emph{relevant} and helpful assistance if
something went wrong.
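
The progress tracking described above can be pictured as a small state machine.
The following listing is only a minimal sketch under stated assumptions and not the
actual implementation of the framework: the names \code{Step} and \code{Tutorial} are
hypothetical, and the user's files are modelled as a plain dictionary instead of a real
file system, so the checks only look at file names and contents rather than actually
decrypting anything or starting a database server.

\begin{lstlisting}[
    language=Python,
    caption={A hypothetical sketch of step-by-step progress tracking},
    captionpos=b
]
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for the user's files: file name -> content
Workspace = Dict[str, str]


@dataclass
class Step:
    name: str
    check: Callable[[Workspace], bool]  # True once the step is solved
    hint: str                           # shown when the check fails


class Tutorial:
    def __init__(self, steps: List[Step]) -> None:
        self.steps = steps
        self.current = 0  # index of the step the user is working on

    @property
    def finished(self) -> bool:
        return self.current >= len(self.steps)

    def submit(self, workspace: Workspace) -> str:
        """Check the current step and advance only if it is solved."""
        if self.finished:
            return 'Tutorial already completed.'
        step = self.steps[self.current]
        if step.check(workspace):
            self.current += 1
            return f'Step "{step.name}" completed.'
        return step.hint


# Illustrative steps only; real checks would decrypt and validate files
example_steps = [
    Step('encrypt', lambda ws: 'secret.txt.gpg' in ws,
         'Try: gpg --symmetric secret.txt'),
    Step('configure', lambda ws: 'port' in ws.get('db.conf', ''),
         'The configuration file must set a port.'),
]
\end{lstlisting}

Submitting a workspace that does not satisfy the current step leaves the state untouched
and returns the hint of that step, which is where relevant assistance could be offered.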

Another scenario we envisioned was the following: imagine a code editor on the
right which contains the authentication logic of a website.
On the left, imagine the website that the code in the editor
implements. Note that the website is completely real: it is an actual, functional web
application users can interact with (e.g.\ navigate through the pages, register or log in).
The code editor has a button titled ``Deploy'' on it.
If the user changes the source code of the application and clicks this button, the application
should restart itself with the new code.
Let's say that the user comments out the part that authenticates a user.
In this case the application should let anyone log in with dummy credentials.
Meanwhile a console could show the output of the webserver.
For example, if the source code the user tried to deploy was invalid, the framework
should report the exact exception raised while running the application.
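
The ``Deploy'' behaviour described above could be sketched roughly as follows.
This is a simplified illustration and not how the framework is implemented: the
function \code{deploy} is a hypothetical name, and the new code is loaded inside the
current Python process instead of restarting a separate web server process.
The point it shows is that a failed deployment returns the full traceback, so the
exact exception can be displayed in the console.

\begin{lstlisting}[
    language=Python,
    caption={A hypothetical sketch of deploying edited code and reporting failures},
    captionpos=b
]
import traceback
from typing import Dict, Optional, Tuple


def deploy(source: str) -> Tuple[Optional[Dict[str, object]], str]:
    """Try to load new application code.

    Returns (namespace, console_output). The namespace is None when
    loading the code failed for any reason.
    """
    namespace: Dict[str, object] = {}
    try:
        # compile() raises SyntaxError for invalid code, exec() raises
        # whatever the module-level code of the application raises
        exec(compile(source, 'app.py', 'exec'), namespace)
    except Exception:
        # Report the exact exception instead of hiding it from the user
        return None, traceback.format_exc()
    return namespace, 'Deployed successfully.'
\end{lstlisting}

Code that is handled this way must only ever run inside the isolated environment of the
challenge, since it executes whatever the user has typed into the editor.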

\pic{figures/webapp_and_editor.png}{The code editor and web application example in TFW}

Even if we did all this, we would still need a way to integrate this whole thing into
a web-based frontend with a file editor, terminal, chat window and similar components.
It turns out that today all this can be done by writing a few hundred lines of Python
code which uses the Tutorial Framework.

\pic{figures/webapp_and_editor_err.png}{Invalid code and deployment failure with process output}

Note that it is possible to try out the current version of the Tutorial Framework
using a guest account on the Avatao platform at this
\href{https://platform.avatao.com/paths/d0ccef1f-0389-45bf-9d44-e85b86d66c49/challenges/a7e08c0a-199f-4f8d-aa7e-51b6e9bfcb15}{URL}%
\footnote{\href{https://platform.avatao.com/paths/d0ccef1f-0389-45bf-9d44-e85b86d66c49/challenges/a7e08c0a-199f-4f8d-aa7e-51b6e9bfcb15}
{https://platform.avatao.com/paths/d0ccef1f-0389-45bf-9d44-e85b86d66c49/challenges/a7e08c0a-199f-4f8d-aa7e-51b6e9bfcb15}}.

\subsection{Project Requirements}\label{requirements}

Based on this, it is now more or less possible to define requirements for the project.
The reason for the ``more or less'' part is that all of this is pretty much bleeding edge,
so the requirements could shift over time.
For this reason I am going to be as general as possible, to the point that some of
this might even sound vague.
To achieve our goals we would need:

\begin{itemize}
    \item a way to keep track of user progress
    \item a way to handle various events (e.g.\ we can react when
          the user has edited a file, or has executed a command in the terminal)
    \item a highly flexible messaging system, in which processes running on the backend and
          frontend components running in a web browser could communicate with each other
    \item a web based frontend with lots of built-in options (terminal, file editor, chat
          window, etc.) that use said messaging system
    \item stable APIs that can be exposed to content creators to work with (so that
          framework updates won't break client code)
    \item tooling for development (distributing, building and running)
\end{itemize}
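
To make the event handling and messaging requirements more tangible, the listing below
shows a minimal publish-subscribe sketch.
It is an illustration under assumptions rather than the API of the framework: the class
\code{MessageBus}, its methods and the layout of the messages are hypothetical, and
everything runs inside a single process, whereas the real system has to connect backend
processes with components running in a web browser.

\begin{lstlisting}[
    language=Python,
    caption={A hypothetical sketch of key-based message routing},
    captionpos=b
]
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Message = Dict[str, object]
Handler = Callable[[Message], None]


class MessageBus:
    def __init__(self) -> None:
        # Maps a message key to the handlers interested in it
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, key: str, handler: Handler) -> None:
        self._handlers[key].append(handler)

    def publish(self, message: Message) -> int:
        """Pass the message to every handler subscribed to its key.

        Returns the number of handlers that received the message.
        """
        handlers = self._handlers.get(str(message.get('key')), [])
        for handler in handlers:
            handler(message)
        return len(handlers)


# Example: react when the user has edited a file
bus = MessageBus()
received: List[Message] = []
bus.subscribe('file_edited', received.append)
bus.publish({'key': 'file_edited', 'path': 'db.conf'})
\end{lstlisting}

Routing on a single key keeps publishers and subscribers unaware of each other, which
is one way to achieve the flexibility the list above asks for.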

\section{Early Development}

Around a year ago Bálint Bokros, a good friend and colleague of mine, Gábor Pék, the CTO of
our company, and I started designing the TFW architecture.
In this early phase we researched solutions for the issues described, such as
tracking user progress, process management, interprocess communication
and making a web-based frontend application capable of communicating with processes running
inside a Docker container.

After seeing some sort of light at the end of the tunnel regarding what technologies could
be applied and coming up with several good alternatives, Bálint Bokros was tasked with
developing the first proof of concept and laying the foundations of the framework in his
Bachelor's Thesis\cite{BokaThesis}.

Although not much of the original code base has remained due to intense refactoring
and all-around changes, the result served as a solid foundation for further development,
and the architecture is mostly the same to this day.
The resulting code was the first working POC%
\footnote{Proof of Concept} of the framework, showcasing the fixing of an SQL injection
vulnerability.
This initial version included the foundations of the framework:
a working messaging system, event handling and state tracking.
These provided a great basis
despite the fact that the core codebase of the framework was later almost
completely rewritten due to an increased focus on code quality,
extensibility and API stability required by new features.

It is interesting to note that when I mentioned in Section~\ref{requirements} that the
project requirements were kept general on purpose, I had good reason to do so.
Looking at the requirements in Bálint's thesis, most of them are
completely obsolete by now.
But since the project has followed Agile Methodology%
\footnote{Manifesto for Agile Software Development:
\href{https://agilemanifesto.org}{https://agilemanifesto.org}}
from the start, we were able to adapt to these changes without losing
the progress he made in said thesis. Quoting from the Agile Manifesto:
``Responding to change over following a plan''.
This is a really important takeaway.

After becoming a full-time employee at Avatao I was tasked with developing the project
with Bálint, who was later reassigned to work on the GDPR compliance of the platform.
Thus it became my job to turn the framework into a stable code base ready for
use by content creators and to implement most of the features that we had envisioned
earlier.