Mostly finish writing thesis

This commit is contained in:
Kristóf Tóth 2018-12-03 16:38:22 +01:00
parent 2206844a06
commit 766198cfd1
7 changed files with 290 additions and 115 deletions

View File

@ -137,3 +137,20 @@
year={2018},
month=sep,
}
@online{CyberArmsRace,
title={Inside the secret digital arms race: Facing the threat of a global cyberwar},
url={https://www.techrepublic.com/article/inside-the-secret-digital-arms-race/},
language={english},
author={Steve Ranger},
year={2014},
}
@online{MarriottBreach,
title={Marriott breach leaves 500 million exposed with passport, card numbers stolen},
url={https://arstechnica.com/information-technology/2018/11/marriott-breach-leaves-500-million-exposed-with-passport-card-numbers-stolen/},
language={english},
author={Megan Geuss},
year={2018},
month=nov,
}

View File

@ -77,16 +77,28 @@ than showcasing actual, real-life API messages.
\section{Messages Component}
The framework must allow content creators to communicate with the user,
and provide some mechanism to enable ``talking'' to them.
This is the responsibility of the \emph{messages} component, which
provides a chatbox-like element on the frontend.
The simplest form of communication it accommodates is the insertion of text
into the chatbox through API messages.
This component always expects messages it receives to be in Markdown%
\footnote{\href{https://daringfireball.net/projects/markdown/}
{https://daringfireball.net/projects/markdown/}} format,
so that it is possible to nicely format any text that one might want to display.
This is especially important when displaying inline code with text around it,
making it easier for the user to read.
Every message has an optional \emph{originator} field, which serves
to remind the user of the purpose of the given message.
These messages are also timestamped so that it is easier to navigate through them
and look back at older messages.
If no timestamp is present in the API message, then it will be added on
the frontend.
This is useful, because the timestamp will use the system time of the user's machine,
and as such time zones will not be an issue (whereas if we suggested adding
timestamps to the messages on the backend, content creators would have to
deal with conversions between time zones).
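For illustration, such a message could be represented by a structure similar to the
following Python dictionary (the field names here are hypothetical and only serve to
demonstrate the concept, they are not the framework's actual message format):
\begin{lstlisting}[language=Python]
# Hypothetical chat message (field names are illustrative only)
message = {
    'originator': 'avatao',  # optional: who is "speaking" to the user
    'message': 'Try running `ls -la` in the terminal below!',  # Markdown body
    'timestamp': None,       # filled in on the frontend if missing
}
\end{lstlisting}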
\pic[width=.5\textwidth]{figures/chatbot.png}{The Avatao bot typing in the messages component}
@ -111,6 +123,8 @@ message lengths:
The value 5 comes from the fact that English words are on average 5
characters long according to some studies.
This value could be made configurable in the future, but currently there
are no plans to make non-English challenges.
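To illustrate the kind of calculation involved (the words-per-minute value below is an
assumption chosen for this example, not the framework's actual setting), the typing
duration of the bot could be estimated as follows:
\begin{lstlisting}[language=Python]
AVG_WORD_LENGTH = 5      # average English word length in characters
WORDS_PER_MINUTE = 200   # assumed typing speed of the bot

def estimate_typing_seconds(message: str) -> float:
    # Convert character count to an approximate word count,
    # then scale by the assumed typing speed.
    words = len(message) / AVG_WORD_LENGTH
    return words / WORDS_PER_MINUTE * 60

print(estimate_typing_seconds('Hello! Welcome to this tutorial.'))
\end{lstlisting}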
\section{IDE Component}\label{idecomponent}
@ -168,6 +182,13 @@ the selected file.
The code making this feature possible is reused several times in the framework
for interesting purposes such as monitoring the logs of processes.
It is also worth mentioning that integrating such file monitoring into the framework
is not quite as simple as described above: one has to deal with issues like the
somewhat nondeterministic way a single file modification can sometimes result in
several inotify events, and implement rate limiting so that file content updates
triggered by overly frequent events do not saturate the messaging system.
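As a sketch of the rate limiting idea only (TFW's actual implementation is not reproduced
here), a watcher built on the \code{watchdog} library could coalesce bursts of
modification events like this:
\begin{lstlisting}[language=Python]
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class DebouncedHandler(FileSystemEventHandler):
    def __init__(self, min_interval=0.5):
        self.min_interval = min_interval
        self.last_emit = 0.0

    def on_modified(self, event):
        # A single file write often produces several events in quick
        # succession; only react if enough time has passed.
        now = time.monotonic()
        if now - self.last_emit >= self.min_interval:
            self.last_emit = now
            print(f'changed: {event.src_path}')

observer = Observer()
# The watched path is a placeholder for this example.
observer.schedule(DebouncedHandler(), path='/home/user/workdir', recursive=True)
observer.start()
try:
    while True:
        time.sleep(1)
finally:
    observer.stop()
    observer.join()
\end{lstlisting}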
The editor also allows content creators to completely control it using API messages.
This involves the selecting, reading and writing of files as well as changing the
selected directory.
@ -175,6 +196,14 @@ These features allow content creators to ``guide'' a user through code bases
for example, where in each step of a tutorial a file is opened and explained
through messages sent to the chatbox of the messages component.
Developers have to \emph{explicitly} allow directories one by one to be listed by the
editor. This is done to avoid access control issues in case the editor is
running with more permissions than the user should have%
\footnote{Actually this involves extra caution, such as dealing with
symlinks in an allowed directory which could point to other, non-allowed locations}.
It is also possible to blacklist file patterns (so that binary files can be
excluded for example, as a text editor is not suitable for handling these).
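The symlink pitfall mentioned above is typically handled by resolving paths before
comparing them against the allow-list; a minimal sketch of such a check follows
(the directory and function name are made up for this example):
\begin{lstlisting}[language=Python]
from pathlib import Path

# Hypothetical allow-list of directories the editor may list.
ALLOWED_DIRS = [Path('/home/user/workdir').resolve()]

def is_path_allowed(requested: str) -> bool:
    # Resolve symlinks and '..' components first, then check whether
    # the real location is inside one of the allowed directories.
    real = Path(requested).resolve()
    return any(allowed == real or allowed in real.parents
               for allowed in ALLOWED_DIRS)

print(is_path_allowed('/home/user/workdir/app.py'))            # True
print(is_path_allowed('/home/user/workdir/../../etc/passwd'))  # False
\end{lstlisting}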
\section{Terminal Component}
This is a full-fledged xterm terminal emulator running right in the user's browser.
@ -191,23 +220,27 @@ This small server is responsible for spawning bash sessions and
unix pseudoterminals (or \code{pty}s) in the \code{solvable} Docker
container.
It is also responsible for connecting the master end of the \code{pty} to the
emulator running in the browser and the slave end to the bash session it has
spawned.
This way users are able to work in the shell displayed on the frontend just like
they would on their home machines, which allows for great tutorials
explaining topics that involve the usage of a shell.
Note that this allows us to cover an extremely wide variety of topics using TFW:
from compiling shared libraries for development, to using cryptographic FUSE filesystems
for enhanced privacy%
\footnote{\href{https://github.com/rfjakob/gocryptfs}{https://github.com/rfjakob/gocryptfs}},
or automating cloud infrastructure using Ansible%
\footnote{\href{https://www.ansible.com}{https://www.ansible.com}}.
This component exposes several functions through TFW API messages, such as injecting commands
to the terminal, reading command history and registering callbacks that are invoked when
the user executes anything in the terminal.
This allows content developers to implement functionality such as advancing the
tutorial when a certain command was invoked, or detecting common mistakes in using certain
tools, such as warning users when they try to use an outdated cipher when
encrypting a file using \code{openssl}, or when they generate an RSA key with a small key size.
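Such a check essentially boils down to inspecting the executed command line; a
simplified, illustrative version (not the framework's actual code) could look like this:
\begin{lstlisting}[language=Python]
import re

def check_openssl_command(command):
    # Warn about a legacy cipher being used for encryption.
    if re.search(r'\bopenssl\s+enc\b.*-(des|rc4|des3)\b', command):
        return 'This cipher is outdated, consider using -aes-256-cbc instead.'
    # Warn about generating an RSA key that is too small.
    match = re.search(r'\bopenssl\s+genrsa\b.*?(\d+)\s*$', command)
    if match and int(match.group(1)) < 2048:
        return 'RSA keys below 2048 bits are considered weak.'
    return None

print(check_openssl_command('openssl genrsa -out key.pem 1024'))
\end{lstlisting}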
\pic{figures/terminal.png}{The frontend terminal of TFW running top}
The implementation of reading command history is quite an exotic one.
The framework needs to be able to detect if the user has executed any command in the
@ -216,19 +249,23 @@ This is not an easy thing to accomplish without relying on some sort of heavywei
monitoring solution such as Sysdig%
\footnote{\href{https://sysdig.com}{https://sysdig.com}}.
I deemed most similar systems overkill for implementing this functionality, and their
memory footprints are not something we could afford here%
\footnote{These containers will be spawned on a per-user basis, so we must be as
conservative with memory as possible}.
Another way would be to use \code{pam_tty_audit.so} in the PAM%
\footnote{Linux Pluggable Authentication Modules:
\href{http://man7.org/linux/man-pages/man3/pam.3.html}
{http://man7.org/linux/man-pages/man3/pam.3.html}}
configurations responsible for logins, as this allows for various TTY auditing functions,
but I have found an even simpler approach to solving this problem in the end.
It is possible to set up the user's environment during the build of the image in
such a way that I can enforce and determine the
location of the bash \code{HISTFILE}%
\footnote{This environment variable contains the path to the file bash writes command
history to}
of the user.
This way I can monitor changes made to this file and read the commands executed
By combining this with the inotify system built into TFW,
the framework can monitor changes made to this file and read the commands executed
by the user from it.
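Conceptually, whenever a change to the history file is reported, only the lines appended
since the last read have to be processed; a simplified sketch of that idea (the path used
here is illustrative):
\begin{lstlisting}[language=Python]
class HistoryReader:
    def __init__(self, histfile='/home/user/.bash_history'):
        self.histfile = histfile
        self.offset = 0  # how many bytes have already been consumed

    def read_new_commands(self):
        # Called when the file monitoring layer reports a change.
        with open(self.histfile, 'r') as f:
            f.seek(self.offset)
            new_lines = f.read()
            self.offset = f.tell()
        return [line for line in new_lines.splitlines() if line.strip()]
\end{lstlisting}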
It is important to keep in mind that the user is able to ``sabotage'' this method%
\footnote{By unsetting the \code{HISTFILE} envvar for example},
@ -236,6 +273,18 @@ but that should not be an issue as this is not a feature that is intended to be
used in competitive environments (and if the users of a tutorial intentionally
break the system under themselves, well, good for them).
Another advantage of this method is that it can be applied to any interactive
application that supports logging executed commands in some way or another.
A good example would be GDB%
\footnote{\href{https://www.gnu.org/software/gdb/}{https://www.gnu.org/software/gdb/}},
which supports an option called \code{set trace-commands on}. This option flushes
command history to a file after every executed command.
This feature can be combined with the file monitoring capabilities of the framework,
allowing us to detect even the commands executed inside GDB by the user.
This is a good example of the flexibility provided by this solution. Feature requests
like ``I'd like to create a tutorial about <insert software here>'' are quite common, and
supporting them is really easy using this extensible system.
\section{Console Component}
This component is a simple textbox that can be used to display anything to the user,
@ -245,40 +294,55 @@ API through TFW messages to write and read it's contents.
It works great when combined with the process management capabilities of the framework:
if configured to do so it can display the output of processes like webservers in real time.
When using this next to the TFW frontend editor, it allows for a development
experience similar to working in an IDE on your laptop.
Similarly to other developer tools, I chose to display this component inside the terminal
window, so that the user can switch between the two using tabs in order to conserve space.
It is also possible to configure which one should be displayed by default,
as well as to switch between them mid-tutorial using API messages.
\pic{figures/console_and_editor.png}{The console displaying live process logs next to the TFW editor}
\section{Process Management}\label{processmanagement}
The framework includes an event handler capable of managing processes running inside
the \code{solvable} Docker container.
The capabilities of this component include the starting, stopping and restarting of processes,
as well as emitting the standard out or standard error logs belonging to them, even
in real time (by broadcasting TFW messages).
This logging feature allows for interesting possibilities such as the handling
of live process output, or requesting the logs belonging to a certain application when
some sort of event has occurred (such as an error).
This component can also be interacted with using TFW API messages.
The ``Deploy'' button on the code editor uses this component to restart
processes, and the console component also uses this event handler to display
real-time logs.
The Tutorial Framework uses supervisor%
\footnote{\href{http://supervisord.org}{http://supervisord.org}}
to run multiple processes inside a Docker container
(whereas usually Docker containers run a single process only).
All this is possible through using the xmlrpc%
\footnote{\href{https://docs.python.org/3/library/xmlrpc.html}{https://docs.python.org/3/library/xmlrpc.html}}
API exposed by supervisor, which allows the framework to interact with the processes it controls.
This is going to be explained further in Chapter~\ref{usingtfw}.
It is also possible to find out which files a process writes its logs to.
Combining this with the inotify capabilities of TFW explained
briefly in~\ref{idecomponent}, it becomes possible to implement live log monitoring
in the framework.
The features involving the use of inotify were among the most difficult ones to implement,
because of the sheer number of almost impossible to debug issues that such
a complex system can come with.
I'll briefly explain such a bug, which I've found to be immensely exciting.
To understand this, it is necessary to point out that the inotify API supplied by
the Linux kernel is not capable
of monitoring a single file; it is only able to watch whole directories.
I was unaware of this fact before running into this issue, as the Python
bindings I was using did not warn me about supplying a filename on top of
the directory path, and just silently stripped the filename%
\footnote{In software development it is considered a bad practice to do such things
implicitly. It is better to fail loud and clear instead of trying to figure out
what the user meant to do.}.
During the initial development of this feature all processes inside the
\code{solvable} Docker container were writing their logs to files
in the FHS%
@ -291,21 +355,22 @@ This had caused an infinite recursion: when a process would write to \code{/tmp/
inotify would invoke a process that would also log to that location causing the kernel to
emit more inotify events, which in turn would cause more and more new processes to spawn
and write to \code{/tmp/}, causing the whole procedure to repeat again and again.
This continued until my machine would start to run out of memory and begin swapping
pages to disk%
\footnote{When a modern operating system runs out of physical RAM, it is going to swap
virtual memory pages to disk so it can continue to operate --- slowly}
like crazy, causing the whole system to spiral downwards
in a spectacular fashion until the whole thing managed to crash.
It was an event of such rare and chaotic beauty, that I often fondly recall it to this day.
After my first encounter with the bug I decided to have lunch instead.
Of course it would take me several hours to identify the exact causes behind this
fascinating phenomenon, but those were \emph{very} fun hours at least.
\section{FSM Management}
I have already mentioned the event handler called \code{FSMManagingEventHandler},
which is responsible for managing the framework FSM.
For completeness I chose to include it in this chapter as well.
The API it exposes through TFW messages allows client code to attempt stepping the
state machine.
As previously explained this is something that is considered to be a \emph{privileged}
@ -342,26 +407,27 @@ thing work with the Same Origin Policy%
{https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin\_policy}}
being in effect?
The answer is that developers must use a \emph{relative URL}, that is, a URL relative
to the entry point of the TFW frontend itself.
To allow serving several web applications from a single port the framework
supports optional reverse-proxy configurations through the nginx%
\footnote{\href{http://nginx.org}{http://nginx.org}} web server run by the framework.
More on this in Chapter~\ref{usingtfw}.
\section{Various Frontend Features}
The Angular frontend of the framework features several different layouts.
These layouts are useful to accommodate different workflows for users,
such as the previous example of editing code and being able to view the
result of said code in real time next to the editor.
Another example would be editing Ansible playbooks in the file editor,
and then trying to run them in the terminal.
There are also almost full screen views for each component that makes sense
to be used like that; for instance, the code editor can be used to conveniently
edit larger files this way.
The frontend was designed in a way to be fully responsive in window sizes
that still keep the whole thing usable (i.e.\ it would not be practical to start
solving TFW tutorials on a smart phone, simply because of size limits, so such small screens are
not supported, but the frontend still behaves as expected on small laptops or bigger tablets).
This is not an easy thing to implement and maintain due to the many small
incompatibilities between browsers, given the complexity of the frontend.
@ -381,15 +447,34 @@ The framework frontend is built on grid layout and flexboxes%
{https://developer.mozilla.org/en-US/docs/Web/CSS/CSS\_Grid\_Layout}},
which gives us the best hopes of being able to maintain it down the line.
It would involve unimaginable horrors to support this multi-layout
frontend on older browsers without flexboxes and grid, so I have
decided to avoid spending development time on these and make sure that
it works like a charm in reasonably modern browsers.
Arguably this is a good thing, as people should keep their browsers up to date to
follow frequent security patches anyway, so let this serve as a reminder to
developers looking to get into IT security that the first step is to
keep your software up to date.
\pic{figures/tfw_grid.png}{The grid layout of the TFW frontend showcased from developer tools}
There are several additional APIs exposed by the frontend,
which include the changing of layouts, selecting the terminal or console
component to be displayed, the possibility of dynamically modifying
frontend configuration values (such as the frequency of autosaving the files in the editor)
and more.
To facilitate communication with the TFW server, the frontend of the framework
comes with some library code which can be used to send and receive TFW messages.
This code is mostly WebSockets combined with RxJS%
\footnote{\href{https://rxjs-dev.firebaseapp.com}{https://rxjs-dev.firebaseapp.com}},
which makes it easier to write completely asynchronous, callback based code.
The observables%
\footnote{\href{http://reactivex.io/documentation/observable.html}
{http://reactivex.io/documentation/observable.html}}
provided by RxJS are used all around the TFW frontend, as our library code
exposes the operation of receiving data from WebSockets as observables.
Client code can subscribe to these observables using callbacks,
which will be invoked when the observable emits a new value
(i.e.\ when a new message was received on the WebSocket).
When using Angular it is generally a good idea to get familiar with reactive programming,
because without it, it is very easy to get lost in callback hell.

View File

@ -6,25 +6,29 @@ In this chapter I would like to express my gratitude towards great people who ha
helped me in some way or an other along the way.
First of all I would like to thank Bálint Bokros, my good friend and colleague, for
his awesome work regarding TFW and for always
being open to provide useful input.
He has also earned my gratitude by always being there to lift my spirits, be that
with beer or friendship.
I'd also like to thank Gábor Pék and Márk Félegyházi,
for letting us make this project possible by always setting reasonable goals,
being there to provide feedback and encouraging us along the way.
Gábor has also contributed even more directly, for which I will always be
grateful: it was mostly due to his visions that we were able to
dream this big, something that I've found extremely valuable when looking back.
I can't thank my consultant, Levente Buttyán, enough for enduring my general
inability to deal with deadlines and administration.
I also appreciate the great morale my colleagues and friends provided in the office,
by always being there, be that for work or fun. This project couldn't have been realised
sitting in a depressing cubicle among 200 people hating their jobs. They also have my gratitude
for direct contributions to the framework, be that with ideas, assistance,
or actual code.
Finally I'd like to thank all the developers for using the framework and creating
great tutorials with it. At the end of the day it feels awesome to know that
my work helps other people, and it is content developers who make this possible.
They always provide useful feedback, bug reports, and
have great feature requests or even contributions.

View File

@ -8,8 +8,10 @@ engineering is on the rise.
While we are enjoying the comfort that information technology provides us, we often forget
about the risks involved in relying so much on software in our everyday lives.
When taking a look at recent events, such as a cyber arms race taking place between leading
powers\cite{CyberArmsRace}, 50 million Facebook accounts being breached
due to the incorrect handling of access tokens\cite{FacebookBreach},
the very recent Marriott hack where sensitive data on 500 million customers
was stolen\cite{MarriottBreach},
or how China is building an Orwellian state of total digital surveillance%
\cite{ChinaSurv}\cite{ChinaCredit},
it becomes clear that security and privacy in the IT sector

View File

@ -3,38 +3,39 @@
In this chapter I am going to evaluate the state of the project and set future goals for
the framework.
I'll also try and reflect on some of the most important things I have learned during
working on it, in case I've experienced something that might be useful for
someone else reading this in the future.
\section{Project Evaluation}
How do we define if a project is a success or not?
Instead of attempting to do so, in this section I am going to express
my personal feelings and opinions about the Tutorial Framework.
To get unbiased opinions I'd recommend asking someone who hasn't been maintaining
this project for so long.
I could promise to be as objective as possible, but I think that it is just better to
be honest and admit that I have a sweet spot for this project.
Currently a total of 63 tutorials based on the framework are running in production,
with new ones being released on a weekly basis.
These exercises have been solved several hundred times.
User feedback is getting better and better as the project moves forward.
As a maintainer, currently I know about a single unfixed bug in the framework, which
is getting reported by users as well.
There are more, of course, the world is never going to run out of bugs to fix,
but at least I sleep well knowing that things aren't breaking on a constant basis.
Considering that this is a one year old project including initial development,
I'd call this a solid success.
We were able to achieve most --- if not all --- of the goals we envisioned at the
beginning of this journey, and considering some of the things we have planned for the future
we are just getting started.
\section{I Have a Plan!}
In this section I'd like to set some goals regarding the future of the framework
apart from implementing new features, as these will always keep coming in, and we
have some great ones planned, I can promise that.
First of all I think that we need to put more focus on developing TFW, as currently
other projects are often being prioritized over it.
@ -52,18 +53,20 @@ To make this better I'd need to consider planning ahead more, so that the newest
enough to support new features on the roadmap and not get distracted as much by
other features emerging on the horizon.
Another thing is that I often feel that there are some things in using TFW
that could be made a lot easier. As a maintainer sometimes I find it hard to
tell what these things exactly are, as I know the framework inside out, having written most
of the codebase myself.
I'd like to set some time aside to create tutorials using the framework myself,
so I can better narrow these potential difficulties down.
This would require me to be able to take things slow for a few weeks, as this is not
something that is possible to do effectively in a rush. In the summer months, maybe?
Currently the framework is proprietary software.
While it is not feasible to go open source today or tomorrow for various reasons,
we all believe that software which is free as in freedom \emph{is} the future.
As such, at some point I'd like to open source the whole thing if circumstances allow
the company to do so.
\section{Things That I Have Learned}
@ -78,6 +81,14 @@ as I just simply enjoy admiring quality typography which WYSIWYG%
I've spent a long time working on and maintaining the Tutorial Framework.
While the list of technical things I've learned is long and exciting, I also feel like
I've learned a lot about supporting other developers, project management and communication.
Another thing that I've been able to learn is to adopt a more patient mindset while
working. Back in the day I used to be nervous because of deadlines and things not
working how they were supposed to, but now I know that these things are part
of the job and one must be able to deal with them without getting agitated.
Any time I feel like something is not OK, I just try to take a step back, relax a bit to
blow off steam and approach the issue without acting in haste.
I think this is not specific to working as a software engineer, but something
that can be applied to anything we do.
The most important thing that I will always remember as a software engineer,
and something that I've learned during this period

View File

@ -11,10 +11,11 @@ The main points include:
\item Defining an FSM to describe the flow of the tutorial and implementing proper callbacks
for this machine, such as ones that display messages to the user
\item Implementing the required event handlers, which may trigger state transitions in the FSM,
interact with non-TFW code and do various things that might be needed during an exercise,
such as compiling code written by the user or running unit tests
\item Defining what processes should run inside the container besides the things TFW
starts automatically
\item Setting up reverse proxying for any user-facing network application such as webservers
\end{itemize}
At first all these tasks can seem quite overwhelming.
Remember that \emph{witchcraft} is what we practice here after all.
@ -53,14 +54,18 @@ understanding of how the framework interacts with client code.
|--...
\end{lstlisting}
\subsection{Avatao Specific Files}
The \code{config.yml} file is an Avatao challenge configuration file,
which is used to describe what kind of Docker containers implement a challenge,
what ports they expose and what protocols they speak, as well as to define the
name of the exercise, its difficulty, and so on.
Every Avatao challenge must provide such a file.
Another thing that is not even indicated in the structure above is the \code{metadata}
directory, which contains the short and long descriptions of challenges in
Markdown format.
The Tutorial Framework does not use these files in any way whatsoever;
they are only required to make the tutorial function on the Avatao platform.
\subsection{Controller Image}
It was previously mentioned that the \code{controller} Docker image is responsible
@ -68,16 +73,16 @@ for the solution checking of challenges (whether the user has completed the exer
Currently this image is maintained in the test-tutorial-framework repository.
It is a really simple Python server which functions as a TFW event handler as well.
It subscribes to the FSM update messages
broadcast by the \code{FSMManagingEventHandler} we have discussed previously;
this way it is capable of keeping track of the state of the tutorial,
which allows it to detect when the final state of the FSM is reached.
\subsection{Solvable Image}
Currently the Tutorial Framework is maintained in three git repositories:
\begin{description}
\item[baseimage-tutorial-framework:] Docker baseimage (contains all backend logic)
\item[frontend-tutorial-framework:] Angular frontend
\item[test-tutorial-framework:] An example tutorial built using baseimage and frontend
\end{description}
Every tutorial based on the framework must use the TFW baseimage as the parent of
its own \code{solvable} image, using the \code{FROM}%
@ -104,8 +109,11 @@ I am going to discuss these one by one.
\subsection{Dockerfile}
Since this is a Docker image it must define a \code{Dockerfile}.
This image always uses the baseimage of the framework as a parent image.
Besides this, developers can use this as a regular \code{Dockerfile} to work with
in any way they see fit to implement their tutorial.
This means that developers looking to create content on Avatao, be that
with the Tutorial Framework or without it, must be familiar with Docker,
as they will have to set everything up to work inside a container.
\subsection{Frontend}
This directory is designed to contain a clone of the frontend repository.
@ -116,19 +124,29 @@ setup of the development environment.
As previously mentioned, the framework uses supervisor to run several processes
inside a Docker container.
Usually Docker containers only run a single process and developers simply start
more containers instead of processes if required (and use tools such as docker-compose%
\footnote{\href{https://docs.docker.com/compose/}{https://docs.docker.com/compose/}}
or Kubernetes%
\footnote{\href{https://kubernetes.io}{https://kubernetes.io}}
to orchestrate their containers).
This approach is not suitable for TFW, as it would require the framework to orchestrate
Docker containers from inside a container managed by the same Docker daemon, which is
feasible in theory but very hard and impractical to do in practice.
This would require doing something like mounting the unix domain socket used
to manage the Docker daemon inside a running container managed by that daemon,
which is a fun thing to
play around with in my free time but not something suitable for running in production,
not to mention the severe security implications of doing something like that.
Supervisor is a process control system designed to be able to work with
processes on UNIX-like operating systems.
When a tutorial built on TFW is started, a Docker container starts with supervisor running as
PID\footnote{Process ID, on UNIX-like systems the \code{init} program is the first
process started, and traditionally gets PID 1.} 1, which in turn starts all the
programs defined in the \code{solvable/supervisor} directory.
Content creators can use supervisor configuration files to define these programs.
For example, a developer would write a file similar to this one and place it into the
\code{solvable/supervisor} directory to run a webserver written in Python:
\begin{lstlisting}
[program:yourprogram]
user=user
@ -138,35 +156,51 @@ autostart=true
\end{lstlisting}
As mentioned earlier in~\ref{processmanagement}, any program that is started this way
can be managed by the framework using API messages.
All this is possible through using the xmlrpc%
\footnote{\href{https://docs.python.org/3/library/xmlrpc.html}
{https://docs.python.org/3/library/xmlrpc.html}}
API exposed by supervisor, which allows the framework to interact with it to control processes.
This API is quite flexible and can be used to achieve a number of things which would be
clumsy to do without using it (i.e.\ supervisor has a command line utility called
\code{supervisorctl} that exposes similar functionality to the xmlrpc bindings,
but it is better to communicate with the supervisor daemon directly than to
invoke its command line utility in a separate process when you need something done).
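For illustration, this is roughly what talking to supervisor over XML-RPC looks like
from Python (the inet port and the process name below are assumptions made for this
example, not something mandated by the framework):
\begin{lstlisting}[language=Python]
from xmlrpc.client import ServerProxy

# Assumes supervisord is configured with an inet_http_server on port 9001.
supervisor = ServerProxy('http://127.0.0.1:9001/RPC2').supervisor

supervisor.stopProcess('webserver')   # 'webserver' is a hypothetical program name
supervisor.startProcess('webserver')

# Fetch the last few kilobytes of the process output.
log, offset, overflow = supervisor.tailProcessStdoutLog('webserver', 0, 4096)
print(log)
\end{lstlisting}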
\subsection{Nginx}
For simplicity, exercises based on the framework only expose a single port from the
\code{solvable} container.
This port is required to serve the frontend of the framework.
If this is the case, how do we run additional web applications to showcase vulnerabilities
on during a tutorial?
Since one port can only be bound by one process at a time, we will need to
run a reverse-proxy%
\footnote{\href{https://www.nginx.com/resources/glossary/reverse-proxy-server/}
{https://www.nginx.com/resources/glossary/reverse-proxy-server/}} server inside the
container to
bind the exposed port and redirect traffic to other webservers binding non-exposed ports.
To support this, TFW automatically starts an nginx webserver. It uses this nginx
instance to serve the framework frontend as well.
It is possible to supply additional configurations to this server in a convenient manner:
any configuration files placed into the \code{solvable/nginx} directory will be
interpreted by nginx once the container has started.
To set up the reverse-proxying of a webserver running on port 3333,
one would write a configuration file similar to this one:
\begin{lstlisting}
location /yoururl {
proxy_pass http://127.0.0.1:3333;
}
\end{lstlisting}
Now the content served by this webserver on port 3333
will be available on the URL \code{<challenge-url>/yoururl}, even though port 3333
does not accept connections from outside the container, as it is not exposed.
It is very important to understand that developers
have to make sure that their web application \emph{behaves well} behind a reverse proxy.
What this means is that they are going to be served from a ``subdirectory'' of the top
level URL\@:
for example \code{/register} will be served under \code{/yoururl/register}.
This means that all links in the final HTML must refer to the proxied URLs, e.g.\
\code{/yoururl/login}, and server side redirects must point to these correct hrefs as well.
Idiomatically, this is implemented by supplying a \code{BASEURL}
to the application through an environment variable, so that it is able to set
itself up correctly.
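As a small illustration of this idea (using Flask purely as an example application and a
hypothetical \code{BASEURL} value, not anything prescribed by the framework), an
application prepared to run behind the proxy could prefix its routes like this:
\begin{lstlisting}[language=Python]
import os
from flask import Flask, redirect

# The content creator sets this to the proxied prefix, e.g. '/yoururl'.
BASEURL = os.environ.get('BASEURL', '/yoururl')
app = Flask(__name__)

@app.route(BASEURL + '/register')
def register():
    return 'Registration page'

@app.route(BASEURL + '/')
def index():
    # Server side redirects must also point to the proxied URL.
    return redirect(BASEURL + '/register')

if __name__ == '__main__':
    app.run(port=3333)
\end{lstlisting}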
@ -181,18 +215,18 @@ Normally when one uses the \code{COPY}%
command to create a layer%
\footnote{\href{https://docs.docker.com/storage/storagedriver/}
{https://docs.docker.com/storage/storagedriver/}} in a Docker image,
this action takes place when building that image (i.e.\ in the \emph{build context}
of that image).
This is not good for this use case: when building the framework baseimage,
these configuration files that will be written by content developers using TFW in
the future do not even exist yet.
How could we copy files into an image layer that will be created in the future?
It is possible to use a command called \code{ONBUILD}%
\footnote{\href{https://docs.docker.com/engine/reference/builder/\#onbuild}
{https://docs.docker.com/engine/reference/builder/\#onbuild}}
in the Dockerfile of a baseimage to delay another command
to the point in time when other images will use the baseimage
as a parent with the \code{FROM} command. This makes it possible to execute
commands in the build context of the descendant image.
This is great, because the config files we need \emph{will} exist in the build
@ -202,6 +236,12 @@ In practice this looks something like this in the baseimage \code{Dockerfile}:
ONBUILD COPY ${BUILD_CONTEXT}/nginx/ ${TFW_NGINX_COMPONENTS}
ONBUILD COPY ${BUILD_CONTEXT}/supervisor/ ${TFW_SUPERVISORD_COMPONENTS}
\end{lstlisting}
It is important to keep in mind, however, that the layers created by these
\code{ONBUILD} commands will only be available \emph{after} the \code{FROM}
command is executed when building the child image \emph{in the future}.
This means that if you want to
do something with these files in the baseimage build after they have
been copied, those things must be done in \code{ONBUILD} commands as well.
\subsection{Source Directory}
The \code{src} directory usually holds tutorial-specific code, such as
@ -210,7 +250,8 @@ served by the exercise and generally anything that won't fit in the other,
framework-specific directories.
The use of this directory is not mandatory, only a good practice, as developers
are free to implement the non-TFW parts of their exercises as they see fit
(the copying of these files into image layers using \code{solvable/Dockerfile}
is their responsibility as well).
\section{Configuring Built-in Components}
@ -257,6 +298,21 @@ YAML based state machine implementations also allow the usage of the Jinja2%
templating language to substitute variables into the YAML file.
These substitutions are really powerful, as one could even iterate through arrays
or invoke functions that produce strings to be inserted using this method.
This is very similar to how Ansible uses%
\footnote{\href{https://docs.ansible.com/ansible/2.6/user_guide/playbooks_templating.html}
{https://docs.ansible.com/ansible/2.6/user\_guide/playbooks\_templating.html}}
Jinja2, and I was certainly inspired by this
when coming up with this idea.
For example, if we had an FSM with five states, we could use the following
Jinja2 code to generate a transition called \code{step_next} between each state
in a \code{for} cycle:
\begin{lstlisting}
{% for i in range(5) %}
- trigger: 'step_next'
source: '{{i}}'
dest: '{{i+1}}'
{% endfor %}
\end{lstlisting}
\subsection{Python based FSM}
Optionally, the same state machine can be implemented like this in Python using
@ -279,7 +335,7 @@ Python as well, it is going to be easier to interface with library code.
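As a rough, self-contained illustration (using the popular \code{transitions} library here,
which is an assumption made for the sake of the example rather than the framework's actual
FSM base class), the five \code{step_next} transitions generated by the Jinja2 snippet above
could be expressed in Python like this:
\begin{lstlisting}[language=Python]
from transitions import Machine

class TutorialFSM:
    pass

# States '0' through '5', with a 'step_next' transition between neighbours.
states = [str(i) for i in range(6)]
transitions = [
    {'trigger': 'step_next', 'source': str(i), 'dest': str(i + 1)}
    for i in range(5)
]

fsm = TutorialFSM()
Machine(model=fsm, states=states, transitions=transitions, initial='0')

fsm.step_next()
print(fsm.state)  # prints '1'
\end{lstlisting}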
In this section I am going to showcase how implementing event handlers is possible
when using the framework.
I am going to use the Python programming language, but it isn't hard
to create event handlers in other languages, as the only thing
they have to be capable of is communicating with the TFW server using
ZeroMQ sockets, as previously discussed.
The library provided by the framework abstracts low level socket logic
@ -305,12 +361,11 @@ abstract method, which is used to, well, handle events.
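To give a feel for what happens underneath this library code, here is a bare-bones sketch
using \code{pyzmq} directly; the endpoint, the socket type and the message framing used
below are made-up placeholders and not the framework's actual wire protocol:
\begin{lstlisting}[language=Python]
import zmq

context = zmq.Context()

# Receive broadcasts from the TFW server (placeholder endpoint).
subscriber = context.socket(zmq.SUB)
subscriber.connect('tcp://localhost:7654')
subscriber.setsockopt_string(zmq.SUBSCRIBE, 'fsm_update')  # hypothetical key

while True:
    # Assumes two-frame messages: a routing key and a payload.
    key, payload = subscriber.recv_multipart()
    print('received event', key, payload)
    # ... handle the event, possibly replying on another socket
\end{lstlisting}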
To make getting started as smooth as possible I have created
a ``bootstrap'' script which is capable of creating a development environment from
scratch.
This script is distributed as the following bash one-liner:
\begin{lstlisting}[language=bash]
bash -c "$(curl -fsSL https://git.io/vxBfj)"
\end{lstlisting}
This command downloads the script using \code{curl}%
\footnote{\href{https://curl.haxx.se}{https://curl.haxx.se}}, then executes it in bash.
In the open source community it is quite common to distribute installers this way%
\footnote{A good example of this is oh-my-zsh
@ -340,7 +395,7 @@ then pipes it into a bash interpreter \emph{only if} the checksum
of the downloaded string matches the one provided, otherwise it displays
an error message.
Software projects distributing their product as binary installers often
display such checksums on their download pages with the purpose of potentially
mitigating MITM attacks.
The bootstrap script clones the three TFW repositories and does several steps
@ -348,7 +403,7 @@ to create a working environment into a single directory, that is based on
test-tutorial-framework:
\begin{itemize}
\item It builds the newest version of the TFW baseimage locally
\item It pins the version tag of this image in \code{solvable/Dockerfile},
so that this newly-built version will be used by the tutorial
\item It places the latest frontend in \code{solvable/frontend} with
dependencies installed
@ -376,15 +431,15 @@ Why is this the case?
\subsection{The Frontend Issue}
To be able to understand this, we will have to gain some understanding of how the
build process of Angular projects works.
When frontend developers work on Angular projects, they usually use the built-in
developer tool of the Angular-CLI%
\footnote{\href{https://cli.angular.io}{https://cli.angular.io}},
\code{ng serve} to build and serve their applications.
The advantage of this tool is that it automatically reloads the frontend
when the code on the disk is changed, and that it is generally very easy to work with.
On the other hand, a disadvantage is that a \code{node_modules} directory
containing all the npm%
\footnote{\href{https://www.npmjs.com}{https://www.npmjs.com}}
@ -405,13 +460,14 @@ This is why today frontend builds usually take a lot longer than building anythi
not involving JavaScript (such as C++, C\# or any other compiled programming language).
This mess presents its own challenges for the Tutorial Framework as well.
Since hundreds of megabytes of npm dependencies have no place inside Docker images%
\footnote{Or it may take tens of seconds just to send the build context to
the Docker daemon, which means waiting even before the build begins},
by default the framework will only copy the results of a frontend production build
of \code{solvable/frontend} into the image layers.
This slows down the build time of TFW based challenges so much that instead of roughly
30 seconds, they could often take 5 to 10 minutes depending on what hardware
you use.
\subsection{The Solution Offered by the Framework}
@ -426,7 +482,7 @@ while the \code{tfw.sh} script is capable of starting a development server
to serve the frontend locally using \code{ng serve} besides starting
the Docker container without the frontend.
If this whole thing wasn't complicated enough, since Docker binds the port
the container is going to use, \code{tfw.sh} has to run the Angular dev server on
another port, then use the proxying features of Angular-CLI to forward requests
from this port to the running Docker container when requesting resources
other than the entrypoint to the Angular application.

BIN
figures/tfw_grid.png Normal file

Binary file not shown.
