\chapter{Using the Framework}

In this chapter I am going to describe in more detail how client code is supposed
to use the framework, some of the design decisions behind this, and how everything
is integrated into the \code{solvable} Docker image.

Getting started with the framework involves several tasks.
The main points include:
\begin{itemize}
    \item Setting up a development environment
    \item Defining an FSM to describe the flow of the tutorial and implementing proper callbacks
          for this machine, such as ones that display messages to the user
    \item Implementing the required event handlers, which may trigger state transitions in the FSM,
          interact with non-TFW code and do various other things that might be needed during an exercise
    \item Defining what processes should run inside the container besides the ones TFW
          starts automatically
    \item Setting up reverse proxying for any user-facing network applications such as webservers
\end{itemize}
At first all these tasks can seem quite overwhelming.
Remember that \emph{witchcraft} is what we practice here after all.
To ease the steep initial learning curve of getting familiar with the framework,
I have created a repository called \emph{test-tutorial-framework}, which serves as
a project template for developers looking to create challenges using the
framework.
This repository is a really simple client codebase that is also suitable for
developing TFW itself (it is a good place to host tests for the framework).

It also provides an ``industry standard'' \code{hack} directory
containing bash scripts that make the development of tutorials and TFW itself very convenient.
These scripts range from bootstrapping a complete development environment in one command
to building and running challenges based on the framework.
Let us take a quick look at the \emph{test-tutorial-framework} project to gain a better
understanding of how the framework interacts with client code.

\section{Project Structure}

\begin{lstlisting}[
    caption={The project structure of test-tutorial-framework},
    captionpos=b]
.
|--config.yml
|
|--hack/
|   |--tfw.sh
|   |--...
|
|--controller/
|   |--Dockerfile
|   |--...
|
|--solvable/
    |--Dockerfile
    |--...
\end{lstlisting}

\subsection{Avatao Configuration File}
The \code{config.yml} file is an Avatao challenge configuration file,
which describes what Docker containers implement a challenge,
which ports they expose and over which protocols, the name of the
exercise, its difficulty, and so on.
Every Avatao challenge must provide such a file.
The Tutorial Framework itself does not use this file; it is only required to run
the exercise in production, so it is mostly out of scope for this thesis.

\subsection{Controller Image}
As mentioned previously, the \code{controller} Docker image is responsible
for checking the solutions of challenges (i.e.\ deciding whether the user has completed the exercise).
Currently this image is maintained in the test-tutorial-framework repository.
It is a really simple Python server which also functions as a TFW event handler.
It subscribes to the FSM update messages
broadcast by the previously discussed \code{FSMManagingEventHandler},
which lets it keep track of the state of the tutorial
and detect when the final state of the FSM has been reached.
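
To illustrate the idea, the following is a minimal sketch of such solution checking
logic, written as a plain Python class rather than as a real TFW event handler.
The class name \code{SolutionChecker} and the message layout (a \code{key} field set to
\code{fsm_update} and a \code{data} dictionary holding \code{current_state}) are
assumptions made for this example and do not reflect the actual TFW message format.
\begin{lstlisting}[language=Python]
"""Hypothetical sketch of the controller's solution checking logic.

Assumption: FSM updates look like
    {'key': 'fsm_update', 'data': {'current_state': '<state>'}}
This layout is illustrative only and is not taken from the TFW source.
"""
from typing import Any, Dict


class SolutionChecker:
    """Remembers whether an FSM update has ever reported the final state."""

    def __init__(self, final_state: str) -> None:
        self.final_state = final_state
        self.solved = False

    def handle_message(self, message: Dict[str, Any]) -> None:
        # Ignore every message that is not an FSM update (assumed key name).
        if message.get('key') != 'fsm_update':
            return
        state = message.get('data', {}).get('current_state')
        if state == self.final_state:
            # Sticky: once solved, the challenge stays solved.
            self.solved = True


if __name__ == '__main__':
    checker = SolutionChecker(final_state='1')
    checker.handle_message({'key': 'fsm_update', 'data': {'current_state': '0'}})
    print(checker.solved)  # False
    checker.handle_message({'key': 'fsm_update', 'data': {'current_state': '1'}})
    print(checker.solved)  # True
\end{lstlisting}
Making \code{solved} sticky means that the exercise stays completed even if a later
update reports a different state; whether the real controller behaves this way is a
design decision not covered here.
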

\subsection{Solvable Image}
Currently the Tutorial Framework is maintained in three git repositories:
\begin{description}
      \item[baseimage-tutorial-framework] Docker baseimage (contains all backend logic)
      \item[frontend-tutorial-framework] Angular frontend
      \item[test-tutorial-framework] An example tutorial built using the baseimage and the frontend
\end{description}
Every tutorial based on the framework must use the TFW baseimage as the parent of
its own \code{solvable} image, using the \code{FROM}%
\footnote{\href{https://docs.docker.com/engine/reference/builder/\#from}
{https://docs.docker.com/engine/reference/builder/\#from}}
Dockerfile command.
Being an example project of the framework, test-tutorial-framework
does this as well.

\section{Details of the Solvable Image}
Let us look in greater detail at how the \code{solvable} Docker image of
test-tutorial-framework operates.
The directory structure is as follows:
\begin{lstlisting}
solvable/
|--Dockerfile
|--frontend/
|--supervisor/
|--nginx/
|--src/
\end{lstlisting}
I am going to discuss these one by one.

\subsection{Dockerfile}
Since this is a Docker image, it must define a \code{Dockerfile}.
This image always uses the baseimage of the framework as its parent image.
Apart from this, developers can treat it as a regular \code{Dockerfile} and use it
as they see fit to implement their tutorial.

\subsection{Frontend}
This directory is designed to contain a clone of the frontend repository.
By default it is empty, and its contents are put in place during the
setup of the development environment.

\subsection{Supervisor}
As previously mentioned, the framework uses supervisor to run several processes
inside a Docker container.
Usually Docker containers only run a single process, and developers simply start
more containers instead of more processes if required.
This approach is not suitable for TFW, as it would require the framework to orchestrate
Docker containers from another container, which is feasible in theory but
very hard and impractical to do in practice.

Supervisor is a process control system designed to manage
processes on UNIX-like operating systems.
When a tutorial built on TFW is started, the framework starts supervisor with
PID\footnote{Process ID; on UNIX-like systems PID 1 belongs to the first
process started, which is normally the \code{init} program} 1, which in turn starts all the programs defined
in this directory using supervisor configuration files.
For example, a developer would use a file similar to this to run a webserver
written in Python:
\begin{lstlisting}
[program:yourprogram]
user=user
directory=/home/user/example/
command=python3 server.py
autostart=true
\end{lstlisting}
As mentioned earlier in~\ref{processmanagement}, any program that is started this way
can be managed by the framework using API messages.
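
Since these are plain INI-style files, they can also be generated, which may be handy
when a tutorial needs several similar programs.
The sketch below uses the standard \code{configparser} module of Python to produce a
section equivalent to the one above; the helper \code{render_program_config} is
hypothetical and not part of TFW.
\begin{lstlisting}[language=Python]
"""Hypothetical helper that renders a supervisor [program:x] section.

The helper itself is illustrative; the keys (user, directory, command,
autostart) are the standard supervisor options used in the example above.
"""
import configparser
import io


def render_program_config(name: str, user: str, directory: str,
                          command: str, autostart: bool = True) -> str:
    # interpolation=None keeps '%' characters in commands untouched.
    config = configparser.ConfigParser(interpolation=None)
    config[f'program:{name}'] = {
        'user': user,
        'directory': directory,
        'command': command,
        'autostart': 'true' if autostart else 'false',
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == '__main__':
    print(render_program_config('yourprogram', 'user',
                                '/home/user/example/', 'python3 server.py'))
\end{lstlisting}
The output uses \code{key = value} lines with spaces around the equals sign, which is
ordinary INI syntax that supervisor reads as well; it could be saved into the
\code{supervisor} directory of the \code{solvable} image before building it.
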

\subsection{Nginx}
For simplicity, exercises based on the framework expose only a single port from the
\code{solvable} container, and this port is required to serve the frontend of the framework.
How, then, do we run the additional web applications used to showcase vulnerabilities
during the tutorial?
Since a port can only be bound by one process at a time, we need to
use a reverse proxy that binds this port and forwards traffic to other
webservers listening on non-exposed ports.

To support this, TFW automatically runs an nginx webserver (the same nginx
process serves the framework frontend as well), to which we can supply additional configuration.
Any configuration files placed into this directory will be loaded by nginx
once the container has started.
To set up reverse proxying for a webserver running on port 3333,
one would write a config file similar to this one:
\begin{lstlisting}
location /yoururl {
    proxy_pass http://127.0.0.1:3333;
}
\end{lstlisting}
Now the content served by this webserver will be available at ``<challenge\_url>/yoururl''.
It is very important to understand that developers
have to make sure their web application \emph{behaves well} behind a reverse proxy.
This means that the application is going to be served from a ``subdirectory'' of a URL:
for example, ``/register'' will be served under ``/yoururl/register''.
Consequently, all links in the final HTML must refer to the proxied URLs, e.g.\
``/yoururl/register'', and server-side redirects must point to the correct locations as well.
Idiomatically this is implemented by supplying a \code{BASEURL}
to the application through an environment variable, so that it can set
itself up correctly.
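
As an illustration, the following minimal WSGI application, written using only the
Python standard library, reads \code{BASEURL} from the environment and uses it both
for links and for redirects.
The routes and the default value \code{/yoururl} are hypothetical.
Because the \code{proxy_pass} directive above contains no URI part, nginx forwards the
original request path (e.g.\ \code{/yoururl/register}) unchanged, so the application
matches on the prefixed paths.
\begin{lstlisting}[language=Python]
"""Minimal WSGI sketch of an application that honours BASEURL.

The routes and the '/yoururl' default are hypothetical examples.
"""
import os

BASEURL = os.environ.get('BASEURL', '/yoururl').rstrip('/')


def url_for(path: str) -> str:
    """Prefix an application-relative path with the proxied subdirectory."""
    return BASEURL + path


def app(environ, start_response):
    # nginx passes the full URI (e.g. /yoururl/register) to the backend.
    path = environ.get('PATH_INFO', '/')
    if path in (BASEURL, url_for('/')):
        # Server-side redirects must point at the proxied location too.
        start_response('302 Found', [('Location', url_for('/register'))])
        return [b'']
    if path == url_for('/register'):
        # Links in the generated HTML also carry the prefix.
        body = f'<a href="{url_for("/login")}">Log in</a>'.encode()
        start_response('200 OK', [('Content-Type', 'text/html')])
        return [body]
    start_response('404 Not Found', [('Content-Type', 'text/plain')])
    return [b'not found']
\end{lstlisting}
For local experiments, \code{app} could be served on port 3333, matching the nginx example,
with \code{make_server} from the standard \code{wsgiref.simple_server} module.
Generating every URL through a single helper such as \code{url_for} keeps the prefix in
one place, so moving the application to another path only requires changing \code{BASEURL}.
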

\subsection{Copying Configuration Files}
Behind the curtains, the Tutorial Framework uses some Dockerfile trickery to
facilitate the copying of supervisor and nginx configuration files to their correct
locations.
Normally, when one uses the \code{COPY}%
\footnote{\href{https://docs.docker.com/engine/reference/builder/\#copy}
{https://docs.docker.com/engine/reference/builder/\#copy}}
command to create a layer%
\footnote{\href{https://docs.docker.com/storage/storagedriver/}
{https://docs.docker.com/storage/storagedriver/}} in a Docker image,
this action takes place when that image is built (i.e.\ in the \emph{build context}
of that image).
This does not work for our use case: when the framework baseimage is built,
the configuration files that content developers will write do not even
exist yet.
How could we copy files into an image layer that will only be created in the future?

The \code{ONBUILD}%
\footnote{\href{https://docs.docker.com/engine/reference/builder/\#onbuild}
{https://docs.docker.com/engine/reference/builder/\#onbuild}}
command can be used in the Dockerfile of a baseimage to defer another command
until the baseimage is used as a parent by a different image
via the \code{FROM} command. This makes it possible to execute
commands in the build context of the descendant image.
This is exactly what we need, because the config files \emph{will} exist in the build
context of the \code{solvable} image of test-tutorial-framework.
In practice this looks roughly like this in the baseimage \code{Dockerfile}:
\begin{lstlisting}
ONBUILD COPY ${BUILD_CONTEXT}/nginx/ ${TFW_NGINX_COMPONENTS}
ONBUILD COPY ${BUILD_CONTEXT}/supervisor/ ${TFW_SUPERVISORD_COMPONENTS}
\end{lstlisting}

\subsection{Source Directory}
The \code{src} directory usually holds tutorial-specific code, such as
the implementations of event handlers, the FSM of the tutorial, additional web applications
served by the exercise and generally anything that does not fit into the other,
framework-specific directories.
Using this directory is not mandatory, only good practice, as developers
are free to implement the non-TFW parts of their exercises as they see fit
(copying these files into image layers is their responsibility).

\section{Implementing a Finite State Machine}

As discussed before, the Tutorial Framework allows developers to define state machines
in two ways.
To showcase the capabilities of the framework, I am going to implement the same FSM
using both methods.

\subsection{YAML based FSM}
YAML\footnote{YAML Ain't Markup Language \href{http://yaml.org}{http://yaml.org}}
is a human-friendly data serialization standard and a superset of JSON.
It is possible to use this format to define a state machine like so:
\lstinputlisting[
    caption={A Finite State Machine implemented in YAML},
    captionpos=b
]{listings/test_fsm.yml}
This state machine has two states, \code{0} and \code{1},
and defines a single transition between them, \code{step_1}.
On entering state \code{1}, the FSM writes a message to the frontend messaging component
by invoking TFW library code from Python.
The transition can only occur if the file \code{allow_step_1} exists.

YAML-based state machine definitions also allow the use of the Jinja2%
\footnote{\href{http://jinja.pocoo.org/docs/2.10/}{http://jinja.pocoo.org/docs/2.10/}}
templating language to substitute variables into the YAML file.
These substitutions are really powerful: one can even iterate over arrays
or invoke functions that produce the strings to be inserted.
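
To make the semantics of this machine concrete, the following library-free Python model
mirrors it: two states, a single \code{step_1} transition guarded by a condition, and a
callback that runs on entering state \code{1}.
The class \code{TinyFSM} and its methods are hypothetical and deliberately \emph{not} the
TFW API; the guard looks for \code{allow_step_1} in the current working directory, which
is an assumed location.
\begin{lstlisting}[language=Python]
"""Library-free model of the two-state machine described above.

NOT the TFW API: TinyFSM and its methods are hypothetical and only mirror
the semantics (states 0 and 1, transition step_1 guarded by a condition,
an on-enter callback for state 1).
"""
import os
from typing import Callable, Dict, List, Tuple

Guard = Callable[[], bool]
Callback = Callable[[], None]


class TinyFSM:
    def __init__(self, initial: str) -> None:
        self.state = initial
        self.transitions: Dict[str, Tuple[str, str, Guard]] = {}
        self.on_enter: Dict[str, List[Callback]] = {}

    def add_transition(self, trigger: str, source: str, dest: str,
                       guard: Guard = lambda: True) -> None:
        self.transitions[trigger] = (source, dest, guard)

    def add_on_enter(self, state: str, callback: Callback) -> None:
        self.on_enter.setdefault(state, []).append(callback)

    def step(self, trigger: str) -> bool:
        """Fire a trigger; return True if the transition took place."""
        source, dest, guard = self.transitions[trigger]
        if self.state != source or not guard():
            return False
        self.state = dest
        for callback in self.on_enter.get(dest, []):
            callback()
        return True


if __name__ == '__main__':
    fsm = TinyFSM(initial='0')
    # Assumed location of the flag file: the current working directory.
    fsm.add_transition('step_1', '0', '1',
                       guard=lambda: os.path.isfile('allow_step_1'))
    fsm.add_on_enter('1', lambda: print('Entered state 1'))
    print(fsm.step('step_1'))  # False unless ./allow_step_1 exists
\end{lstlisting}
Firing \code{step_1} while the guard is false leaves the machine in state \code{0}, which
matches the behaviour described for the YAML version.
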

\subsection{Python based FSM}
Alternatively, the same state machine can be implemented in Python using
TFW library code:
\lstinputlisting[
    language=python,
    caption={A Finite State Machine implemented in Python},
    captionpos=b
]{listings/test_fsm.py}

As you can see, both implementations are clean and simple.
The advantage of YAML is that callbacks can be defined using virtually any
command that is available in the container, which means that any
programming language can be used to implement them.
The advantage of the Python version is that, since the framework itself is developed in
Python, it is easier to interface with library code.

\section{Configuring Components}

Built-in components are generally configured in two different ways.
On the frontend (Angular) side, developers can edit a \code{config.ts} file,
which consists of key-value pairs controlling configurable frontend functionality.
These pairs are mostly self-documenting:
\lstinputlisting[
    caption={Example of the frontend \code{config.ts} file (stripped down to save space)},
    captionpos=b
]{listings/config.ts}
Built-in event handlers can be configured by editing the Python file they are
initialized in, which exposes several self-explanatory options:
\lstinputlisting[
    language=python,
    caption={Example of initializing some event handlers},
    captionpos=b
]{listings/event_handler_main.py}

\section{Setting Up a Development Environment}\label{devenv}

To make getting started as smooth as possible, I have created
a ``bootstrap'' script which is capable of creating a development environment from
scratch.

This script is distributed as the following bash one-liner:
\begin{lstlisting}[language=bash]
bash -c "$(curl -fsSL https://git.io/vxBfj)"
\end{lstlisting}
This command downloads a script using \code{curl}%
\footnote{\href{https://curl.haxx.se}{https://curl.haxx.se}}, then executes it in bash.
In the open source community it is quite common to distribute installers this way%
\footnote{A good example of this is oh-my-zsh
\href{https://github.com/robbyrussell/oh-my-zsh}{https://github.com/robbyrussell/oh-my-zsh}}.
This might seem a little scary at first, but it is no less safe than
downloading and executing a binary installer from a website with a valid TLS certificate:
\code{curl} fails with an error message if the certificate is invalid,
so both methods place their trust in the PKI\footnote{Public Key Infrastructure}
to defend against man-in-the-middle%
\footnote{\href{https://www.owasp.org/index.php/Man-in-the-middle_attack}
{https://www.owasp.org/index.php/Man-in-the-middle\_attack}} attacks.
Debating the security of this infrastructure is certainly something that we
as an industry should constantly do, but it is out of the scope of this thesis.

Nevertheless, I have also created a version of this command that
checks the SHA256 checksum of the bootstrap script before executing it
(split across several lines for readability):
\begin{lstlisting}[language=bash]
URL=https://git.io/vxBfj                                              \
SHA=d81057610588e16666251a4167f05841fc8b66ccd6988490c1a2d2deb6de8ffa  \
bash -c 'cmd="$(curl -fsSL "$URL")" || exit 1;                        \
         if [ "$(echo "$cmd" | sha256sum | cut -d " " -f1)" = "$SHA" ]; \
         then echo "$cmd" | bash;                                     \
         else echo "Checksum mismatch!"; fi'
\end{lstlisting}
This essentially downloads the script, stores it in a variable as a string,
then pipes it into a bash interpreter \emph{only if} the checksum
of the downloaded string matches the one provided; otherwise it displays
an error message.
Software projects distributing their product as binary installers often
display such checksums on their download pages to help
mitigate MITM attacks.

The bootstrap script clones the three TFW repositories and performs several steps
to create a working environment, based on test-tutorial-framework,
in a single directory:
\begin{itemize}
      \item It builds the newest version of the TFW baseimage locally
      \item It pins the version tag in \code{solvable/Dockerfile},
            so that this newly built version will be used by the tutorial
      \item It places the latest frontend in \code{solvable/frontend} with
            its dependencies installed
\end{itemize}
It is important to note that this script \emph{does not} install anything system-wide;
it only works in the directory it is executed from.
This is good practice, as many users, including me, find scripts that
write files all around the system intrusive when they could provide the same functionality
while working in a single directory.

It is also worth mentioning that it would have been a lot easier to simply distribute
prebuilt images through Docker Hub%
\footnote{\href{https://hub.docker.com}{https://hub.docker.com}},
but since the code base is currently proprietary, we cannot distribute
it through a public medium; instead, we use our own image registry to store private Docker
images.
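
The check performed by the checksum-verifying command can also be sketched in Python
with the standard \code{hashlib} and \code{subprocess} modules.
The function names are hypothetical and the download itself is omitted: \code{script}
stands for the bytes returned by \code{curl}.
One subtlety: the shell version hashes the script as printed by \code{echo}, that is,
after command substitution has removed trailing newlines and \code{echo} has appended exactly
one, so the two digests only agree for scripts that end in a single newline.
\begin{lstlisting}[language=Python]
"""Sketch of 'verify the SHA256 checksum, then run the script'.

checksum_matches and run_if_trusted are hypothetical names; `script`
stands for the bytes that curl would download.
"""
import hashlib
import subprocess


def checksum_matches(script: bytes, expected_sha256: str) -> bool:
    return hashlib.sha256(script).hexdigest() == expected_sha256.lower()


def run_if_trusted(script: bytes, expected_sha256: str) -> int:
    if not checksum_matches(script, expected_sha256):
        print('Checksum mismatch!')
        return 1
    # Equivalent of piping the script into bash: feed it on standard input.
    return subprocess.run(['bash'], input=script).returncode
\end{lstlisting}
Keeping verification and execution in separate functions makes it explicit that nothing
is executed unless the digest matches.
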

\section{Building and Running a Tutorial}

After the environment has been created using the script described in Section~\ref{devenv},
it is possible to simply use standard Docker commands to build and run the tutorial.
Yet the \code{hack} directory of test-tutorial-framework also provides a script called
\code{tfw.sh}, which developers usually prefer for building and running their
exercises.
Why is this the case?

\subsection{The Frontend Issue}

To understand this, we first have to take a look at the
build process of Angular projects.

When frontend developers work on Angular projects, they usually use the built-in
development server of the Angular-CLI%
\footnote{\href{https://cli.angular.io}{https://cli.angular.io}},
\code{ng serve}, to build and serve their application.
The advantage of this tool is that it automatically reloads the frontend
when the code on disk changes, and that it is generally very easy to work with.
On the other hand, a disadvantage is that a \code{node_modules} directory
containing all the npm%
\footnote{\href{https://www.npmjs.com}{https://www.npmjs.com}}
dependencies of the project must be present while doing so.
The problem with this is that, because the JavaScript ecosystem is a \emph{huge}
mess\cite{NodeModules}, these dependencies can easily reach
\emph{several hundred megabytes} in size.

To solve this issue, when creating production builds,
Angular uses various optimizations such as tree shaking%
\footnote{\href{https://webpack.js.org/guides/tree-shaking/}
{https://webpack.js.org/guides/tree-shaking/}}
to remove all the dependency code that is never used when running the application%
\footnote{Otherwise it would not be possible to serve these applications efficiently
over the internet}.
The problem is that these optimizations can take a \emph{really} long time.
This is why frontend builds today often take a lot longer than builds of comparable
projects that do not involve JavaScript (such as ones written in C++, C\# or other compiled languages).

This mess presents its own challenges for the Tutorial Framework as well.
Since hundreds of megabytes of dependencies have no place inside Docker containers%
\footnote{Otherwise it may take tens of seconds just to send the build context to
the Docker daemon, which means waiting before the build even begins},
by default the framework only places the results of a frontend production build
of \code{solvable/frontend} into the image layers.
This slows down the builds of TFW-based challenges considerably: instead of around
30 seconds, they often take 5 to 10 minutes.

\subsection{The Solution Offered by the Framework}

To circumvent this, it is possible to entirely exclude the Angular frontend from a TFW
build using build time arguments%
\footnote{In practice this is done by supplying the option
\code{--build-arg NOFRONTEND=1} to Docker}.
When doing so, however, developers have to run the frontend locally with
the whole \code{node_modules} directory present.
The bootstrap script takes care of putting these dependencies there,
while the \code{tfw.sh} script is capable of starting a development server
that serves the frontend locally using \code{ng serve}, in addition to starting
the Docker container without the frontend.
As if this were not complicated enough, since Docker already binds the port
the container is going to use, \code{tfw.sh} has to run this development server on
another port, then use the proxying features of Angular-CLI to forward requests
for any resources other than the entrypoint of the Angular application
to the running Docker container.

This is the reason why the frontend is accessible on port \code{4200} (the default
port of \code{ng serve}) when a tutorial is started using \code{tfw.sh}, whereas
a self-contained container built with the frontend included serves it on port \code{8888}
(the default port TFW uses).
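
The proxying mentioned above is configured through a JSON file that is passed to
\code{ng serve} with its \code{--proxy-config} option.
The following Python sketch generates such a file. It assumes that the container is reachable
on \code{localhost:8888}, and the forwarded path prefixes are hypothetical, as the routes
used by \code{tfw.sh} are not listed here. A prefix list is a whitelist, whereas
\code{tfw.sh} forwards everything except the Angular entrypoint, so this only
approximates that behaviour.
\begin{lstlisting}[language=Python]
"""Hypothetical generator for an Angular-CLI proxy configuration.

The prefixes are made-up examples. The option names (target, secure, ws)
are standard dev-server proxy options read by `ng serve --proxy-config`.
"""
import json
from typing import Dict, Iterable

CONTAINER_URL = 'http://localhost:8888'  # assumed address of the container
FORWARDED_PREFIXES = ['/api', '/ws', '/yoururl']  # hypothetical routes


def build_proxy_config(prefixes: Iterable[str],
                       target: str = CONTAINER_URL) -> Dict[str, dict]:
    return {
        prefix: {
            'target': target,
            'secure': False,  # plain HTTP in the local setup
            'ws': True,       # also forward WebSocket upgrades
        }
        for prefix in prefixes
    }


if __name__ == '__main__':
    # Save the output as proxy.conf.json and start the dev server with
    # ng serve --proxy-config proxy.conf.json
    print(json.dumps(build_proxy_config(FORWARDED_PREFIXES), indent=2))
\end{lstlisting}
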

While \code{tfw.sh} provides lots of other functionality as well, this is one of the reasons why
it is a bash script several hundred lines long.
Making the frontend toggleable during Docker builds relies on
the \code{ONBUILD} trick discussed earlier:
\begin{lstlisting}[language=bash]
ONBUILD RUN test -z "${NOFRONTEND}"                                &&\
            cd /data && yarn install --frozen-lockfile || :

ONBUILD RUN test -z "${NOFRONTEND}"                                &&\
            cd /data && yarn build --no-progress || :

ONBUILD RUN test -z "${NOFRONTEND}"                                &&\
            mv /data/dist ${TFW_FRONTEND_DIR} && rm -rf /data || :
\end{lstlisting}
Remember that \code{ONBUILD} commands run in the build context of the child image.
These commands check whether the \code{NOFRONTEND} build argument
is set, and only deal with the frontend if it is empty or undefined.
The \code{|| :} notation in the shell essentially means ``or true'' (\code{:} is a built-in
command that does nothing and always succeeds), which is required
to avoid aborting the build due to the non-zero exit code returned
by the \code{test} command if the build argument is defined.
Note that the same construct also swallows failures of the \code{yarn} and \code{mv}
commands, so an error in the frontend build does not abort the image build either.
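
The exit-status behaviour of this pattern is easy to verify.
The following Python sketch runs the same shape of command through \code{sh}, with
\code{true} and \code{false} standing in for the real build steps, and reports which
cases succeed; the helper \code{run_step} is hypothetical and requires a POSIX
\code{sh} on the \code{PATH}.
\begin{lstlisting}[language=Python]
"""Checks the exit status of: test -z "${NOFRONTEND}" && <command> || :

run_step is a hypothetical helper; `true`/`false` stand in for yarn steps.
"""
import os
import subprocess


def run_step(command: str, nofrontend: str = '', masked: bool = True) -> int:
    env = dict(os.environ, NOFRONTEND=nofrontend)
    script = f'test -z "${{NOFRONTEND}}" && {command}'
    if masked:
        script += ' || :'  # the "or true" suffix used in the Dockerfile
    return subprocess.run(['sh', '-c', script], env=env).returncode


if __name__ == '__main__':
    print(run_step('true'))                        # 0: step runs
    print(run_step('true', '1'))                   # 0: step skipped
    print(run_step('true', '1', masked=False))     # 1: would abort the build
    print(run_step('false'))                       # 0: failure is masked
\end{lstlisting}
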