diff --git a/content/a_tour_of_tfw.tex b/content/a_tour_of_tfw.tex index ea03a7e..b546e4c 100644 --- a/content/a_tour_of_tfw.tex +++ b/content/a_tour_of_tfw.tex @@ -9,7 +9,7 @@ based on the TFW architecture themselves. A \emph{very important} point to keep in mind is that most of this exercise-specific logic will be implemented in \textbf{FSM callbacks} and custom \textbf{event handlers}. -The whole framework is built in a way to faciliate this process and developers +The whole framework is built in a way to facilitate this process and developers who understand this mindset almost always find it a breeze to create great content using TFW\@. @@ -23,7 +23,7 @@ for instance the built-in code editor requires a frontend component and an event handler to function properly, while the component responsible for drawing out and managing frontend components implements no event handler, so it purely exists in Angular. -An other example of a purely frontend component would be the messages component, +Another example of a purely frontend component would be the messages component, which is used to display messages to the user. In the Tutorial Framework most of the built-ins define APIs, which are TFW messages @@ -61,7 +61,7 @@ Some components emit or broadcast messages on specific events, for instance the "from_state": ...string..., "to_state": ...string..., "trigger": ...string..., - "timestamp": ...unix timestamp... + "timestamp": ...UNIX timestamp... } ... } @@ -81,7 +81,7 @@ The framework must allow content creators to communicate with the user, and provide some mechanism to enable ``talking'' to them. This is the responsibility of the \emph{messages} component, which provides a chatbox-like element on the frontend. -The simplest form of communication it accomodates it the insertion of text +The simplest form of communication it accommodates is the insertion of text into the chatbox through API messages. 
This component always expects messages it receives to be in Markdown% \footnote{\href{https://daringfireball.net/projects/markdown/} @@ -111,7 +111,7 @@ Similarly to a real chat application, some The timing of pauses and messages is based on the \emph{WPM} --- or Words Per Minute --- set by developers according to their specific requirements. This creates an experience similar to chatting with someone in real time, as the time -it takes for each message to be displayed depends on the lenght of the previous message. +it takes for each message to be displayed depends on the length of the previous message. This illusion is made possible through appropriate \code{setTimeout()} calls in TypeScript and some elementary math to calculate the proper delays in milliseconds based on message lengths: @@ -121,10 +121,10 @@ message lengths: \[ timeoutSeconds = lastMessageLength / charactersPerSeconds \] \[ timeoutMilliseconds = timeoutSeconds * 1000 \] -The value 5 comes from the fact that on average english words are 5 +The value 5 comes from the fact that on average English words are 5 characters long according to some studies. This value could be made configurable in the future, but currently there -are no plans to make non-english challenges. +are no plans to make non-English challenges. \section{IDE Component}\label{idecomponent} @@ -139,27 +139,27 @@ To implement this IDE% I have integrated the open source Monaco editor developed by Microsoft into the Angular web application TFW uses as a frontend, and added functionality to it that allows the editor to integrate with the framework. -This involves commnication with an event handler dedicated to this feature, +This involves communication with an event handler dedicated to this feature, which is capable of reading and writing files to disk, while sending and receiving editor content from the frontend component. 
The interaction of this event handler and the Monaco editor provides a seamless -editing experience, featuring autosave at configurable intervals, code completion, +editing experience, featuring auto-save at configurable intervals, code completion, automatic code coloring for several programming languages and more. Perhaps the most ``magical'' feature of this editor is that if any process in the Docker container writes a file that is being displayed in the editor, -the contents of that file are automatially refreshed without any user +the contents of that file are automatically refreshed without any user interaction whatsoever. Besides that, if a file is created in the directory the editor is configured -to display, that file is automatially displayed on a new tab in the IDE. +to display, that file is automatically displayed on a new tab in the IDE\@. This allows for really interesting demo opportunities. -Lets say I create a file using the terminal on the frontend by executing the +Let's say I create a file using the terminal on the frontend by executing the command \code{touch file.txt}. A new tab on the editor automatically appears. If I select it I can confirm that I have successfully created an empty file. After this let's run a \code{while} cycle in the command line which -peroadically appends some text to \code{file.txt}: +periodically appends some text to \code{file.txt}: \begin{lstlisting}[captionpos=b,caption={Bash while cycle writing to a file periodically}, language=bash] while true @@ -170,7 +170,7 @@ done \end{lstlisting} The results speak for themselves: \pic{figures/ide_demo.png}{The editor demo involving automatic file refreshing} -As you can see, the file contents are automatially updated as the bash script appends +As you can see, the file contents are automatically updated as the bash script appends to the file. 
This feature is implemented by using the inotify API% \footnote{\href{http://man7.org/linux/man-pages/man7/inotify.7.html} @@ -217,7 +217,7 @@ terminal emulator to do so. This component has a tiny server process which is managed by a TFW event handler. This small server is responsible for spawning bash sessions and -unix pseudoterminals (or \code{pty}s) in the \code{solvable} Docker +UNIX pseudoterminals (or \code{pty}s) in the \code{solvable} Docker container. It is also responsible for connecting the master end of the \code{pty} to the emulator running in the browser and the slave end to the bash session it has @@ -225,8 +225,8 @@ spawned. This way users are able to work in the shell displayed on the frontend just like they would on their home machines, which allows for great tutorials explaining topics that involve the usage of a shell. -Note that this allows us to cover an extremely wide variety of topics using TFW: -from compiling shared libraries for development, to using cryptographic FUSE filesystems +Note that this allows us to cover an extremely wide variety of topics using TFW\@: +from compiling shared libraries for development, to using cryptographic FUSE file systems for enhanced privacy% \footnote{\href{https://github.com/rfjakob/gocryptfs}{https://github.com/rfjakob/gocryptfs}}, or automating cloud infrastructure using Ansible% @@ -248,7 +248,7 @@ container using an interactive bash session. This is not an easy thing to accomplish without relying on some sort of heavyweight monitoring solution such as Sysdig% \footnote{\href{https://sysdig.comq}{https://sysdig.com}}. -I deemed most simiar systems a huge overkill to implement this functionality, and their +I deemed most similar systems a huge overkill to implement this functionality, and their memory footprints are not something we could afford here% \footnote{These containers will be spawned on a per-user basis, so we must be as conservative with memory as possible.}. 
@@ -273,8 +273,8 @@ but that should not be an issue as this is not a feature that is intended to be used in competitive environments (and if the users of a tutorial intentionally break the system under themselves, well, good for them). -An other advantage of this method is that this can be applied to any interactive -application that supports logging commands executed in them in some way or an other. +Another advantage of this method is that this can be applied to any interactive +application that supports logging commands executed in them in some way or another. A good example would be GDB% \footnote{\href{https://www.gnu.org/software/gdb/}{https://www.gnu.org/software/gdb/}}, which supports an option called \code{set trace-commands on}. This option flushes @@ -293,7 +293,7 @@ The console has no event handler: it is a purely frontend component which expose API through TFW messages to write and read it's contents. It works great when combined with the process management capabilities of the framework: -if configured to do so it can display the output of processes like webservers in real time. +if configured to do so it can display the output of processes like web servers in real time. When using this next to the TFW frontend editor, it allows for a development experience similar to working in an IDE on your laptop. @@ -308,10 +308,10 @@ as well as switching between them mid-tutorial using API messages. The framework includes an event handler capable of managing processes running inside the \code{solvable} Docker container. -The capabilities of this componenet include the starting, stopping and restarting of processes, +The capabilities of this component include the starting, stopping and restarting of processes, as well as emitting the standard out or standard error logs belonging to them, even in real-time (by broadcasting TFW messages). 
-This logging feature allows for interesting possiblities such as the handling +This logging feature allows for interesting possibilities such as the handling of live process output, or just requesting the logs belonging to a certain application when some sort of event has occurred (such as on errors). This component also can be interacted with using TFW API messages. @@ -353,7 +353,7 @@ location \code{/tmp/}. All logs coming from the container itself were also logged to this location. This had caused an infinite recursion: when a process would write to \code{/tmp/} inotify would invoke a process that would also log to that location causing the kernel to -emit more inotify events, which in turn would cause more and more new proesses to spawn +emit more inotify events, which in turn would cause more and more new processes to spawn and write to \code{/tmp/}, causing the whole procedure to repeat again and again. This continued until my machine would start to run out of memory and begin swapping pages to disk% @@ -369,7 +369,7 @@ fascinating phenomenon, but those were \emph{very} fun hours at least. \section{FSM Management} I have already mentioned the event handler called \code{FSMManagingEventHandler}, -which is responsible for managing the framework FSM. +which is responsible for managing the framework FSM\@. For completeness I chose to include it in this chapter as well. The API it exposes through TFW messages allows client code to attempt stepping the state machine. @@ -406,7 +406,7 @@ thing work with the Same Origin Policy% \footnote{\href{https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy} {https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin\_policy}} being in effect? -The answer is that developers must use a \emph{relative url}, that is an URL relative +The answer is that developers must use a \emph{relative URL}, that is a URL relative to the entry point of the TFW frontend itself. 
To allow serving several web applications from a single port the framework supports optional reverse-proxy configurations through the nginx% @@ -416,8 +416,8 @@ More on this in a Chapter~\ref{usingtfw}. \section{Various Frontend Features} The Angular frontend of the framework features several different layouts. -These layouts are useful to accomodate different workflows for users, -such as the previous exampe of editig code and being able to view the +These layouts are useful to accommodate different workflows for users, +such as the previous example of editing code and being able to view the result of said code in real time next to the editor. Another example would be editing Ansible playbooks in the file editor, and then trying to run them in the terminal. @@ -426,11 +426,11 @@ to be used like that, for instance the code editor can be used to conveniently edit larger files this way. The frontend was designed in a way to be fully responsive in window sizes -that still keep the whole thing usable (i.e.\ it would not be practial to start +that still keep the whole thing usable (i.e.\ it would not be practical to start solving TFW tutorials on a smart phone, simply because of size limits, so such small screens are not supported, but the frontend still behaves as expected on small laptops or bigger tablets). -This is not an easy thing to impelent and maintain due to the lots of small -incompatibilites between browsers given the complexity of the frontend. +This is not an easy thing to implement and maintain due to the lots of small +incompatibilities between browsers given the complexity of the frontend. Just remember that a few years ago the clearfix% \footnote{\href{https://stackoverflow.com/questions/8554043/what-is-a-clearfix} @@ -438,7 +438,7 @@ Just remember that a few years ago the clearfix% hack was the industry standard in creating CSS layouts. 
The situation has improved \emph{a lot} since then with flexboxes and grid layouts despite the sheer chaos that is generally involved in web -standardization efforts, but CSS espacially% +standardization efforts, but CSS especially% \footnote{\href{https://developer.mozilla.org/en-US/docs/Web/CSS/CSS3} {https://developer.mozilla.org/en-US/docs/Web/CSS/CSS3}}. @@ -460,10 +460,10 @@ keep your software up to date. There are several additional APIs exposed by the frontend, which include the changing of layouts, selecting the terminal or console component to be displayed, the possibility of dynamically modifying -frontend configuration values (such as the frequency of autosaving the files in the editor) +frontend configuration values (such as the frequency of auto-saving the files in the editor) and more. -To accomodate communication with the TFW server, the frontend of the framework +To accommodate communication with the TFW server, the frontend of the framework comes with some library code which can be used to send and receive TFW messages. This code is mostly WebSockets combined with RxJS% \footnote{\href{https://rxjs-dev.firebaseapp.com}{https://rxjs-dev.firebaseapp.com}}, diff --git a/content/acknowledgements.tex b/content/acknowledgements.tex index ed3be42..139a642 100644 --- a/content/acknowledgements.tex +++ b/content/acknowledgements.tex @@ -1,11 +1,11 @@ -\chapter*{Acknowledgements} -\addcontentsline{toc}{chapter}{Acknowledgements} +\chapter*{Acknowledgments} +\addcontentsline{toc}{chapter}{Acknowledgments} This creation of this framework would not have been possible alone. In this chapter I would like to express my gratitude towards great people who have -helped me in some way or an other along the way. +helped me in some way or another along the way. 
-First of all I would like to thank Bálint Bokros, my good friend and colleauge for +First of all I would like to thank Bálint Bokros, my good friend and colleague for his awesome work regarding TFW and for always being open to provide useful input. He has also earned my gratitude by always being there to lift my spirits, be that @@ -22,7 +22,7 @@ I can't thank my consultant, Levente Buttyán enough for enduring my general inability to deal with deadlines and administration. I also appreciate the great morale my colleagues and friends provided in the office, -by always being there, be that for work or fun. This project couldn't have been realised +by always being there, be that for work or fun. This project couldn't have been realized sitting in a depressing cube between 200 people hating their jobs. They also have my gratitude for direct contributions to the framework, be that with ideas, assistance, or actual code. diff --git a/content/architecture.tex b/content/architecture.tex index 7e63776..8d0312b 100644 --- a/content/architecture.tex +++ b/content/architecture.tex @@ -13,9 +13,9 @@ During most of this chapter I am going to be discussing the \code{solvable} Dock with the exception of Section~\ref{solutioncheck}, where I will dive into how the \code{controller} image is implemented. -The most important feature of the framework is it's messaging system. +The most important feature of the framework is its messaging system. Basically what we need is a system where processes running inside a Docker container -would be allowed to communicate with eachother. +would be allowed to communicate with each other. This task is very easy to solve, with lots of possible solutions (named pipes, sockets or shared memory to name a few). The hard part is that frontend components running inside a web browser --- which could @@ -27,7 +27,7 @@ that can communicate with JavaScript running in a browser connected to it. 
The solution the framework uses is a proxy server, which connects to frontend components on one side and handles interprocess communication on the other side. This way the server is capable of proxying messages between the two sides, enabling -communitaion between them. +communication between them. Notice that this way what we have is essentially an IPC% \footnote{Interprocess communication} system in which a web application can ``act like'' it was running on the backend in a sense: it is easily able to @@ -43,7 +43,7 @@ which was later redesigned and fully rewritten by me to allow for greater flexib message authentication, restoration of frontend state, a complete overhaul of the state tracking system and the possibility for solution checking among other things). If you are explicitly interested in the differences between the original POC implementation -(which is out of scope for this thesis due to lenght constraints) and the current +(which is out of scope for this thesis due to length constraints) and the current framework please consult Bálint's excellent paper and Bachelor's thesis on it\cite{BokaThesis}. \end{note} @@ -52,7 +52,7 @@ some of the design decisions behind this: \subsection{Connecting to the Frontend} -The old way of creating dynamic webpages was AJAX% +The old way of creating dynamic web pages was AJAX% \footnote{AJAX stands for Asynchronous JavaScript And XML, despite usually not having anything to do with XML in practice.} polling, which is basically sending @@ -62,16 +62,16 @@ HTTP request-response on each update). This has been superseded by WebSockets around 2011, which provide a full-duplex communication channel over TCP between your browser and the server. This is done by initiating a protocol handshake using the \code{Connection: Upgrade} -HTTP header, which establishes a premanent socket connection between the browser +HTTP header, which establishes a permanent socket connection between the browser and the server. 
This allows for communication with lower overhead and latency facilitating efficient real-time applications, which were not always possible to create before due to the overheads% \footnote{In some applications this overhead could be bigger than the actual data sent, -such as singaling.} introduced by AJAX polling. +such as signaling.} introduced by AJAX polling. -The Tutorial Framework uses WebSockets to connect to it's web frontend. -The TFW proxy server is capable to connecting to an arbirary number of WebSockets, +The Tutorial Framework uses WebSockets to connect to its web frontend. +The TFW proxy server is capable of connecting to an arbitrary number of WebSockets, which allows the framework to simultaneously connect to components running in separate browser windows and tabs, or even in different browsers altogether (such as opening a terminal in Chrome and an IDE in Firefox). @@ -88,11 +88,11 @@ RabbitMQ% ZMQ does not require a message broker daemon to be running in the background at all times and as such has a much lower memory footprint while still providing various messaging patterns and bindings for almost any widely used programming language. -An other --- yet untilized --- capability of this solution is that since ZMQ is capable +Another --- yet unutilized --- capability of this solution is that since ZMQ is capable of using simple TCP sockets, we could even communicate with processes running on remote hosts using the current architecture of the framework. -There are various lower level and higher level alternatives for IPC other than +There are various lower-level and higher-level alternatives for IPC other than ZMQ which were also considered during the design process of the framework at some point. 
A few examples of top contenders and reasons for not using them in the end: \begin{itemize} @@ -101,13 +101,13 @@ A few examples of top contenders and reasons for not using them in the end: all bytes are sent or received both require constantly checking the return values of the libc \code{send()} and \code{recv()} system calls% \footnote{Developers forget this very often, resulting in almost untraceable bugs -that seem to occour randomly.}, +that seem to occur randomly.}, while ZMQ takes care of this - extra logic involved and even provides higher level messaging patterns such as + extra logic involved and even provides higher-level messaging patterns such as subscribe-publish, which would need to be implemented on top of raw sockets again. \item Using something like gRPC\footnote{\href{https://grpc.io}{https://grpc.io}} or plain HTTP (both of which - are considered to be higher level than ZMQ sockets) would require + are considered to be higher-level than ZMQ sockets) would require all processes partaking in the communication to be HTTP servers themselves, which would make the framework less lightweight and flexible: socket communication with or without ZMQ does not @@ -125,7 +125,7 @@ simultaneously.} in nature, which extorts certain design choices on code Now being familiar with the technological basis of the framework we can now discuss it in more detail. -\pic{figures/tfw_architecture.png}{An overwiew of the Tutorial Framework} +\pic{figures/tfw_architecture.png}{An overview of the Tutorial Framework} Architecturally TFW consists of four main components: \begin{itemize} @@ -138,10 +138,10 @@ Architecturally TFW consists of four main components: \end{itemize} Note that it is important to keep in mind that as I've mentioned previously, the TFW server and event handlers reside in the \code{solvable} Docker container. -They all run in separate processes and only communicate with eachother using ZeroMQ sockets. 
+They all run in separate processes and only communicate with each other using ZeroMQ sockets. In the following sections I am going to explain each of the main components in -greater detail, as well as how they interact with eachother, +greater detail, as well as how they interact with each other, their respective responsibilities, some of the design choices behind them and more. @@ -149,7 +149,7 @@ some of the design choices behind them and more. All components in the Tutorial Framework use JSON% \footnote{JavaScript Object Notation: \href{https://www.json.org}{https://www.json.org}} -messages to communicate with eachother. +messages to communicate with each other. These messages must also comply some simple rules specific to the framework. Let's inspect further what a valid TFW message might look like: @@ -186,14 +186,14 @@ at a later point in this paper. \subsection{Networking Details} -The default behaviour of the TFW server is that it forwards all messages from coming from +The default behavior of the TFW server is that it forwards all messages coming from the frontend to the event handlers and vice versa. So messages coming from the WebSockets of the frontend are forwarded to event handlers via ZMQ and messages received on ZMQ from event handlers are forwarded to the frontend via WebSockets. The TFW server is also capable of ``reflecting'' messages back to the side they were -received from (to faciliate event handler to event handler communication for instance), +received from (to facilitate event handler to event handler communication for instance), or broadcast messages to all components. This is possible by embedding a whole TFW message in the \code{data} field of an outer wrapper message with a special \code{key} that signals to the TFW server that @@ -236,10 +236,10 @@ at any time when the running of the tests would be required. 
An interesting thing to mention is that there \emph{could} be event handlers which broadcast messages with a \code{key} that they are also subscribed to. -This can distrupt their behaviour in weird ways if they are not prepared to +This can disrupt their behavior in weird ways if they are not prepared to deal with their own ``echoes''. The framework offers a solution for this by providing a special -event handler type, which is capable of filtering out it's own broadcasts. +event handler type, which is capable of filtering out its own broadcasts. The way they do this is by caching the checksum of every message they broadcast, and ignore the first message that comes back with the same checksum. @@ -265,7 +265,7 @@ One being that Angular is essentially a complete platform that is very well suitable for building complex architecture into a single page application. Other reasons included that the frontend of the Avatao platform is also written in Angular (bonus points for experienced team members in the company). -An other good thing going for it is that Angular forces you to use TypeScript% +Another good thing going for it is that Angular forces you to use TypeScript% \footnote{\href{https://www.typescriptlang.org}{https://www.typescriptlang.org}} which tries to remedy some of the issues\cite{JavaScript} with JavaScript by being a language that transpiles to JavaScript while @@ -278,7 +278,7 @@ strongly encouraging things like static typing or Object Oriented Principles. A good chunk of the framework codebase is a bunch of pre-made, built-in components that implement commonly required functionality for developers to use. These components usually involve an event handler and an Angular component -communicating with eachother to realize some sort of functionality. +communicating with each other to realize some sort of functionality. An example would be the built-in code editor of the framework (visible on the right side of Figure~\ref{figures/tfw_frontend.png}). 
This code editor essentially is a Monaco editor% @@ -286,7 +286,7 @@ This code editor essentially is a Monaco editor% {https://microsoft.github.io/monaco-editor/}} instance integrated into Angular and upgraded with the capability to exchanges messages with an event handler to save, read and edit files -that reside in the writeable file system of the \code{solvable} +that reside in the writable file system of the \code{solvable} Docker container. All of the built-ins come with a full API documentation explaining what they do @@ -319,7 +319,7 @@ search these messages for the given string. The exact capabilities of these built-in components will be explained in greater detail in Chapter~\ref{atouroftfw}. -Developers who are well-aware of these capabilites are able to use the framework in extremely +Developers who are well-aware of these capabilities are able to use the framework in extremely creative ways allowing for very interesting functionality, such as the above example. The components of TFW can often be combined to work together in unexpected, yet useful ways, similarly how command-line utilities on UNIX-like systems do. @@ -365,7 +365,7 @@ Depending on the results three cases are possible: This example shows how content creators can create tutorials that could behave in many different ways based on what the user does. -In high quality challenges developers can implement several ``paths'' to +In high-quality challenges developers can implement several ``paths'' to a successful completion. This is a very engaging feature that offers an immersive learning experience for users, which many solutions for distance education lack so often. @@ -373,7 +373,7 @@ users, which many solutions for distance education lack so often. Developers can use a YAML file or write Python code to implement finite state machines in TFW\@. This is going to be further detailed in Chapter~\ref{usingtfw}. 
-In the implementation of state machines it is also possbile to subscribe callbacks to be +In the implementation of state machines it is also possible to subscribe callbacks to be invoked on certain events regarding the machine, such as before and after state transitions, or on entering and exiting a state. It is \emph{very} important to be aware of these callbacks, as much of the @@ -404,7 +404,7 @@ can be used to digitally sign messages (this is what the \code{signature} messag field is designed for) using HMAC% \footnote{Hash-based message authentication code}. In this case the TFW server will only forward the privileged messages that -have a valid signature, and the evend handler managing the state machine +have a valid signature, and the event handler managing the state machine will also validate the signature of messages it receives (and sign the updates it broadcasts as well, so that other components can verify that they come from a trusted source). @@ -425,6 +425,6 @@ makes the Tutorial Framework suitable for implementing traditional hacking challenges, such as exercises developed for CTF% \footnote{A ``capture the flag'' game is a competition designed for professionals --- or just people interested in the field --- to sharpen their skills in IT security. -Avatao often organises similar events.} +Avatao often organizes similar events.} events, as the controller image is also capable of verifying the authenticity of FSM update messages via inspecting their signatures. 
diff --git a/content/introduction.tex b/content/introduction.tex index de221e5..a17fcb3 100644 --- a/content/introduction.tex +++ b/content/introduction.tex @@ -3,7 +3,7 @@ \section{Project justification} As the world is being completely engulfed by software, the need for accessible, but -high quality learning materials covering software engineering and especially secure software +high-quality learning materials covering software engineering and especially secure software engineering is on the rise. While we are enjoying the comfort that information technology provides us, we often forget about the risks involved in relying so much on software in our everyday lives. @@ -25,7 +25,7 @@ sensitive data through our ill-protected smart phones\cite{Android} and IoT devi What a time to be alive. It is important to express that IT security is something that is \emph{really hard} to get right. -Even if right often only means better then your neighbour, as perfect security is an utopia +Even if right often only means better than your neighbor, as perfect security is a utopia that doesn't seem to exist\cite{NoPerfectSecurity}. Often when large and reputable companies in the industry such as CloudFlare\cite{CloudFlareLeak} or eBay\cite{EBayGit} can fail to get it right at times @@ -44,9 +44,9 @@ The only thing we can hope and work for is to become better and better as time and generations pass by. We \emph{must} do better, and education is the way forward. 
-The short term goal of this project --- and the goal of this thesis --- is to provide +The short-term goal of this project --- and the goal of this thesis --- is to provide a new angle in the education of software engineering, especially secure software -engineering based on the aspirations above, with the long term goal of bringing +engineering based on the aspirations above, with the long-term goal of bringing something new to the table in the matter of IT education as a whole (not just for developers, but for users as well). @@ -63,17 +63,17 @@ universities around the world and providing a solution for companies in building \emph{security consciousness} amongst their developer teams. Since starting out we have amassed some experience in building fun challenges -that showcase the exploitation and fixing of relevant security vulnerabilites in code or +that showcase the exploitation and fixing of relevant security vulnerabilities in code or configuration. Traditionally these exercises revolved around offensive and defensive tasks, with challenges often being split into two or more parts. For example users would have to hack a website by exploiting a buffer overflow vulnerability, -then in the second challenge they would fix the code they've just exploited in a web based +then in the second challenge they would fix the code they've just exploited in a web-based code editor. -These kind of exercises offer great flexibility to reflect real world security issues, as in -more complex challenges users might be required to exploit multiple vulnerabilites for success, +These kinds of exercises offer great flexibility to reflect real-world security issues, as in +more complex challenges users might be required to exploit multiple vulnerabilities for success, and understand the ways they augment each other. 
-We often recreate real world scenarios based on incident reports released by companies for +We often recreate real-world scenarios based on incident reports released by companies for added authenticity and relevance\cite{AkosFacebook}. Our challenges usually involve some sort of website acting as frontend for the vulnerable application, or require the user to connect using SSH\@. @@ -83,24 +83,24 @@ application, or require the user to connect using SSH\@. The Avatao platform relies heavily on Docker containers to spawn challenges, which makes it extremely flexible in terms of what is possible to do when creating content. -Essentially anything that you can do inside a Docker conainer can be done on +Essentially anything that you can do inside a Docker container can be done on the Avatao platform as well. Currently each challenge is implemented as a set of Docker images residing inside a Git repository exclusive to the specific challenge in mind. -Our content creation wokflow enables developers to create such repositories on GitHub, +Our content creation workflow enables developers to create such repositories on GitHub, which are automatically set up with the proper webhooks, so that when their content gets reviewed (and their feature branches merged), their changes will go live on the platform as well. In the future we also plan on supporting the use of virtual machines to implement -challenges, which could further increase this fexibility by addig the possiblity to do +challenges, which could further increase this flexibility by adding the possibility to do things like exercises involving the use of Docker or Windows based challenges. \section{Emergence}\label{intro:emergence} While working as a content creator I have stumbled into the idea of automating the completion -of challenges for QA\footnote{Quality Assurrance} and demo purposes. +of challenges for QA\footnote{Quality Assurance} and demo purposes. 
I used to record short videos or GIFs to showcase my content to management. -In a certain scenario I was required to integrate a web based terminal emulator into a +In a certain scenario I was required to integrate a web-based terminal emulator into a frontend application to improve user experience by making it possible to use a shell right on the website rather than having to connect through SSH\@. @@ -110,7 +110,7 @@ as I have often found myself recording over and over again for a demo without an During the time I was playing around with this idea, researching possible solutions have led me to a hidden gem of a project on GitHub called \code{demo-magic}% \footnote{\href{https://github.com/paxtonhare/demo-magic}{https://github.com/paxtonhare/demo-magic}}, -which is esentially a bash script that simulates someone typing into a terminal and executing +which is essentially a bash script that simulates someone typing into a terminal and executing commands. I have created a fork% @@ -123,7 +123,7 @@ the solution script with the challenge code itself, making it toggleable using b variables. Should the solution script be enabled, the challenge would automatically start% \footnote{I did this by injecting the solution script into the user's \code{.bashrc} file.} -completing itself in the terminal integrated into it's frontend, often even explaining the +completing itself in the terminal integrated into its frontend, often even explaining the commands executed during the solution process. \lstinputlisting[ @@ -153,9 +153,9 @@ a related task. This teacher scenario would allow you to take the helm sometimes and try applying your newfound skills in action immediately. -For example a chatbot would show you how to encrypt a file using GnuGP% +For example a chatbot would show you how to encrypt a file using GnuPG% \footnote{\href{https://www.gnupg.org}{https://www.gnupg.org}}, -then it would ask you to encrypt an other file similarly. 
+then it would ask you to encrypt another file similarly. After this the bot could teach you how to a configure a database server and then ask you to write a configuration file yourself and then encrypt it because it might contain sensitive data such as open ports, usernames and such. @@ -177,20 +177,20 @@ If the user changes the source code of the application and clicks this button, t should restart itself with the new code. Let's say that the user comments out the part that authenticates a user. In this case the application should let anyone log in dummy credentials. -Meanwhile a console could show the output of the webserver. +Meanwhile a console could show the output of the web server. For example if the source code the user tried to deploy was invalid, the framework should report the exact exception raised while running the application. \pic{figures/webapp_and_editor.png}{The code editor and web application example in TFW} Even if we did all this, we would still need a way to integrate this whole thing into -a web based frontend with a file editor, terminal, chat window and stuff like that. +a web-based frontend with a file editor, terminal, chat window and stuff like that. Turns out that today all this can be done by writing a few hundred lines of Python code which uses the Tutorial Framework. 
\pic{figures/webapp_and_editor_err.png}{Invalid code and deployment failure with process output} -Note that it is possible to try out the current version of the Tutorial Framewok +Note that it is possible to try out the current version of the Tutorial Framework using a guest account on the Avatao platform on this \href{https://platform.avatao.com/paths/d0ccef1f-0389-45bf-9d44-e85b86d66c49/challenges/a7e08c0a-199f-4f8d-aa7e-51b6e9bfcb15}{url}% \footnote{\href{https://platform.avatao.com/paths/d0ccef1f-0389-45bf-9d44-e85b86d66c49/challenges/a7e08c0a-199f-4f8d-aa7e-51b6e9bfcb15} @@ -202,7 +202,7 @@ Based on this it is now more or less possible to define requirements for the pro The reason for the ``more or less'' part is that all of this is pretty much bleeding edge, where the requirements could shift dynamically with time. For this reason I am going to be as general as possible, to the point that some of -this might even sound vauge. +this might even sound vague. To achieve our goals we would need: \begin{itemize} @@ -210,8 +210,8 @@ To achieve our goals we would need: \item a way to to handle various events (i.e.\ we can react when the user has edited a file, or has executed a command in the terminal) \item a highly flexible messaging system, in which processes running on the backend and - frontend components running in a web browser could communicate with eachother - \item a web based frontend with lots of built-in options (terminal, file editor, chat + frontend components running in a web browser could communicate with each other + \item a web-based frontend with lots of built-in options (terminal, file editor, chat window, etc.) 
that use said messaging system \item stable APIs that can be exposed to content creators to work with (so that framework updates won't break client code) @@ -220,11 +220,11 @@ To achieve our goals we would need: \section{Early Development} -Around a year ago a good friend and collage of mine Bálint Bokros, the CTO of our company +Around a year ago a good friend and colleague of mine Bálint Bokros, the CTO of our company Gábor Pék and myself would start designing the TFW architecture. In this early phase we would research solutions for the issues described such as tracking user progress, process management, interprocess communication -and making a web based frontend application capable of communicatig with processes running +and making a web-based frontend application capable of communicating with processes running inside a Docker container. After seeing some sort of light at the end of the tunnel regarding what technologies could @@ -253,11 +253,11 @@ But since the project has followed Agile Methodology% \footnote{Manifesto for Agile Software Development: \href{https://agilemanifesto.org}{https://agilemanifesto.org}} from the start, we were able to adapt to these changes without losing -the progess he made in said thesis. Quoting from the Agile Manifesto: +the progress he made in said thesis. Quoting from the Agile Manifesto: ``Responding to change over following a plan''. This is a really important takeaway. -After becoming a full time employee at Avatao I was tasked with developing the project +After becoming a full-time employee at Avatao I was tasked with developing the project with Bálint, who was later reassigned to work on the GDPR compliance of the platform. 
Thus it became my job to turn the framework into a stable code base ready for usage by content creators and to implement most of the features that we've envisioned diff --git a/content/summary.tex b/content/summary.tex index c86bccb..ed17322 100644 --- a/content/summary.tex +++ b/content/summary.tex @@ -18,13 +18,13 @@ be honest and admit that I have a sweet spot for this project. Currently a total of 63 tutorials based on the framework are running in production, with new ones being released on a weekly basis. -These exercises have been solved several hunders of times. +These exercises have been solved several hundreds of times. User feedback is getting better and better as the project moves forward. As a maintainer, currently I know about a single unfixed bug in the framework, which is getting reported by users as well. There are more, of course, the world is never going to run out of bugs to fix, but at least I sleep well knowing that things aren't breaking on a constant basis. -Considering that this is a one year old project including initial development, +Considering that this is a one-year-old project including initial development, I'd consider this a solid success. We were able to achieve most --- if not all --- of the goals we have envisioned on the @@ -38,27 +38,27 @@ apart from implementing new features, as these will always keep coming in, and w have some great ones planned, that I can promise. First of all I think that we need to put more focus on developing TFW, as currently -other projects are often being priorized over it. +other projects are often being prioritized over it. While some of these are understandable, the framework is a very promising project -with great potential and deserves more attantion from us. +with great potential and deserves more attention from us. The fact that it is stable does not validate neglecting it. I'd also like to concentrate on stabilizing the API of the framework. 
Currently each major release lasts for a few months before I am forced to break something -to accomodate new features. +to accommodate new features. While the communication of these breakages is fine --- we use mailing lists for this purpose and our versioning scheme seems solid so far, this forces developers -to constantly update older tutorials to comply new API. +to constantly update older tutorials to comply with the new API\@. To make this better I'd need to consider planning ahead more, so that the newest API is flexible enough to support new features on the roadmap and not get distracted as much by other features emerging on the horizon. -An other thing is that I often feel like that there are some things in using TFW +Another thing is that I often feel that there are some things in using TFW that could be made a lot easier. As a maintainer sometimes I find it hard to tell what these things exactly are, as I know the framework inside out, having written most of the codebase myself. I'd like to set some time aside to create tutorials using the framework myself, -so I can better narrow these potential difficulities down. +so I can better narrow these potential difficulties down. This would require me to be able to take things slow for a few weeks, as this is not something that is possible to do effectively in a rush. In the summer months, maybe? @@ -81,12 +81,12 @@ as I just simply enjoy admiring quality typography which WYSIWYG% I've spent a long time working on and maintaining the Tutorial Framework. While the list of technical things I've learned is long and exciting, I also feel like I've learned a lot about supporting other developers, project management and communication. -An other thing that I've been able to learn is to adopt a more patient mindset while +Another thing that I've been able to learn is to adopt a more patient mindset while working. 
Back in the day I used to be nervous because of deadlines and things not working how they were supposed to, but now I know that these things are a part of the job and one must be able to deal with them without getting agitated. Any time I feel like something is not OK, I just try take a step back, relax a bit to -blow of steam and approach the issue without acting in haste. +blow off steam and approach the issue without acting in haste. I think this is not too related to working as a software engineer, but something that can be applied to anything we do. diff --git a/content/using_the_framework.tex b/content/using_the_framework.tex index 247da5e..5280b1c 100644 --- a/content/using_the_framework.tex +++ b/content/using_the_framework.tex @@ -12,10 +12,10 @@ The main points include: for this machine, such as ones that display messages to the user \item Implementing the required event handlers, which may trigger state transitions in the FSM, interact with non-TFW code and do various things that might be needed during an exercise, - such as compiling code written by the user or runnign unit tests + such as compiling code written by the user or running unit tests \item Defining what processes should run inside the container besides the things TFW starts automatically - \item Setting up reverse proxying for any user-facing network application such as webservers + \item Setting up reverse proxying for any user-facing network application such as web servers \end{itemize} At first all these tasks can seem quite overwhelming. Remember that \emph{witchcraft} is what we practice here after all. @@ -58,9 +58,9 @@ understanding of how the framework interacts with client code. The \code{config.yml} file is an Avatao challenge configuration file, which is used describe what kind of Docker containers implement a challenge, what ports do they expose talking what protocols, define the name of the -excercise, it's difficulity, and so on. +exercise, its difficulty, and so on. 
Every Avatao challenge must provide such a file. -An other thing that is not even indicated on the structure above is the \code{metadata} +Another thing that is not even indicated on the structure above is the \code{metadata} directory, which contains the short and long descriptions of challenges in Markdown format. @@ -131,8 +131,8 @@ or kubernetes% to orchestrate their containers). This approach is not suitable for TFW, as it would require the framework to orchestrate Docker containers from inside a container managed by the same Docker daemon, which is -feasible in theory but very hard and unservicable to do in practice. -This would require doing something like mounting the unix domain socket used +feasible in theory but very hard and unserviceable to do in practice. +This would require doing something like mounting the UNIX domain socket used to manage the Docker daemon inside a running container managed by that daemon, which is a fun thing to play around with in my free time but not something suitable for running in production, @@ -146,7 +146,7 @@ process started, and who gets PID 1 traditionally.} 1, which in turn starts all programs defined in the \code{solvable/supervisor} directory. Content creators can use supervisor configuration files to define these programs. For example, a developer would write a file similar to this one and place it into the -\code{solvable/supervisor} directory to run a webserver written in Python: +\code{solvable/supervisor} directory to run a web server written in Python: \begin{lstlisting} [program:yourprogram] user=user @@ -155,7 +155,7 @@ command=python3 server.py autostart=true \end{lstlisting} As mentioned earlier in~\ref{processmanagement}, any program that is started this way -can be managed by the framewok using API messages. +can be managed by the framework using API messages. 
All this is possible through using the xmlrpc% \footnote{\href{https://docs.python.org/3/library/xmlrpc.html} {https://docs.python.org/3/library/xmlrpc.html}} @@ -170,44 +170,44 @@ invoke it's command line utility in a separate process when you need something d For simplicity, exercises based on the framework only expose a single port from the \code{solvable} container. This port is required to serve the frontend of the framework. -If this is the case, how do we run additional web applications to showcase vulnerabilies +If this is the case, how do we run additional web applications to showcase vulnerabilities on during a tutorial? Since one port can only be bound by one process at a time, we will need to run a reverse-proxy% \footnote{\href{https://www.nginx.com/resources/glossary/reverse-proxy-server/} {https://www.nginx.com/resources/glossary/reverse-proxy-server/}} server inside the container to -bind the exposed port and redirect traffic to other webservers binding non-exposed ports. +bind the exposed port and redirect traffic to other web servers binding non-exposed ports. -To support this, TFW automatically starts an nginx webserver. It uses this nginx +To support this, TFW automatically starts an nginx web server. It uses this nginx instance to serve the framework frontend as well. It is possible to supply additional configurations to this server in a convenient manner: any configuration files placed into the \code{solvable/nginx} directory will be interpreted by nginx once the container has started. 
-To set up the reverse-proxying of a webserver running on port 3333, +To set up the reverse-proxying of a web server running on port 3333, one would write a configuration file similar to this one: \begin{lstlisting} location /yoururl { proxy_pass http://127.0.0.1:3333; } \end{lstlisting} -Now the content served by this websever on port 3333 -will be available on the url \code{/yoururl} despite that port 3333 +Now the content served by this web server on port 3333 +will be available on the URL \code{/yoururl} despite that port 3333 does not accept connections from outside the container as it is not exposed. It is very important to understand, that developers have to make sure that their web application \emph{behaves well} behind a reverse proxy. What this means is that they are going to be served from a ``subdirectory'' of the top level URL\@: for example \code{/register} will be served under \code{/yoururl/register}. -This means that all links in the final HTML must refer to the proxied urls, e.g.\ -\code{/yoururl/login}, and server side redirects must point to these correct hrefs as well. +This means that all links in the final HTML must refer to the proxied URLs, e.g.\ +\code{/yoururl/login}, and server-side redirects must point to these correct hrefs as well. Idiomatically this is usually implemented by supplying a \code{BASEURL} to the application through an environment variable, so that it is able to set itself up correctly. \subsection{Copying Configuration Files} Behind the curtains, the Tutorial Framework uses some Dockerfile trickery to -faciliate the copying of supervisor and nginx configuration files to their correct +facilitate the copying of supervisor and nginx configuration files to their correct locations. Normally when one uses the \code{COPY}% \footnote{\href{https://docs.docker.com/engine/reference/builder/\#copy} @@ -251,7 +251,7 @@ framework-specific directories. 
The use of this directory is not mandatory, only a good practice, as developers are free to implement the non-TFW parts of their exercises as they see fit (the copying of these files into image layers using \code{solvable/Dockerfile} -is their resposibility as well). +is their responsibility as well). \section{Configuring Built-in Components} @@ -268,11 +268,11 @@ initialized in, which exposes several communicative options through the \code{__init__()} methods of these event handlers: \lstinputlisting[ language=python, - caption={Example of inicializing some event handlers}, + caption={Example of initializing some event handlers}, captionpos=b ]{listings/event_handler_main.py} -\section{Impelenting a Finite State Machine} +\section{Implementing a Finite State Machine} The Tutorial Framework allows developers to define state machines in two ways, as discussed before. @@ -281,7 +281,7 @@ to showcase the capabilities of the framework. \subsection{YAML based FSM} YAML\footnote{YAML Ain't Markup Language: \href{http://yaml.org}{http://yaml.org}} -is a human friendly data serialization standard and a superset of JSON. +is a human friendly data serialization standard and a superset of JSON\@. It is possible to use this format to define a state machine like so: \lstinputlisting[ caption={A Finite State Machine implemented in YAML}, @@ -291,7 +291,7 @@ This state machine has two states, state \code{0} and \code{1}. It defines a single transition between them, \code{step_1}. On entering state \code{1} the FSM will write a message to the frontend messaging component by invoking TFW library code using Python. -The transition can only occour if the file \code{allow_step_1} exists. +The transition can only occur if the file \code{allow_step_1} exists. 
YAML based state machine implementations also allow the usage of the Jinja2% \footnote{\href{http://jinja.pocoo.org/docs/2.10/}{http://jinja.pocoo.org/docs/2.10/}} @@ -338,7 +338,7 @@ I am going to use the Python programming language, but it isn't hard to create event handlers in other languages, as the only thing they have to be capable of is communicating with the TFW server using ZeroMQ sockets, as previously discussed. -The library provided by the framework abstracts low level socket logic +The library provided by the framework abstracts low-level socket logic away by implementing easy to use base classes. Creating such base classes in a given language shouldn't take longer than a few hours for an experienced developer. @@ -346,7 +346,7 @@ Our challenge creators have already implemented similar libraries for Java, JavaScript and C++ as well. \lstinputlisting[ language=python, - caption={A very simple event hander implemented in Python}, + caption={A very simple event handler implemented in Python}, captionpos=b ]{listings/event_handler_example.py} This simple event handler subscribes to the \code{fsm_update} messages, @@ -359,7 +359,7 @@ abstract method, which is used to, well, handle events. \section{Setting Up a Developer Environment}\label{devenv} To make getting started as smooth as possible I have created -a ``bootstrap'' script which is capable of creating a development envrionment from +a ``bootstrap'' script which is capable of creating a development environment from scratch. 
This script is distributed as the following bash one-liner: \begin{lstlisting}[language=bash] @@ -370,7 +370,7 @@ This command downloads the script using \code{curl}% In the open source community it is quite common to distribute installers this way% \footnote{A good example of this is oh-my-zsh: \href{https://github.com/robbyrussell/oh-my-zsh}{https://github.com/robbyrussell/oh-my-zsh}}, -which might seem a little scary at first, but is not less safe then +which might seem a little scary at first, but is not less safe than downloading and executing a binary installer from a website with a valid TLS certificate, as \code{curl} will fail with an error message if the certificate is invalid. This is because both methods place their trust in the PKI~\footnote{Public Key Infrastructure} @@ -400,13 +400,13 @@ mitigating MITM attacks. The bootstrap script clones the three TFW repositories and does several steps to create a working environment into a single directory, that is based on -test-tutorail-framework: +test-tutorial-framework: \begin{itemize} \item It builds the newest version of the TFW baseimage locally \item It pins the version tag of this image in \code{solvable/Dockerfile}, so that this newly-built version will be used by the tutorial \item It places the latest frontend in \code{solvable/frontend} with - depencendies installed + dependencies installed \end{itemize} It is important to note that this script \emph{does not} install anything system-wide, it only works in the directory it is being executed from. @@ -456,7 +456,7 @@ to remove all the dependencies that won't be used when running the application% \footnote{Otherwise it won't be possible to serve these applications efficiently over the internet.}. The problem is, that these things can take a \emph{really} long time. 
-This is why today frontend builds usually take a lot longer than building anything +This is why today frontend builds usually take a lot longer than building anything not involving JavaScript (such as C++, C\# or any other compiled programming language). This mess presents it's own challenges for the Tutorial Framework as well. @@ -475,7 +475,7 @@ To circumvent this, it is possible to entirely exclude the Angular frontend from build, using build time arguments% \footnote{In practice this is done by supplying the option \code{--build-arg NOFRONTEND=1} to Docker.}. -But when doing so, developers would have to run the frondent locally with +But when doing so, developers would have to run the frontend locally with the whole \code{node_modules} directory present. The bootstrap script takes care of putting these dependencies there, while the \code{tfw.sh} script is capable of starting a development server @@ -483,9 +483,9 @@ to serve the frontend locally using \code{ng serve} besides starting the Docker container without the frontend. If this whole thing wasn't complicated enough, since Docker binds the port the container is going to use, \code{tfw.sh} has to run the Angular dev server on -an other port, then use the proxying features of Angular-CLI to forward requests -from this port to the runnign Docker container when requesting resources -other then the entrypoint to the Angular application. +another port, then use the proxying features of Angular-CLI to forward requests +from this port to the running Docker container when requesting resources +other than the entrypoint to the Angular application. This is the reason why the frontend is accessible through port \code{4200} (default port for \code{ng serve}) when using \code{tfw.sh} to start a tutorial, but when running @@ -535,7 +535,7 @@ of cat, for the additional fun factor, and because we love cats. The part after that is a timestamp of the day the release was made on. 
I only change major versions when I introduce backwards incompatible changes in the API of the framework, this way developers can trust that releases -with the same majors are compatible with eachother in regards to client code. +with the same majors are compatible with each other in regards to client code. The \code{master} branches of the frontend-TFW and test-TFW repositories are always kept compatible with the newest release tag of the baseimage.