diff --git a/bibliography.bib b/bibliography.bib index 51feecd..de5daee 100644 --- a/bibliography.bib +++ b/bibliography.bib @@ -80,7 +80,42 @@ title={Education as a key factor in the process of building cybersecurity}, url={https://2017.cybersecforum.eu/files/2016/12/ecj_vol2_issue1_i.albrycht_education_as_a_key_in_the_process_of_building_cybersecurity.pdf}, language={english}, - author={IZABELA ALBRYCHT}, + author={Izabela Albrycht}, year={2016}, } +@online{EBayGit, + title={Pwning eBay - How I Dumped eBay Japan's Website Source Code}, + url={https://slashcrypto.org/2018/11/28/eBay-source-code-leak/}, + language={english}, + author={David Wind}, + year={2018}, + month=nov, +} + +@online{CloudFlareLeak, + title={Incident report on memory leak caused by Cloudflare parser bug}, + url={https://blog.cloudflare.com/incident-report-on-memory-leak-caused-by-cloudflare-parser-bug/}, + language={english}, + author={John Graham-Cumming}, + year={2017}, + month=feb, +} + +@online{NoPerfectSecurity, + title={The Illusion Of Perfect Cybersecurity}, + url={https://www.forbes.com/sites/forbestechcouncil/2018/03/27/the-illusion-of-perfect-cybersecurity/}, + language={english}, + author={George Finney}, + year={2018}, + month=mar, +} + +@online{JavaScript, + title={JavaScript is a Dysfunctional Programming Language}, + url={https://medium.com/javascript-non-grata/javascript-is-a-dysfunctional-programming-language-a1f4866e186f}, + language={english}, + author={Richard Kenneth Eng}, + year={2016}, + month=mar, +} diff --git a/content/architecture.tex b/content/architecture.tex new file mode 100644 index 0000000..dbd7bf9 --- /dev/null +++ b/content/architecture.tex @@ -0,0 +1,157 @@ +\chapter{Framework Architecture} +\section{Core Technology} + +It is important to understand that the Tutorial Framework is currently implemented as +two Docker images: +\begin{itemize} + \item the \texttt{solvable} image is responsible for running the framework and the client + code depending on it + 
\item the \texttt{controller} image is responsible for solution checking (to figure out + whether the user completed the tutorial or not) +\end{itemize} +During most of this chapter I am going to be discussing the \texttt{solvable} Docker image, +with the exception of section \ref{solutioncheck}, where I will dive into how the +\texttt{controller} image is implemented. + +The most important feature of the framework is its messaging system. +Basically what we need is a system where processes running inside a Docker container +would be allowed to communicate with each other. +This part is easy, with many possible solutions (named pipes, sockets or shared memory to name a few). +The hard part is that frontend components running inside a web browser -- which could be +potentially on the other side of the planet -- would also need to partake in said communication. +So what we need to create is something of a hybrid between an IPC system and something +that can communicate with JavaScript running in a browser connected to it. +The solution the framework uses is a proxy server, which connects to frontend components +on one side and handles interprocess communication on the other side. +This way the server is capable of proxying messages between the two sides, enabling +communication between them. +Notice that this way what we have is essentially an IPC system in which a web application +can ``act as if'' it were running on the backend in a sense: it is easily able to +communicate with processes on the backend, while in reality the web application +runs in the browser of the user, on a completely different machine. 
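To make the proxying idea above more concrete, here is a minimal in-process sketch in Python. It is not the framework's actual server: the class name `MessageProxy`, its method names and the `key` field used for routing are all assumptions made for illustration, and plain callbacks stand in for the real WebSocket connections and IPC sockets.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Message = Dict[str, object]
Callback = Callable[[Message], None]


class MessageProxy:
    """Toy stand-in for a proxy between browsers and backend processes (hypothetical)."""

    def __init__(self) -> None:
        # Callbacks standing in for open browser connections.
        self._frontends: List[Callback] = []
        # Backend callbacks grouped by the message key they are interested in.
        self._handlers: Dict[str, List[Callback]] = defaultdict(list)

    def connect_frontend(self, send_to_browser: Callback) -> None:
        """Register one more browser-side connection."""
        self._frontends.append(send_to_browser)

    def subscribe_handler(self, key: str, handle: Callback) -> None:
        """Register a backend callback for messages carrying the given key."""
        self._handlers[key].append(handle)

    def from_frontend(self, message: Message) -> None:
        """Forward a browser message to every backend handler subscribed to its key."""
        for handle in self._handlers.get(str(message["key"]), []):
            handle(message)

    def from_backend(self, message: Message) -> None:
        """Forward a backend message to every connected frontend."""
        for send_to_browser in self._frontends:
            send_to_browser(message)


if __name__ == "__main__":
    proxy = MessageProxy()
    proxy.connect_frontend(lambda m: print("browser got:", m))
    proxy.subscribe_handler("terminal", lambda m: print("handler got:", m))
    proxy.from_frontend({"key": "terminal", "data": "ls -la"})
    proxy.from_backend({"key": "chat", "data": "Well done!"})
```

The point of the sketch is only the shape of the design: neither side knows about the other, and everything passes through one routing component that can sit between a browser and local processes.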
+ +\begin{note} +The core idea and initial implementation of this server come from Bálint Bokros; +the server was later redesigned and fully rewritten by me to allow for greater flexibility +(such as connecting to more than a single browser at a time, different messaging modes, +message authentication, restoration of frontend state, a complete overhaul of the +state tracking system and the possibility for solution checking among other things). +If you are specifically interested in the differences between the original POC implementation +(which is out of scope for this thesis due to length constraints) and the current +framework please consult Bálint's excellent paper and Bachelor's Thesis on it\cite{BokaThesis}. +\end{note} + +Now let us take a closer look: + +\subsection{Connecting to the Frontend} + +The old way of creating dynamic webpages was AJAX polling, which is basically sending +HTTP requests to a server at regular intervals from JavaScript to update the contents +of your website (and as such requiring the whole TCP handshake and the +HTTP request-response cycle on each update). +This was superseded by WebSockets around 2011, which provide a full-duplex +communication channel over TCP between your browser and the server. +This is done by initiating a protocol handshake using the \texttt{Connection: Upgrade} +HTTP header, which establishes a permanent socket connection between the browser +and the server. +This allows for communication with lower overhead and latency, facilitating efficient +real-time applications. + +The Tutorial Framework uses WebSockets to connect to its web frontend. +The framework proxy server is capable of connecting to an arbitrary number of WebSockets, +which allows opening different components in separate browser windows and tabs, or even +in different browsers at once (such as opening a terminal in Chrome and an IDE in Firefox). 
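The upgrade handshake mentioned above can be sketched with nothing but the Python standard library. The snippet below is an illustration of the server side of the opening handshake as specified in RFC 6455, not code taken from the framework; the function names `websocket_accept` and `handshake_response` are made up for this example. The server proves it understood the upgrade request by hashing the client's `Sec-WebSocket-Key` together with a fixed GUID and returning the result in a `101 Switching Protocols` response.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(sec_websocket_key: str) -> str:
    """Build the HTTP response that switches the connection to the WebSocket protocol."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {websocket_accept(sec_websocket_key)}\r\n"
        "\r\n"
    )


if __name__ == "__main__":
    # Sample key taken from the example in RFC 6455.
    print(handshake_response("dGhlIHNhbXBsZSBub25jZQ=="))
```

After this single exchange the same TCP connection stays open and carries WebSocket frames in both directions, which is where the lower overhead compared to polling comes from. In practice a WebSocket library would handle this step.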
+ +\subsection{Interprocess Communication} + +To handle communication with processes running inside the container TFW utilizes +the asynchronous distributed messaging library ZeroMQ% +\footnote{\href{http://zeromq.org}{http://zeromq.org}} or ZMQ for short. +The rationale behind this is that unlike other messaging systems such as +RabbitMQ% +\footnote{\href{https://www.rabbitmq.com}{https://www.rabbitmq.com}} or Redis% +\footnote{\href{https://redis.io}{https://redis.io}}, +ZMQ does not require a daemon (message broker process) and as such +has a much lower memory footprint while still providing various messaging +patterns and bindings for almost any widely used programming language. +Another -- as yet unutilized -- capability of this solution is that since ZMQ is capable +of using simple TCP sockets, we could even communicate with processes running on remote +hosts using the framework. + +There are various lower level and higher level alternatives for IPC other than +ZMQ which were also considered during the design process of the framework at some point. +A few examples of top contenders and reasons for not using them in the end: +\begin{itemize} + \item The handling of raw TCP sockets would involve lots of boilerplate logic that + already has quality implementations in messaging libraries: e.g. making sure that + all bytes are sent or received requires checking the return values of the + libc \texttt{send()} and \texttt{recv()} system calls, while ZMQ takes care of this + extra logic and even provides higher level messaging patterns such as + publish-subscribe, which would need to be implemented on top of raw sockets again. 
+ \item Using something like gRPC% + \footnote{\href{https://grpc.io}{https://grpc.io}} or plain HTTP (both of which + are considered to be higher level than ZMQ sockets) would require + all processes partaking in the communication to be HTTP servers themselves, + which would make the framework + less lightweight and flexible: socket communication with or without ZMQ does not + force you to write synchronous or asynchronous code, whereas common HTTP servers + are either async or pre-fork in nature, which imposes certain design choices on code + built on them. +\end{itemize} + +\section{High Level Overview} + +Being familiar with the technological basis of the framework, we can now +discuss it in more detail. + +\pic{figures/tfw_architecture.png}{An overview of the Tutorial Framework} + +Architecturally TFW consists of four main components: +\begin{itemize} + \item \textbf{Event handlers}: processes running in a Docker container + \item \textbf{Frontend}: web application running in the browser of the user + \item \textbf{TFW (proxy) server}: responsible for message routing/proxying + between the frontend and event handlers + \item \textbf{TFW FSM}: a finite state machine responsible for tracking user progress, + that is implemented as an event handler called \texttt{FSMManagingEventHandler} +\end{itemize} +Keep in mind that, as I've mentioned previously, +the TFW Server and event handlers reside in the \texttt{solvable} Docker container. +They all run in separate processes and only communicate using ZeroMQ sockets. + +In the following sections I am going to explain each of the main components in +greater detail, as well as how they interact with each other, +their respective responsibilities, +some of the design choices behind them and more. + +\subsection{Frontend} + +This is a web application that runs in the browser of the user and uses +multiple WebSocket connections to connect to the TFW server. 
+Due to rapidly increasing complexity the original implementation (written in +plain JavaScript with jQuery% +\footnote{\href{https://jquery.com}{https://jquery.com}} and Bootstrap% +\footnote{\href{https://getbootstrap.com}{https://getbootstrap.com}}) was becoming +unmaintainable and the usage of some frontend framework became justified. + +Several choices were considered, with the main contenders being: +\begin{itemize} + \item Angular\footnote{\href{https://angular.io}{https://angular.io}} + \item React\footnote{\href{https://reactjs.org}{https://reactjs.org}} + \item Vue.js\footnote{\href{https://vuejs.org}{https://vuejs.org}} +\end{itemize} +After comparing the above frameworks we've decided to work with Angular for +several reasons. +One being that Angular is essentially a complete platform that is well +suited for building a complex architecture into a single page application. +Another reason was that the frontend of the Avatao platform is also written +in Angular (bonus points for experienced team members in the company). +Another good thing going for it is that Angular forces you to use TypeScript% +\footnote{\href{https://www.typescriptlang.org}{https://www.typescriptlang.org}} +which tries to remedy the issues\cite{JavaScript} +with JavaScript by being a language that transpiles to JavaScript while +strongly encouraging things like static typing or Object Oriented Principles. + +\subsection{Messaging} +\subsection{TFW Finite State Machine} +\subsection{Solution Checking}\label{solutioncheck} diff --git a/content/introduction.tex b/content/introduction.tex index 1c1e1b3..4140413 100644 --- a/content/introduction.tex +++ b/content/introduction.tex @@ -21,9 +21,16 @@ a new age of digital wild west, which could involve us running around in vulnera driving cars\cite{SelfDriving} with power over life and death, while exposing all our sensitive data through our ill-protected smart phones\cite{Android} and IoT devices\cite{IoTDDoS}. 
What a time to be alive. -Unless we want to disconnect all our devices from all networks and ban USB sticks, the best -lines of defense are going to be people -- a new generation of \emph{security conscious} -users and developers. +It is important to stress that IT security is something that is \emph{really hard} to +get right. +Even then, ``right'' often only means better than your neighbour, as perfect security is a utopia +that doesn't seem to exist\cite{NoPerfectSecurity}. +Often it is only when large and reputable companies in the industry such as +CloudFlare\cite{CloudFlareLeak} or eBay\cite{EBayGit} fail to get it right +that people start to grasp how difficult it actually is. +This is why, unless we want to disconnect all our devices from all networks and ban USB +sticks, the best lines of defense are going to be people -- a new generation +of \emph{security conscious} users and developers. Among many other things outside IT, this is only possible with education\cite{ITSecEdu}. We need to come up with engaging, addictive and fun ways to learn (and teach), so that @@ -35,10 +42,10 @@ The only thing we can hope and work for is to become better and better as time and generations pass. We \emph{must} do better, and education is the way forward. -The short term goal of this project -- and thesis -- is to provide a new angle -in the education of software engineering, especially secure software engineering -based on the aspirations above, with the long term goal of bringing something new -to the table in the matter of IT education as a whole +The short term goal of this project -- and the goal of this thesis -- is to provide +a new angle in the education of software engineering, especially secure software +engineering based on the aspirations above, with the long term goal of bringing +something new to the table in the matter of IT education as a whole (not just developers, but users as well). 
\section{A Short Introduction to Avatao} @@ -46,7 +53,7 @@ to the table in the matter of IT education as a whole The goal of Avatao as a company is to help software developers in building a \emph{culture} of security amongst themselves, with the vision that if the world is going to be taken over by software no matter what, that software might as well be \emph{secure software}. -To achieve this goal we have been working on an online e-learning platform with hundreds\ +To achieve this goal we have been working on an online e-learning platform with hundreds% \footnote{654 exercises as of today, to be exact} of hands-on learning exercises to help students and professionals master IT security, collaborating with @@ -69,6 +76,8 @@ added authenticity and relevance \cite{AkosFacebook}. Our challenges usually involve some sort of website acting as frontend for the vulnerable application, or require the user to connect using SSH. +\pic{figures/avatao_challenge.png}{An offensive challenge on the Avatao platform} + The Avatao platform relies heavily on Docker containers to spawn challenges, which makes it extremely flexible in terms of what is possible to do when creating content. @@ -87,7 +96,7 @@ things like exercises involving the use of Docker or Windows based challenges. \section{Emergence} While working as a content creator I have stumbled into the idea of automating the completion -of challenges for QA\footnote{Quality Assurrance} and demo purposes\ +of challenges for QA\footnote{Quality Assurrance} and demo purposes% \footnote{I used to record short videos or GIFs to showcase my content to management}. 
In a certain scenario I was required to integrate a web based terminal emulator in a frontend application to improve user experience by making it possible to use a shell @@ -96,18 +105,19 @@ After I got this working I was looking into writing hacky bash scripts to automa required to complete the challenge in order to make it easier for me to record the solution, as I have often found myself recording over and over again for a demo without any mistakes. During the time I was playing around with this idea, researching possible solutions have led me -to a hidden gem of a project on GitHub called \texttt{demo-magic}\ +to a hidden gem of a project on GitHub called \texttt{demo-magic}% \footnote{\href{https://github.com/paxtonhare/demo-magic}{https://github.com/paxtonhare/demo-magic}}, which is esentially a bash script that simulates someone typing into a terminal and executing commands. -I have created a fork\ -\footnote{The source code is available at -\href{https://git.strongds.hu/mrtoth/demo.sh/src/master/demo.sh}{https://git.strongds.hu/mrtoth/demo.sh/src/master/demo.sh}} +I have created a fork% +\footnote{ +\href{https://git.strongds.hu/mrtoth/demo.sh/src/master/demo.sh} +{https://git.strongds.hu/mrtoth/demo.sh/src/master/demo.sh}} of the project and integrated it into my challenge. Soon after recording demo videos was not even necessary anymore, as I have started to distribute the solution script with the challenge code itself, making it toggleable using build-time variables. -Should the solution script be enabled, the challenge would automatically start\ +Should the solution script be enabled, the challenge would automatically start% \footnote{I did this by injecting the solution script into the user's \texttt{.bashrc} file} completing itself in the terminal integrated into it's frontend, often even explaining the commands executed during the solution process. 
@@ -123,7 +133,7 @@ but what I did not know was that I have accidentally did something far more than a hacky bash script solving challenges, as this little script would help formulate the idea of the project \emph{Tutorial Framework} or just \emph{TFW}. -\section{Introducing the Tutorial Framework} +\section{Vision of the Tutorial Framework} The whole ''challenges that solve themselves'' thing seemed like an idea that has great potential if developed further. @@ -141,7 +151,7 @@ your newfound skills in action immediately. For example a chatbot would show you how to encrypt a file using GnuGP, then it would ask you to encrypt an other file similarly. -After this the bot could show you how to a configure a database server and then +After this the bot could teach you how to configure a database server and then ask you to write a configuration file yourself and then encrypt it because it might contain sensitive data such as open ports, usernames and such. @@ -157,6 +167,28 @@ a web based frontend with a file editor, terminal, chat window and stuff like th Turns out that today all this can be done by writing a few hundred lines of Python code which uses the Tutorial Framework. +\subsection{Project Requirements}\label{requirements} + +Based on this it is now more or less possible to define requirements for the project. +The reason for the ``more or less'' part is that all of this is pretty much bleeding edge, +where the requirements could shift dynamically with time. +For this reason I am going to be as general as possible, to the point that some of +this might even sound vague. +To achieve our goals we would need: + +\begin{itemize} + \item a way to keep track of user progress + \item a way to handle various events (i.e. 
we can react when + the user has edited a file, or has executed a command in the terminal) + \item a highly flexible messaging system, in which processes and + frontend components (running in a web browser) could communicate with each other + \item a web based frontend with lots of built-in options (terminal, file editor, chat + window, etc.) that use said messaging system + \item stable APIs that can be exposed to content creators to work with (so that + framework updates won't break client code) + \item tooling for development (distributing, building and running) +\end{itemize} + \section{Early Development} Around a year ago a good friend and collage of mine Bálint Bokros, the CTO of our company @@ -174,9 +206,27 @@ Bachelor's Thesis\cite{BokaThesis}. Although not much of the original code base has remained due to intense refactoring and all around changes, the result would serve as a solid foundation for further development, and the architecture is mostly the same to this day. -The resulting code would be the first working POC\ +The resulting code would be the first working POC% \footnote{Proof of Concept} of the framework showcasing the fixing of an SQL Injection attack. +This initial version included the foundations of the framework: +a working messaging system, event handling and state tracking. +These provided a great basis +despite the fact that the core codebase of the framework was almost +completely rewritten due to an increased focus on code quality, +extensibility and API stability required by new features. + +It is interesting to note that when I mentioned that the project requirements +were kept general on purpose (section \ref{requirements}), I had good reason to do so. +Looking at the requirements in Bálint's Thesis, many of them +are completely obsolete by now. 
+But since the project has followed Agile Methodology% +\footnote{Manifesto for Agile Software Development: +\href{https://agilemanifesto.org}{https://agilemanifesto.org}} +from the start, we were able to adapt to these changes without losing +the progress he made in said Thesis. Quoting from the Agile Manifesto: +``Responding to change over following a plan''. +This is a really important takeaway. After becoming a full time employee at Avatao I was tasked with developing the project with Bálint, who was later reassigned to work on the GDPR compliance of the platform. diff --git a/figures/avatao_challenge.png b/figures/avatao_challenge.png new file mode 100644 index 0000000..e23897a Binary files /dev/null and b/figures/avatao_challenge.png differ diff --git a/figures/tfw_architecture.png b/figures/tfw_architecture.png new file mode 100644 index 0000000..854448f Binary files /dev/null and b/figures/tfw_architecture.png differ diff --git a/latexplate.sty b/latexplate.sty index 4a08cfc..63eb75e 100644 --- a/latexplate.sty +++ b/latexplate.sty @@ -10,7 +10,8 @@ sectsty, xcolor, microtype, - tabto + tabto, + amsthm } \RequirePackage[bottom,hang,flushmargin]{footmisc} @@ -18,6 +19,8 @@ \sethlcolor{andigray} \newcommand{\code}[1]{\hl{\mbox{#1}}} +\newtheorem*{note}{Note} + \newcommand{\pic}[3][width=\textwidth] { \begin{figure}[H] diff --git a/thesis.tex b/thesis.tex index 1a037b6..701605c 100644 --- a/thesis.tex +++ b/thesis.tex @@ -41,7 +41,9 @@ \include{content/declaration} \include{content/abstract} \include{content/introduction} +\include{content/architecture} +\listoffigures \lstlistoflistings \renewcommand\bibname{References}