Continue writing thesis with focus on architecture
		| @@ -80,7 +80,42 @@ | ||||
|     title={Education as a key factor in the process of building cybersecurity}, | ||||
|     url={https://2017.cybersecforum.eu/files/2016/12/ecj_vol2_issue1_i.albrycht_education_as_a_key_in_the_process_of_building_cybersecurity.pdf}, | ||||
|     language={english}, | ||||
|     author={IZABELA ALBRYCHT}, | ||||
|     author={Izabela Albrycht}, | ||||
|     year={2016}, | ||||
| } | ||||
|  | ||||
| @online{EBayGit, | ||||
|     title={Pwning eBay - How I Dumped eBay Japan's Website Source Code}, | ||||
|     url={https://slashcrypto.org/2018/11/28/eBay-source-code-leak/}, | ||||
|     language={english}, | ||||
|     author={David Wind}, | ||||
|     year={2018}, | ||||
|     month=nov, | ||||
| } | ||||
|  | ||||
| @online{CloudFlareLeak, | ||||
|     title={Incident report on memory leak caused by Cloudflare parser bug}, | ||||
|     url={https://blog.cloudflare.com/incident-report-on-memory-leak-caused-by-cloudflare-parser-bug/}, | ||||
|     language={english}, | ||||
|     author={John Graham-Cumming}, | ||||
|     year={2017}, | ||||
|     month=feb, | ||||
| } | ||||
|  | ||||
| @online{NoPerfectSecurity, | ||||
|     title={The Illusion Of Perfect Cybersecurity}, | ||||
|     url={https://www.forbes.com/sites/forbestechcouncil/2018/03/27/the-illusion-of-perfect-cybersecurity/}, | ||||
|     language={english}, | ||||
|     author={George Finney}, | ||||
|     year={2018}, | ||||
|     month=mar, | ||||
| } | ||||
|  | ||||
| @online{JavaScript, | ||||
|     title={JavaScript is a Dysfunctional Programming Language}, | ||||
|     url={https://medium.com/javascript-non-grata/javascript-is-a-dysfunctional-programming-language-a1f4866e186f}, | ||||
|     language={english}, | ||||
|     author={Richard Kenneth Eng}, | ||||
|     year={2016}, | ||||
|     month=mar, | ||||
| } | ||||
|   | ||||
							
								
								
									
content/architecture.tex (new file, 157 lines)
							| @@ -0,0 +1,157 @@ | ||||
| \chapter{Framework Architecture} | ||||
| \section{Core Technology} | ||||
|  | ||||
| It is important to understand that the Tutorial Framework is currently implemented as | ||||
| two Docker images: | ||||
| \begin{itemize} | ||||
|     \item the \texttt{solvable} image is responsible for running the framework and the client | ||||
|           code depending on it | ||||
|     \item the \texttt{controller} image is responsible for solution checking (to figure out | ||||
|           whether the user completed the tutorial or not) | ||||
| \end{itemize} | ||||
| For most of this chapter I am going to discuss the \texttt{solvable} Docker image, | ||||
| with the exception of section \ref{solutioncheck}, where I will dive into how the | ||||
| \texttt{controller} image is implemented. | ||||
|  | ||||
| The most important feature of the framework is its messaging system. | ||||
| Basically what we need is a system where processes running inside a Docker container | ||||
| are able to communicate with each other. | ||||
| This part is easy, with many possible solutions (named pipes, sockets or shared memory, to name a few). | ||||
| The hard part is that frontend components running inside a web browser -- which could be | ||||
| potentially on the other side of the planet -- would also need to partake in said communication. | ||||
| So what we need to create is something of a hybrid between an IPC system and something | ||||
| that can communicate with JavaScript running in a browser connected to it. | ||||
| The solution the framework uses is a proxy server, which connects to frontend components | ||||
| on one side and handles interprocess communication on the other side.  | ||||
| This way the server is capable of proxying messages between the two sides, enabling | ||||
| communication between them. | ||||
| Notice that this way what we have is essentially an IPC system in which a web application | ||||
| can ``act like'' it was running on the backend in a sense: it is easily able to | ||||
| communicate with processes on the backend, while in reality the web application | ||||
| runs in the browser of the user, on a completely different machine. | ||||
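The proxying idea described above can be illustrated with a minimal in-memory sketch. This is not the framework's actual server: the class name `ProxySketch`, the `key` field used for routing, and the use of plain callables in place of WebSocket connections and IPC sockets are all assumptions made for illustration only.

```python
class ProxySketch:
    """Hypothetical stand-in for the proxy server: routes messages between two sides."""

    def __init__(self):
        # Callables standing in for WebSocket connections to browsers (assumption).
        self.frontends = []
        # Maps a routing key to callables standing in for backend processes (assumption).
        self.handlers = {}

    def connect_frontend(self, deliver):
        """Register a frontend connection; `deliver` receives messages sent to it."""
        self.frontends.append(deliver)

    def subscribe_handler(self, key, deliver):
        """Register a backend process interested in messages carrying `key`."""
        self.handlers.setdefault(key, []).append(deliver)

    def from_frontend(self, message):
        """Forward a message coming from a browser to the interested backend processes."""
        for deliver in self.handlers.get(message["key"], []):
            deliver(message)

    def from_handler(self, message):
        """Forward a message coming from a backend process to every connected frontend."""
        for deliver in self.frontends:
            deliver(message)


proxy = ProxySketch()
browser_inbox, handler_inbox = [], []
proxy.connect_frontend(browser_inbox.append)
proxy.subscribe_handler("terminal", handler_inbox.append)
proxy.from_frontend({"key": "terminal", "data": "ls"})
proxy.from_handler({"key": "terminal", "data": "file.txt"})
```

Neither side needs to know how the other is connected: the browser only talks to the proxy, and so do the backend processes, which is what lets the web application behave as if it were part of the IPC system.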
|  | ||||
| \begin{note} | ||||
| The core idea and initial implementation of this server come from Bálint Bokros. | ||||
| I later redesigned and fully rewrote it to allow for greater flexibility | ||||
| (such as connecting to more than a single browser at a time, different messaging modes, | ||||
| message authentication, restoration of frontend state, a complete overhaul of the | ||||
| state tracking system and the possibility for solution checking among other things). | ||||
| If you are explicitly interested in the differences between the original POC implementation | ||||
| (which is out of scope for this thesis due to length constraints) and the current | ||||
| framework, please consult Bálint's excellent paper and Bachelor's Thesis on it\cite{BokaThesis}. | ||||
| \end{note} | ||||
|  | ||||
| Now let us take a closer look: | ||||
|  | ||||
| \subsection{Connecting to the Frontend} | ||||
|  | ||||
| The old way of creating dynamic webpages was AJAX polling, which is basically sending | ||||
| HTTP requests to a server at regular intervals from JavaScript to update the contents | ||||
| of your website (and as such typically paying for a TCP handshake and a full | ||||
| HTTP request-response cycle on each update). | ||||
| This was superseded by WebSockets around 2011, which provide a full-duplex | ||||
| communication channel over TCP between your browser and the server. | ||||
| This is done by initiating a protocol handshake using the \texttt{Connection: Upgrade} | ||||
| HTTP header, which establishes a permanent socket connection between the browser | ||||
| and the server. | ||||
| This allows for communication with lower overhead and latency, facilitating efficient | ||||
| real-time applications. | ||||
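To make the handshake more concrete, here is a minimal sketch of the server side of the upgrade as specified by RFC 6455. It only computes the `Sec-WebSocket-Accept` value and builds the `101 Switching Protocols` response. The function names are chosen for illustration and are not part of the framework, which may well rely on a library for this step.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing the accept value.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_response(sec_websocket_key: str) -> str:
    """Build the HTTP response that switches the connection to the WebSocket protocol."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {compute_accept(sec_websocket_key)}\r\n"
        "\r\n"
    )


# Sample key taken from RFC 6455, section 1.3.
response = build_upgrade_response("dGhlIHNhbXBsZSBub25jZQ==")
```

After this response is sent, the same TCP connection stays open and both sides exchange WebSocket frames instead of HTTP messages.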
|  | ||||
| The Tutorial Framework uses WebSockets to connect to its web frontend. | ||||
| The framework proxy server is capable of handling an arbitrary number of WebSocket connections, | ||||
| which allows opening different components in separate browser windows and tabs, or even | ||||
| in different browsers at once (such as opening a terminal in Chrome and an IDE in Firefox). | ||||
|  | ||||
| \subsection{Interprocess Communication} | ||||
|  | ||||
| To handle communication with processes running inside the container, TFW utilizes | ||||
| the asynchronous distributed messaging library ZeroMQ% | ||||
| \footnote{\href{http://zeromq.org}{http://zeromq.org}}, or ZMQ for short. | ||||
| The rationale behind this is that unlike other messaging systems such as | ||||
| RabbitMQ% | ||||
| \footnote{\href{https://www.rabbitmq.com}{https://www.rabbitmq.com}} or Redis% | ||||
| \footnote{\href{https://redis.io}{https://redis.io}}, | ||||
| ZMQ does not require a daemon (message broker process) and as such | ||||
| has a much lower memory footprint while still providing various messaging | ||||
| patterns and bindings for almost any widely used programming language. | ||||
| Another -- as yet unutilized -- capability of this solution is that since ZMQ is capable | ||||
| of using simple TCP sockets, we could even communicate with processes running on remote | ||||
| hosts using the framework. | ||||
|  | ||||
| There are various lower level and higher level alternatives for IPC other than | ||||
| ZMQ which were also considered during the design process of the framework at some point. | ||||
| A few examples of top contenders and reasons for not using them in the end: | ||||
| \begin{itemize} | ||||
|     \item The handling of raw TCP sockets would involve lots of boilerplate logic that | ||||
|     already has quality implementations in messaging libraries: e.g. making sure that | ||||
|     all bytes are sent or received requires checking the return values of the | ||||
|     libc \texttt{send()} and \texttt{recv()} calls, while ZMQ takes care of this | ||||
|     extra logic and even provides higher level messaging patterns such as | ||||
|     publish-subscribe, which would need to be implemented on top of raw sockets again. | ||||
|     \item Using something like gRPC% | ||||
|     \footnote{\href{https://grpc.io}{https://grpc.io}} or plain HTTP (both of which | ||||
|     are considered to be higher level than ZMQ sockets) would require  | ||||
|     all processes partaking in the communication to be HTTP servers themselves, | ||||
|     which would make the framework | ||||
|     less lightweight and flexible: socket communication with or without ZMQ does not | ||||
|     force you to write synchronous or asynchronous code, whereas common HTTP servers | ||||
|     are either async or pre-fork in nature, which imposes certain design choices on code | ||||
|     built on them. | ||||
| \end{itemize} | ||||
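The boilerplate mentioned in the first item can be sketched as follows. This is an illustrative example rather than code from the framework: it shows the loops needed to make sure every byte is sent and received over a raw socket, plus a simple length-prefixed framing scheme. The helper names and the 4-byte length prefix are assumptions chosen for this sketch.

```python
import socket
import struct


def send_all(sock: socket.socket, data: bytes) -> None:
    """Call send() repeatedly, since it may transmit only part of the buffer."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        sent = sock.send(view[total:])
        if sent == 0:
            raise ConnectionError("socket connection broken while sending")
        total += sent


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Call recv() repeatedly until exactly `size` bytes have arrived."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket connection closed while receiving")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_message(sock: socket.socket, payload: bytes) -> None:
    """Frame a message with a 4-byte big-endian length prefix (hypothetical framing)."""
    send_all(sock, struct.pack("!I", len(payload)) + payload)


def recv_message(sock: socket.socket) -> bytes:
    """Read one length-prefixed message from the socket."""
    (size,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, size)


# A connected pair of sockets stands in for two communicating processes.
left, right = socket.socketpair()
send_message(left, b"hello")
received = recv_message(right)
left.close()
right.close()
```

All of this only delivers a single framed message between two peers. Messaging patterns such as publish-subscribe would still have to be built on top, which is the work a library like ZMQ already takes care of.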
|  | ||||
| \section{High Level Overview} | ||||
|  | ||||
| Being familiar with the technological basis of the framework, we can now | ||||
| discuss it in more detail. | ||||
|  | ||||
| \pic{figures/tfw_architecture.png}{An overview of the Tutorial Framework} | ||||
|  | ||||
| Architecturally TFW consists of four main components: | ||||
| \begin{itemize} | ||||
|     \item \textbf{Event handlers}: processes running in a Docker container | ||||
|     \item \textbf{Frontend}: web application running in the browser of the user | ||||
|     \item \textbf{TFW (proxy) server}: responsible for message routing/proxying | ||||
|           between the frontend and event handlers | ||||
|     \item \textbf{TFW FSM}: a finite state machine responsible for tracking user progress, | ||||
|           that is implemented as an event handler called \texttt{FSMManagingEventHandler} | ||||
| \end{itemize} | ||||
| Keep in mind that, as mentioned previously, | ||||
| the TFW Server and event handlers reside in the \texttt{solvable} Docker container. | ||||
| They all run in separate processes and only communicate using ZeroMQ sockets. | ||||
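As a rough illustration of how a finite state machine can track user progress, consider the sketch below. It is not the implementation behind \texttt{FSMManagingEventHandler}: the class name `ProgressFSM`, the state names and the trigger names are all hypothetical and only serve to show the idea.

```python
class ProgressFSM:
    """Hypothetical progress tracker: states are tutorial steps, triggers are user actions."""

    def __init__(self, initial, transitions):
        self.state = initial
        # Maps (current state, trigger name) to the next state.
        self.transitions = transitions

    def trigger(self, name):
        """Try to step the machine; return True only if the transition is allowed."""
        key = (self.state, name)
        if key not in self.transitions:
            return False
        self.state = self.transitions[key]
        return True


fsm = ProgressFSM(
    initial="start",
    transitions={
        ("start", "file_edited"): "file_fixed",
        ("file_fixed", "command_executed"): "completed",
    },
)
skipped_ahead = fsm.trigger("command_executed")  # not allowed from "start"
fsm.trigger("file_edited")
fsm.trigger("command_executed")
```

Because only the listed transitions are accepted, the user cannot skip steps: an action that does not belong to the current state leaves the tracked progress unchanged.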
|  | ||||
| In the following sections I am going to explain each of the main components in | ||||
| greater detail, as well as how they interact with each other, | ||||
| their respective responsibilities, | ||||
| some of the design choices behind them and more. | ||||
|  | ||||
| \subsection{Frontend} | ||||
|  | ||||
| This is a web application that runs in the browser of the user and uses | ||||
| multiple WebSocket connections to connect to the TFW server. | ||||
| Due to rapidly increasing complexity the original implementation (written in | ||||
| plain JavaScript with jQuery% | ||||
| \footnote{\href{https://jquery.com}{https://jquery.com}} and Bootstrap% | ||||
| \footnote{\href{https://getbootstrap.com}{https://getbootstrap.com}}) was becoming | ||||
| unmaintainable and the usage of some frontend framework became justified. | ||||
|  | ||||
| Several choices were considered, with the main contenders being: | ||||
| \begin{itemize} | ||||
|     \item Angular\footnote{\href{https://angular.io}{https://angular.io}} | ||||
|     \item React\footnote{\href{https://reactjs.org}{https://reactjs.org}} | ||||
|     \item Vue.js\footnote{\href{https://vuejs.org}{https://vuejs.org}} | ||||
| \end{itemize} | ||||
| After comparing the above frameworks we decided to work with Angular for | ||||
| several reasons. | ||||
| One is that Angular is essentially a complete platform that is well | ||||
| suited for building complex architecture into a single page application. | ||||
| Another is that the frontend of the Avatao platform is also written | ||||
| in Angular (bonus points for experienced team members in the company). | ||||
| A further point in its favor is that Angular forces you to use TypeScript% | ||||
| \footnote{\href{https://www.typescriptlang.org}{https://www.typescriptlang.org}} | ||||
| which tries to remedy the issues\cite{JavaScript} | ||||
| with JavaScript by being a language that transpiles to JavaScript while | ||||
| strongly encouraging things like static typing or object-oriented principles. | ||||
|  | ||||
| \subsection{Messaging} | ||||
| \subsection{TFW Finite State Machine} | ||||
| \subsection{Solution Checking}\label{solutioncheck} | ||||
| @@ -21,9 +21,16 @@ a new age of digital wild west, which could involve us running around in vulnera | ||||
| driving cars\cite{SelfDriving} with power over life and death, while exposing all our | ||||
| sensitive data through our ill-protected smart phones\cite{Android} and IoT devices\cite{IoTDDoS}. | ||||
| What a time to be alive. | ||||
| Unless we want to disconnect all our devices from all networks and ban USB sticks, the best | ||||
| lines of defense are going to be people -- a new generation of \emph{security conscious} | ||||
| users and developers. | ||||
| It is important to express that IT security is something that is \emph{really hard} to | ||||
| get right. | ||||
| Even if right often only means better than your neighbour, as perfect security is a utopia | ||||
| that doesn't seem to exist\cite{NoPerfectSecurity}. | ||||
| It is often only when large and reputable companies in the industry such as | ||||
| CloudFlare\cite{CloudFlareLeak} or eBay\cite{EBayGit} fail to get it right | ||||
| that people start to grasp how difficult it actually is. | ||||
| This is why unless we want to disconnect all our devices from all networks and ban USB | ||||
| sticks, the best lines of defense are going to be people -- a new generation  | ||||
| of \emph{security conscious} users and developers. | ||||
|  | ||||
| Among many other things outside IT, this is only possible with education\cite{ITSecEdu}. | ||||
| We need to come up with engaging, addictive and fun ways to learn (and teach), so that | ||||
| @@ -35,10 +42,10 @@ The only thing we can hope and work for is to become better and better as time | ||||
| and generations pass. | ||||
| We \emph{must} do better, and education is the way forward. | ||||
|  | ||||
| The short term goal of this project -- and thesis -- is to provide a new angle | ||||
| in the education of software engineering, especially secure software engineering | ||||
| based on the aspirations above, with the long term goal of bringing something new | ||||
| to the table in the matter of IT education as a whole | ||||
| The short term goal of this project -- and the goal of this thesis -- is to provide | ||||
| a new angle in the education of software engineering, especially secure software | ||||
| engineering based on the aspirations above, with the long term goal of bringing | ||||
| something new to the table in the matter of IT education as a whole | ||||
| (not just developers, but users as well). | ||||
|  | ||||
| \section{A Short Introduction to Avatao} | ||||
| @@ -46,7 +53,7 @@ to the table in the matter of IT education as a whole | ||||
| The goal of Avatao as a company is to help software developers in building a \emph{culture} of | ||||
| security amongst themselves, with the vision that if the world is going to be taken over by | ||||
| software no matter what, that software might as well be \emph{secure software}. | ||||
| To achieve this goal we have been working on an online e-learning platform with hundreds\ | ||||
| To achieve this goal we have been working on an online e-learning platform with hundreds% | ||||
| \footnote{654 exercises as of today, to be exact} | ||||
| of hands-on learning exercises to help students and professionals | ||||
| master IT security, collaborating with | ||||
| @@ -69,6 +76,8 @@ added authenticity and relevance \cite{AkosFacebook}. | ||||
| Our challenges usually involve some sort of website acting as frontend for the vulnerable | ||||
| application, or require the user to connect using SSH. | ||||
|  | ||||
| \pic{figures/avatao_challenge.png}{An offensive challenge on the Avatao platform} | ||||
|  | ||||
| The Avatao platform relies heavily on Docker containers to spawn challenges, | ||||
| which makes it extremely flexible in terms of what is possible to do when creating | ||||
| content. | ||||
| @@ -87,7 +96,7 @@ things like exercises involving the use of Docker or Windows based challenges. | ||||
| \section{Emergence} | ||||
|  | ||||
| While working as a content creator I have stumbled into the idea of automating the completion | ||||
| of challenges for QA\footnote{Quality Assurance} and demo purposes\ | ||||
| of challenges for QA\footnote{Quality Assurance} and demo purposes% | ||||
| \footnote{I used to record short videos or GIFs to showcase my content to management}. | ||||
| In a certain scenario I was required to integrate a web based terminal emulator in a | ||||
| frontend application to improve user experience by making it possible to use a shell | ||||
| @@ -96,18 +105,19 @@ After I got this working I was looking into writing hacky bash scripts to automa | ||||
| required to complete the challenge in order to make it easier for me to record the solution, | ||||
| as I have often found myself recording over and over again for a demo without any mistakes. | ||||
| During the time I was playing around with this idea, researching possible solutions led me | ||||
| to a hidden gem of a project on GitHub called \texttt{demo-magic}\ | ||||
| to a hidden gem of a project on GitHub called \texttt{demo-magic}% | ||||
| \footnote{\href{https://github.com/paxtonhare/demo-magic}{https://github.com/paxtonhare/demo-magic}}, | ||||
| which is essentially a bash script that simulates someone typing into a terminal and executing | ||||
| commands. | ||||
| I have created a fork\ | ||||
| \footnote{The source code is available at | ||||
| \href{https://git.strongds.hu/mrtoth/demo.sh/src/master/demo.sh}{https://git.strongds.hu/mrtoth/demo.sh/src/master/demo.sh}} | ||||
| I have created a fork% | ||||
| \footnote{ | ||||
| \href{https://git.strongds.hu/mrtoth/demo.sh/src/master/demo.sh} | ||||
| {https://git.strongds.hu/mrtoth/demo.sh/src/master/demo.sh}} | ||||
| of the project and integrated it into my challenge. | ||||
| Soon after recording demo videos was not even necessary anymore, as I have started to distribute | ||||
| the solution script with the challenge code itself, making it toggleable using build-time | ||||
| variables. | ||||
| Should the solution script be enabled, the challenge would automatically start\ | ||||
| Should the solution script be enabled, the challenge would automatically start% | ||||
| \footnote{I did this by injecting the solution script into the user's \texttt{.bashrc} file} | ||||
| completing itself in the terminal integrated into its frontend, often even explaining the | ||||
| commands executed during the solution process. | ||||
| @@ -123,7 +133,7 @@ but what I did not know was that I had accidentally | ||||
| done something far more than a hacky bash script solving challenges, as this little script | ||||
| would help formulate the idea of the project \emph{Tutorial Framework} or just \emph{TFW}. | ||||
|  | ||||
| \section{Introducing the Tutorial Framework} | ||||
| \section{Vision of the Tutorial Framework} | ||||
|  | ||||
| The whole ``challenges that solve themselves'' thing seemed like an idea that has great | ||||
| potential if developed further. | ||||
| @@ -141,7 +151,7 @@ your newfound skills in action immediately. | ||||
|  | ||||
| For example a chatbot would show you how to encrypt a file using GnuPG, | ||||
| then it would ask you to encrypt another file similarly. | ||||
| After this the bot could show you how to configure a database server and then | ||||
| After this the bot could teach you how to configure a database server and then | ||||
| ask you to write a configuration file yourself and then encrypt it because it might | ||||
| contain sensitive data such as open ports, usernames and such. | ||||
|  | ||||
| @@ -157,6 +167,28 @@ a web based frontend with a file editor, terminal, chat window and stuff like th | ||||
| Turns out that today all this can be done by writing a few hundred lines of Python | ||||
| code which uses the Tutorial Framework. | ||||
|  | ||||
| \subsection{Project Requirements}\label{requirements} | ||||
|  | ||||
| Based on this it is now more or less possible to define requirements for the project. | ||||
| The reason for the ``more or less'' part is that all of this is pretty much bleeding edge, | ||||
| where the requirements could shift dynamically with time. | ||||
| For this reason I am going to be as general as possible, to the point that some of | ||||
| this might even sound vague. | ||||
| To achieve our goals we would need: | ||||
|  | ||||
| \begin{itemize} | ||||
|     \item a way to keep track of user progress | ||||
|     \item a way to handle various events (e.g. we can react when | ||||
|           the user has edited a file, or has executed a command in the terminal) | ||||
|     \item a highly flexible messaging system, in which processes and | ||||
|           frontend components (running in a web browser) could communicate with each other | ||||
|     \item a web based frontend with lots of built-in options (terminal, file editor, chat | ||||
|           window, etc.) that use said messaging system | ||||
|     \item stable APIs that can be exposed to content creators to work with (so that | ||||
|           framework updates won't break client code) | ||||
|     \item tooling for development (distributing, building and running) | ||||
| \end{itemize} | ||||
|  | ||||
| \section{Early Development} | ||||
|  | ||||
| Around a year ago a good friend and colleague of mine Bálint Bokros, the CTO of our company | ||||
| @@ -174,9 +206,27 @@ Bachelor's Thesis\cite{BokaThesis}. | ||||
| Although not much of the original code base has remained due to intense refactoring | ||||
| and all around changes, the result would serve as a solid foundation for further development, | ||||
| and the architecture is mostly the same to this day. | ||||
| The resulting code would be the first working POC\ | ||||
| The resulting code would be the first working POC% | ||||
| \footnote{Proof of Concept} of the framework showcasing the fixing of an SQL Injection | ||||
| attack. | ||||
| This initial version included the foundations of the framework: | ||||
| a working messaging system, event handling and state tracking. | ||||
| These provided a great basis | ||||
| despite the fact that the core codebase of the framework was almost | ||||
| completely rewritten due to an increased focus on code quality,  | ||||
| extensibility and API stability required by new features. | ||||
|  | ||||
| It is interesting to note that when I mentioned that the project requirements | ||||
| were kept general on purpose (\ref{requirements}) I had good reason to do so. | ||||
| When taking a look at the requirements of Bálint's Thesis, much of that | ||||
| is completely obsolete by now. | ||||
| But since the project has followed Agile Methodology% | ||||
| \footnote{Manifesto for Agile Software Development: | ||||
| \href{https://agilemanifesto.org}{https://agilemanifesto.org}} | ||||
| from the start, we were able to adapt to these changes without losing | ||||
| the progress he made in said Thesis. Quoting from the Agile Manifesto: | ||||
| ``Responding to change over following a plan''. | ||||
| This is a really important takeaway. | ||||
|  | ||||
| After becoming a full time employee at Avatao I was tasked with developing the project | ||||
| with Bálint, who was later reassigned to work on the GDPR compliance of the platform. | ||||
|   | ||||
							
								
								
									
										
figures/avatao_challenge.png (new binary file, not shown; size: 79 KiB)
							
								
								
									
										
figures/tfw_architecture.png (new binary file, not shown; size: 46 KiB)
| @@ -10,7 +10,8 @@ | ||||
|     sectsty, | ||||
|     xcolor, | ||||
|     microtype, | ||||
|     tabto | ||||
|     tabto, | ||||
|     amsthm | ||||
| } | ||||
| \RequirePackage[bottom,hang,flushmargin]{footmisc} | ||||
|  | ||||
| @@ -18,6 +19,8 @@ | ||||
| \sethlcolor{andigray} | ||||
| \newcommand{\code}[1]{\hl{\mbox{#1}}} | ||||
|  | ||||
| \newtheorem*{note}{Note} | ||||
|  | ||||
| \newcommand{\pic}[3][width=\textwidth] | ||||
| { | ||||
| \begin{figure}[H] | ||||
|   | ||||
| @@ -41,7 +41,9 @@ | ||||
| \include{content/declaration} | ||||
| \include{content/abstract} | ||||
| \include{content/introduction} | ||||
| \include{content/architecture} | ||||
|  | ||||
| \listoffigures | ||||
| \lstlistoflistings | ||||
|  | ||||
| \renewcommand\bibname{References} | ||||
|   | ||||