\chapter{Introduction}

\section{Project Justification}

As the world is being completely engulfed by software, the need for
accessible yet high-quality learning materials on software engineering, and
especially on secure software engineering, is on the rise. While we enjoy the
comfort that information technology provides us, we often forget about the
risks involved in relying so much on software in our everyday lives. Looking at
recent events, such as a cyber arms race taking place between leading powers,
50 million Facebook accounts being breached due to the incorrect handling of
access tokens\cite{FacebookBreach}, or China building an Orwellian state of
total digital surveillance\cite{ChinaSurv}\cite{ChinaCredit}, it becomes clear
that security and privacy in the IT sector are more important now than ever.

With all of our data slowly crawling towards the cloud and an IoT revolution
breathing down our necks, we as an industry must face the music and start
actually doing something before we enter a new age of digital Wild West, which
could involve us riding around in vulnerable self-driving
cars\cite{SelfDriving} with power over life and death, while exposing all our
sensitive data through our ill-protected smartphones\cite{Android} and IoT
devices\cite{IoTDDoS}. What a time to be alive.

It is important to stress that IT security is something that is
\emph{really hard} to get right, even if ``right'' often only means better than
your neighbour, as perfect security is a utopia that does not seem to
exist\cite{NoPerfectSecurity}. It is often only when large and reputable
companies in the industry, such as CloudFlare\cite{CloudFlareLeak} or
eBay\cite{EBayGit}, fail to get it right that people start to grasp how
difficult it actually is. This is why, unless we want to disconnect all our
devices from all networks and ban USB sticks, the best line of defense is going
to be people: a new generation of \emph{security conscious} users and
developers.
Among many other things outside IT, this is only possible with
education\cite{ITSecEdu}. We need to come up with engaging, addictive and fun
ways to learn (and teach), so that more and more people are motivated to do so,
and the drive to acquire and share knowledge comes naturally rather than being
something we have to struggle for. I believe that this is something that
\emph{can} and \emph{should} be applied to everything we do as a society. The
only thing we can hope and work for is to become better and better as time and
generations pass. We \emph{must} do better, and education is the way forward.

The short-term goal of this project, and the goal of this thesis, is to provide
a new angle on the education of software engineering, and especially secure
software engineering, based on the aspirations above. The long-term goal is to
bring something new to the table in IT education as a whole (for users as well
as developers).

\section{A Short Introduction to Avatao}

The goal of Avatao as a company is to help software developers build a
\emph{culture} of security amongst themselves, with the vision that if the
world is going to be taken over by software no matter what, that software might
as well be \emph{secure software}. To achieve this goal we have been working on
an online e-learning platform with
hundreds\footnote{654 exercises as of today, to be exact.} of hands-on learning
exercises to help students and professionals master IT security, collaborating
with universities around the world and helping companies build
\emph{security consciousness} amongst their developer teams.

Since starting out we have amassed some experience in building fun challenges
that showcase the exploitation and fixing of relevant security vulnerabilities
in code or configuration. Traditionally these exercises revolved around
offensive and defensive tasks, with challenges often being split into two or
more parts.
For example, users would have to hack a website by exploiting a buffer overflow
vulnerability, then in the second challenge they would fix the code they have
just exploited in a web-based code editor. These kinds of exercises offer great
flexibility to reflect real-world security issues, as in more complex
challenges users might be required to exploit multiple vulnerabilities to
succeed, and to understand the ways they augment each other. We often recreate
real-world scenarios based on incident reports released by companies for added
authenticity and relevance\cite{AkosFacebook}. Our challenges usually involve
some sort of website acting as a frontend for the vulnerable application, or
require the user to connect using SSH\@.

\pic{figures/avatao_challenge.png}{An offensive challenge on the Avatao platform}

The Avatao platform relies heavily on Docker containers to spawn challenges,
which makes it extremely flexible in terms of what is possible when creating
content. Essentially anything that you can do inside a Docker container can be
done on the Avatao platform as well. Currently each challenge is implemented as
a set of Docker images residing inside a Git repository exclusive to that
specific challenge. Our content creation workflow enables developers to create
such repositories on GitHub, which are automatically set up with the proper
webhooks, so that when their content gets reviewed (and their feature branches
merged), their changes go live on the platform as well. In the future we also
plan to support the use of virtual machines to implement challenges, which
could further increase this flexibility by adding the possibility of exercises
involving the use of Docker, or Windows-based challenges.
\section{Emergence}\label{intro:emergence}

While working as a content creator I stumbled upon the idea of automating the
completion of challenges for QA\footnote{Quality Assurance} and demo
purposes\footnote{I used to record short videos or GIFs to showcase my content
to management.}. In a certain scenario I was required to integrate a web-based
terminal emulator into a frontend application to improve user experience by
making it possible to use a shell right on the website rather than having to
connect through SSH\@. After I got this working I looked into writing hacky
bash scripts to automate the steps required to complete the challenge, in order
to make it easier for me to record the solution, as I often found myself
recording over and over again to get a demo without any mistakes.

While I was playing around with this idea, researching possible solutions led
me to a hidden gem of a project on GitHub called
\texttt{demo-magic}\footnote{\href{https://github.com/paxtonhare/demo-magic}{https://github.com/paxtonhare/demo-magic}},
which is essentially a bash script that simulates someone typing into a
terminal and executing commands. I created a
fork\footnote{\href{https://git.strongds.hu/mrtoth/demo.sh/src/master/demo.sh}{https://git.strongds.hu/mrtoth/demo.sh/src/master/demo.sh}}
of the project and integrated it into my challenge. Soon afterwards, recording
demo videos was not even necessary anymore, as I started to distribute the
solution script with the challenge code itself, making it toggleable using
build-time variables. Should the solution script be enabled, the challenge
would automatically start\footnote{I did this by injecting the solution script
into the user's \texttt{.bashrc} file.} completing itself in the terminal
integrated into its frontend, often even explaining the commands executed
during the solution process.
\lstinputlisting[
    language=bash,
    caption={Example of a solution script},
    captionpos=b
]{listings/demosh.example}

I was quite pleased with myself, no longer having to do the busywork of
recording videos, but what I did not know was that I had accidentally created
something far more than a hacky bash script that solves challenges, as this
little script would help formulate the idea of the project
\emph{Tutorial Framework}, or just \emph{TFW}.

\section{Vision of the Tutorial Framework}

The whole ``challenges that solve themselves'' thing seemed like an idea with
great potential if developed further. We envisioned something that resembles a
learning video, but is real, actual software running and interacting with
itself to showcase different topics to the user. Something that would allow
users to stop at any given time, take a breath, interact with the environment
on their own (e.g.\ take a look at the directory structure or a file, try what
happens if a command is executed somewhat differently, etc.) and then continue
with the tutorial.

We wanted to create something that would feel as if an actual teacher was
standing next to you, explaining topics to you at your own pace, while showing
you how to solve a related task. This teacher scenario would allow you to take
the helm sometimes and try putting your newfound skills into action
immediately. For example, a chatbot would show you how to encrypt a file using
GnuPG, then it would ask you to encrypt another file similarly. After this the
bot could teach you how to configure a database server, then ask you to write a
configuration file yourself and encrypt it, because it might contain sensitive
data such as open ports, usernames and such.
Technically, however, this is far from trivial: we would have to keep track of
the user's progress at all times, and be able to actually check whether the
user has successfully encrypted the file by decrypting it and then checking
whether the configuration file is valid (this would practically require trying
to start a database server with it). After all this we would still have to
offer \emph{relevant} and helpful assistance if something went wrong.

Another scenario we envisioned was the following. Imagine a code editor on the
right which contains the authentication logic of a website. On the left,
imagine the website which the code in the editor implements. Note that the
website is completely real: it is an actual, functional web application users
can interact with (e.g.\ navigate through the pages, register or log in). The
code editor has a button titled ``Deploy''. If the user changes the source code
of the application and clicks this button, the application should restart
itself with the new code. Let's say that the user comments out the part that
authenticates a user. In this case the application should let anyone log in
with dummy credentials. Meanwhile a console could show the output of the
webserver. For example, if the source code the user tried to deploy was
invalid, the framework should report the exact exception raised while running
the application.

\pic{figures/webapp_and_editor.png}{The Code Editor and Web Application Example in TFW}

Even if we did all this, we would still need a way to integrate this whole
thing into a web-based frontend with a file editor, terminal, chat window and
the like. It turns out that today all this can be done by writing a few hundred
lines of Python code which uses the Tutorial Framework.
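
To make the ``Deploy'' scenario more tangible, the listing below is a minimal
sketch of the underlying mechanism under stated assumptions. It does not use
TFW and is not how TFW implements it: the names \texttt{deploy} and
\texttt{DeployResult} and the \texttt{grace\_period} parameter are hypothetical
and introduced purely for illustration. The sketch writes the edited source to
a file, starts it as a fresh Python process, and treats an exit within a short
grace period as a failed deployment whose error output can be shown to the
user.

\begin{lstlisting}[
    language=Python,
    caption={Illustrative sketch of a deploy step (hypothetical, not the TFW API)},
    captionpos=b
]
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class DeployResult:
    running: bool    # True if the process is still alive after the grace period
    output: str      # captured stderr if the process exited early
    process: Optional[subprocess.Popen] = None  # handle to the live application


def deploy(source: str, grace_period: float = 1.0) -> DeployResult:
    """Start `source` as a fresh Python process and report early failures.

    Hypothetical helper for illustration only; not part of the TFW API.
    """
    workdir = Path(tempfile.mkdtemp(prefix="deploy-"))
    app_file = workdir / "app.py"
    app_file.write_text(source, encoding="utf-8")

    process = subprocess.Popen(
        [sys.executable, str(app_file)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
    )
    try:
        # An exit within the grace period means the application did not stay
        # up; collect whatever it wrote to stderr (e.g. the traceback).
        _, stderr = process.communicate(timeout=grace_period)
        return DeployResult(running=False, output=stderr)
    except subprocess.TimeoutExpired:
        # Still alive after the grace period: treat it as deployed.
        return DeployResult(running=True, output="", process=process)
\end{lstlisting}

Calling \texttt{deploy} with source code containing a syntax error returns a
result whose \texttt{output} holds the interpreter's error message, which is
the kind of text a console component could display. A real implementation would
additionally have to stop the previously running instance and keep streaming
the output of the live process, both of which this sketch leaves out.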
\pic{figures/webapp_and_editor_err.png}{Invalid Code and Deployment Failure with Process Output}

\subsection{Project Requirements}\label{requirements}

Based on this it is now more or less possible to define requirements for the
project. The reason for the ``more or less'' part is that all of this is pretty
much bleeding edge, where the requirements could shift dynamically with time.
For this reason I am going to be as general as possible, to the point that some
of this might even sound vague. To achieve our goals we would need:
\begin{itemize}
    \item a way to keep track of user progress
    \item a way to handle various events (e.g.\ so that we can react when the
        user has edited a file or has executed a command in the terminal)
    \item a highly flexible messaging system, in which processes and frontend
        components (running in a web browser) can communicate with each other
    \item a web-based frontend with lots of built-in options (terminal, file
        editor, chat window, etc.) that use said messaging system
    \item stable APIs that can be exposed to content creators to work with (so
        that framework updates won't break client code)
    \item tooling for development (distributing, building and running)
\end{itemize}

\section{Early Development}

Around a year ago my good friend and colleague Bálint Bokros, our company's
CTO Gábor Pék, and I started designing the TFW architecture. In this early
phase we researched solutions for the issues described above, such as tracking
user progress, process management, interprocess communication and making a
web-based frontend application capable of communicating with processes running
inside a Docker container. After seeing some sort of light at the end of the
tunnel regarding what technologies could be applied, and coming up with several
good alternatives, Bálint Bokros was tasked with developing the first proof of
concept and laying the foundations of the framework in his Bachelor's
Thesis\cite{BokaThesis}.
Although not much of the original code base has remained due to intense
refactoring and all-around changes, the result served as a solid foundation for
further development, and the architecture is mostly the same to this day. The
resulting code was the first working POC\footnote{Proof of Concept} of the
framework, showcasing the fixing of an SQL injection vulnerability. This
initial version included the foundations of the framework: a working messaging
system, event handling and state tracking. These provided a great basis despite
the fact that the core code base of the framework was almost completely
rewritten due to an increased focus on code quality, extensibility and the API
stability required by new features.

It is interesting to note that I had good reason to keep the project
requirements general on purpose (Section~\ref{requirements}). When taking a
look at the requirements of Bálint's Thesis, much of that is completely
obsolete by now. But since the project has followed the Agile
methodology\footnote{Manifesto for Agile Software Development:
\href{https://agilemanifesto.org}{https://agilemanifesto.org}} from the start,
we were able to adapt to these changes without losing the progress he made in
said Thesis. Quoting from the Agile Manifesto: ``Responding to change over
following a plan''. This is a really important takeaway.

After becoming a full-time employee at Avatao I was tasked with developing the
project with Bálint, who was later reassigned to work on the GDPR compliance of
the platform. Thus it became my job to turn the framework into a stable code
base ready for use by content creators, and to implement most of the features
that we had envisioned earlier.