diff --git a/bibliography.bib b/bibliography.bib index de5daee..feb67ad 100644 --- a/bibliography.bib +++ b/bibliography.bib @@ -119,3 +119,12 @@ year={2016}, month=mar, } + +@online{NodeModules, + title={What’s really wrong with node\_modules and why this is your fault}, + url={https://hackernoon.com/whats-really-wrong-with-node-modules-and-why-this-is-your-fault-8ac9fa893823}, + language={english}, + author={Mateusz Morszczyzna}, + year={2017}, + month=dec, +} diff --git a/content/using_the_framework.tex b/content/using_the_framework.tex index 34cb160..032af63 100644 --- a/content/using_the_framework.tex +++ b/content/using_the_framework.tex @@ -273,29 +273,30 @@ initialized in, which exposes several communicative options: captionpos=b ]{listings/event_handler_main.py} -\section{Setting Up a Developer Environment} +\section{Setting Up a Developer Environment}\label{devenv} To make getting started as smooth as possible I have created a ``bootstrap'' script which is capable of creating a development environment from scratch. -This script is distributed as a bash one-liner command, like so: +This script is distributed as the following bash one-liner: \begin{lstlisting}[language=bash] bash -c "$(curl -fsSL https://git.io/vxBfj)" \end{lstlisting} -This command downloads a script using \code{curl}, then executes the downloaded -script in bash. +This command downloads a script using \code{curl}% +\footnote{\href{https://curl.haxx.se}{https://curl.haxx.se}}, then executes it in bash. In the open source community it is quite common to distribute installers this way% \footnote{A good example of this is oh-my-zsh \href{https://github.com/robbyrussell/oh-my-zsh}{https://github.com/robbyrussell/oh-my-zsh}}, which might seem a little scary at first, but is not less safe than -downloading and executing a binary installer from a website with a valid TLS certificate.
+downloading and executing a binary installer from a website with a valid TLS certificate, as +\code{curl} will fail with an error message if the certificate is invalid. This is because both methods place their trust in the PKI~\footnote{Public Key Infrastructure} to defend against man-in-the-middle% \footnote{\href{https://www.owasp.org/index.php/Man-in-the-middle_attack} {https://www.owasp.org/index.php/Man-in-the-middle\_attack}} attacks. Debating the security of this infrastructure is certainly something that we -as an industry should constantly do, but it is out of scope for this paper. +as an industry should constantly do, but it is outside the scope of this paper. Nevertheless I have also created a version of this command that checks the SHA256 checksum of the bootstrap script before executing it @@ -312,10 +313,12 @@ then pipes it into a bash interpreter \emph{only if} the checksum of the downloaded string matches the one provided, otherwise it displays an error message. Software projects distributing their product as binary installers often -display such checksums on their download pages. +display such checksums on their download pages in order to potentially +mitigate MITM attacks. The bootstrap script clones the three TFW repositories and does several steps -to create a working environment: +to create a working environment in a single directory that is based on +test-tutorial-framework: \begin{itemize} \item It builds the newest version of the TFW baseimage locally \item It pins the version tag in \code{solvable/Dockerfile}, @@ -325,8 +328,104 @@ to create a working environment: \end{itemize} It is important to note that this script \emph{does not} install anything system-wide, it only works in the directory it is being executed from. +This is a good practice, as many users --- including me --- find scripts that +write files all around the system intrusive if they could provide the same functionality +while working in a single directory.
-It would be a lot easier to simply use Docker Hub% +It is also worth mentioning that it would have been a lot easier to simply use Docker Hub% \footnote{\href{https://hub.docker.com}{https://hub.docker.com}}, but since the code base is currently proprietary we cannot distribute -it using a public medium. +it using a public medium, and we use our own image registry to store private Docker +images. + +\section{Building and Running a Tutorial} + +After the environment has been created using the script described in Section~\ref{devenv}, +it is possible to simply use standard Docker commands to build and run the tutorial. +Yet the \code{hack} directory of test-TFW also provides a script called +\code{tfw.sh} that developers prefer to use for building and running their +exercises. +Why is this the case? + +\subsection{The Frontend Issue} + +To understand this, we first need some understanding of the +build process of Angular projects. + +When frontend developers work on Angular projects, they usually use the built-in +developer tool of the Angular-CLI% +\footnote{\href{https://cli.angular.io}{https://cli.angular.io}}, +\code{ng serve} to build and serve their application. +The advantage of this tool is that it automatically reloads the frontend +when the code on disk is changed, and that it is generally very easy to work with. +On the other hand, a disadvantage is that a \code{node_modules} directory +containing all the npm% +\footnote{\href{https://www.npmjs.com}{https://www.npmjs.com}} +dependencies of the project must be present while doing so. +The problem with this is that because the JavaScript ecosystem is a \emph{huge} +mess~\cite{NodeModules}, these dependencies can easily reach +\emph{several hundred megabytes} in size.
+ +To solve this issue, when creating production builds, +Angular uses various optimizations such as tree shaking% +\footnote{\href{https://webpack.js.org/guides/tree-shaking/} +{https://webpack.js.org/guides/tree-shaking/}} +to remove all the dependencies that won't be used when running the application% +\footnote{Otherwise it would not be possible to serve these applications efficiently +over the internet}. +The problem is that these optimizations can take a \emph{really} long time. +This is why today frontend builds usually take a lot longer than building anything +not involving JavaScript (such as C++, C\# or any other compiled programming language). + +This mess presents its own challenges for the Tutorial Framework as well. +Since hundreds of megabytes of dependencies have no place inside Docker containers% +\footnote{Otherwise it may take tens of seconds just to send the build context to +the Docker daemon, which means waiting even before the build begins}, +by default the framework will only place the results of a frontend production build +of \code{solvable/frontend} into the image layers. +This slows down the build of TFW-based challenges so much that instead of roughly +30 seconds, they will often take 5 to 10 minutes. + +\subsection{The Solution Offered by the Framework} + +To circumvent this, it is possible to entirely exclude the Angular frontend from a TFW +build, using build time arguments% +\footnote{In practice this is done by supplying the option +\code{--build-arg NOFRONTEND=1} to Docker}. +But when doing so, developers would have to run the frontend locally with +the whole \code{node_modules} directory present. +The bootstrap script takes care of putting these dependencies there, +while the \code{tfw.sh} script is capable of starting a development server +to serve the frontend locally using \code{ng serve} besides starting +the Docker container without the frontend.
+As if this were not complicated enough, since Docker binds the port +the container is going to use, \code{tfw.sh} has to run this dev server on +another port, then use the proxying features of Angular-CLI to forward requests +from this port to the running Docker container when requesting resources +other than the entrypoint to the Angular application. + +This is the reason why the frontend is accessible through port \code{4200} (default +port for \code{ng serve}) when using \code{tfw.sh} to start a tutorial, but when running +a self-contained container built with the frontend included it is accessible on port \code{8888} +(the default port TFW uses). + +While it also provides lots of other functionality, this is one of the reasons why +the \code{tfw.sh} script is several hundred lines long. +The implementation of making the frontend toggleable during Docker builds requires some +of the \code{ONBUILD} instructions we've discussed earlier: +\begin{lstlisting}[language=bash] +ONBUILD RUN test -z "${NOFRONTEND}" &&\ + cd /data && yarn install --frozen-lockfile || : + +ONBUILD RUN test -z "${NOFRONTEND}" &&\ + cd /data && yarn build --no-progress || : + +ONBUILD RUN test -z "${NOFRONTEND}" &&\ + mv /data/dist ${TFW_FRONTEND_DIR} && rm -rf /data || : +\end{lstlisting} +Remember that \code{ONBUILD} commands run in the build context of the child image. +These commands check whether the \code{NOFRONTEND} build argument +is present, and only deal with the frontend if this argument is not defined. +The \code{|| :} notation in bash basically means ``or true'', which is required +to avoid aborting the build due to the non-zero return code produced +by the \code{test} command if the build arg is defined.