image

Lessons I’ve learned on optimizing the performance of web applications while building a multiplayer game with websockets

1) Browser Profiling

The browser has a wealth of tools that allow us to peek into our application. I’ll be focusing on the Network and Performance tabs of Google Chrome, but I suspect this is just the tip of the iceberg.

image

A) The Network Tab

I was familiar with this tab before this project but hadn’t drilled down into the sizes, compression, and load time of each individual request. In fact, I hadn’t fully appreciated how this list of requests is a truer representation of my project than what the IDE shows me, because here I can see the order of execution not only of my code, but of every dependency my project relies on to function. For example, I can see how Vite is bundling my JavaScript, HTML and CSS files, and then chunking them. I can also see every Google font I am requesting, and every npm package.

For example, we can see the entirety of the Socket.io library being loaded here:

image

We can see the browser requesting every part of the Three.js library it needs here:

image

Next on our tour is the timing window, where we can see how long each step of a script takes:

image

This is Three.js again ^

Chrome also provides a helpful visualization of all messages sent over WebSockets, which can be zeroed in on by clicking the WebSocket filter:

image

This lets us see a play-by-play of the messages sent over a particular WS connection:

image

B) The Performance Tab

As powerful as the network tab is, the performance tab is like a spaceship. When I first opened it, I was so overwhelmed it didn’t even seem to contain any useful information. As I’ve used it more, I’ve learned that it’s a Swiss Army knife that can be used in a variety of ways.

image

When zoomed in, it shows every process run on a website:

image

Those processes can be cross-referenced with stills taken from renders of the website, displayed in parallel above them:

image

And each process can be clicked on to reveal the exact sub-processes it triggers, depends on, and relates to, along with how long each one took and how much memory and CPU it used.

image

It’s great for:

  • Zooming in on a particular event in an application
  • Investigating what order functions are called in the event loop
  • Discovering memory leaks using the graph of memory and other performance metrics
  • Seeing when garbage collection takes place (or doesn’t)
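
One trick that pairs well with the first item on that list is instrumenting your own code with the User Timing API; its marks and measures show up in the Timings track of a Performance recording. A minimal sketch, where runPhysicsStep is a hypothetical game function standing in for any hot path:

// Mark interesting moments in application code so they appear in the Performance recording
performance.mark('physics-start');
runPhysicsStep(); // hypothetical game function
performance.mark('physics-end');

// The named measure shows up in the Timings track, making it easy to zoom in on
performance.measure('physics-step', 'physics-start', 'physics-end');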

2) Server-Side Profiling

We’ve looked at a lot of profiling in the client, but it is also possible to profile the server. To start, run your server file with the --inspect flag:

node --inspect index.js

You should see some output that looks a little something like:

Debugger listening on ws://127.0.0.1:9229/7825c3f2-d5c0-4c44-b6fe-dbb616650ae7
For help, see: https://nodejs.org/en/docs/inspector

I wasn’t sure what this output was about at first, because navigating to the supplied URL shows nothing. It turns out the inspector speaks the Chrome DevTools Protocol over WebSocket rather than serving HTTP, so there is nothing to see in a regular browser tab.

Next, we navigate to chrome://inspect, and press “inspect” on our remote target,

image

providing us with a profiler window much like what we have seen on the client:

image

We can record activity in the Performance tab just like we’ve been doing in the client, and see the slew of events being recorded.

image

Which is amazing, because I can zoom in to see exactly how long each of my methods takes to run in the call tree:

image

Zoomed in view of the event stream

image

The call tree that makes up my getState function

3) Automated Testing

You might benefit from automated browser testing if:

  • your roommate is out and can’t test your site
  • you are developing carpal tunnel syndrome from repeatedly typing localhost:5163

My tool of choice is called Playwright, which I was introduced to by the excellent James Watters when I worked at Tcare. Playwright is a well-documented, asynchronous end-to-end browser testing framework with bindings for Python, Java, .NET and JavaScript.

It supports all major browsers and can translate browser actions into code to help you write tests quickly. It also has a trace viewer that allows a test run to be rewound, saved, and loaded.

It also allows for tests to be run in parallel, which I have taken full advantage of to stress test my application.
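
For context, a single test in my setup looks roughly like the sketch below; the Join button, the canvas selector, and the port are stand-ins for whatever your app actually exposes. Running it with multiple workers (for example, npx playwright test --workers=12) is what turns it into a stress test:

// stress.spec.ts - one autonomous "player" joining the game
import { test, expect } from '@playwright/test';

test('a player can join and see the board', async ({ page }) => {
	await page.goto('http://localhost:5163');           // dev server port, adjust as needed
	await page.click('text=Join');                       // hypothetical join button
	await expect(page.locator('canvas')).toBeVisible();  // the Three.js canvas
});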

What slowly became apparent was that manually testing my application was not empirical enough. As a human tester, I introduced too many unknowable or unquantifiable variables to compare different architectures or configurations with any certainty that I wasn’t really just measuring some inconsistency I had introduced myself.

Just pressing a few buttons as a single user in this game wasn’t an adequate or consistent test. Only under heavy load, and the strange edge cases created by twelve or even twenty autonomous agents, could I quickly and reliably discover weaknesses in my code.

For example, I attempted to batch the update events emitted by my server to every client, and quickly found that collapsing the updates of a single entity into the most recent event caused a lot of side effects.

updates = {
	'object-09ua9u': [...],
	'object-08uaus': [...],
	'object-a89273': [
		{ type: 'create', payload: { x: 2, y: 3 } },
		{ type: 'move',   payload: { x: 2, y: 4 } },
		{ type: 'delete', payload: null }
	]
}
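
To illustrate the side effect (a sketch of the failure mode, not my actual batching code): naively keeping only the newest event per entity collapses the history above into a lone delete, so a client that never received the create has nothing to delete.

// Naive batching: keep only the most recent event for each entity
const collapsed = {};
for (const [id, events] of Object.entries(updates)) {
	collapsed[id] = events[events.length - 1];
}
// 'object-a89273' is now just { type: 'delete', payload: null } -
// any client that never saw the 'create' event receives a delete for an unknown entity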

Automated testing made this apparent almost immediately, in a way manual testing probably wouldn’t have. In the end, testing with Playwright has saved me a lot of time and hassle and provided the first real, clear performance metrics I’ve had in this process. I wish I had started doing this on the first day.

One challenge I have faced is outputting metrics aggregated from each test instance. My scrappy solution was building a test server that can receive the memory usage gathered by each test and print out the average.

// Start memory usage collection (performance.memory is a non-standard, Chromium-only API)
const memoryUsageData: number[] = [];

const intervalId = setInterval(async () => {
	const memory = await page.evaluate(() => {
		const performance = window.performance as Performance & { memory?: any };
		if (performance && performance.memory) {
			return {
				jsHeapSizeLimit: performance.memory.jsHeapSizeLimit,
				totalJSHeapSize: performance.memory.totalJSHeapSize,
				usedJSHeapSize: performance.memory.usedJSHeapSize,
			};
		}
		return null;
	});

	if (memory) {
		memoryUsageData.push(memory.usedJSHeapSize / 1024 / 1024); // Convert bytes to MB
	}
}, 1000); // Collect data every second

...

// Post the aggregated stats to the local test server once the run is over
await fetch('http://localhost:3088/report/', {
	method: 'POST',
	headers: { 'Content-Type': 'application/json' },
	body: JSON.stringify({ avgMemoryUsage, maxMemoryUsage, minMemoryUsage })
})
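
The receiving end might look something like this - a sketch using only Node’s built-in http module, with the port matching the fetch above and the averaging logic kept as simple as possible:

// report-server.js - collect memory reports from each test instance and print a running average
const http = require('http');

const reports = [];

http.createServer((req, res) => {
	if (req.method === 'POST' && req.url === '/report/') {
		let body = '';
		req.on('data', (chunk) => (body += chunk));
		req.on('end', () => {
			reports.push(JSON.parse(body));
			const avg = reports.reduce((sum, r) => sum + r.avgMemoryUsage, 0) / reports.length;
			console.log(`${reports.length} reports, average heap usage: ${avg.toFixed(2)} MB`);
			res.end('ok');
		});
	} else {
		res.statusCode = 404;
		res.end();
	}
}).listen(3088);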

In the future I would like to check out a load-testing library called Artillery, and to investigate better ways to log test results.

4) User Testing

I happened to be in a room full of people so I walked around and just told them to try out my live demo.

User testing is the most indispensable performance testing tactic of them all, but it’s an even better design tactic, as it allows you to see what human beings actually respond to.

Playwright is great, and so is automated testing, but this approach has weaknesses. If I tell Playwright to open up 24 Chromium instances on my brave little MacBook Air, I’m not really testing concurrent connections anymore. I’m just overstepping the reasonable memory limitations of my computer, and no longer getting very meaningful results.

This is a perfect example of why testing in an environment that matches production is the only way to truly know the performance of an application. That includes user behavior: users do unexpected things without exception, and I don’t believe it’s possible to forecast the effect user behavior will have on performance, especially in something like a multiplayer game. It’s a far more effective use of time to observe and take notes.

Like Tommy Caldwell on pitch 15, you can build an exact replica of specific, significant performance challenges you see out in the world. You can try to solve them in every possible way in a controlled environment, and see if your results check out.

image

Photo by Becca Caldwell

5) Server Logs

It’s a little embarrassing, but until last week I can’t really say I understood how logging as a concept was distinct from console.log in JS or print in Python.

Somewhere between that first Hello World and finding myself setting up CloudWatch alerts on the logs of an EC2 instance at my job, I had an opportunity to ask, “What is the fundamental makeup of these pieces of text, and how are they different from one another?” And I never really did.

During this project, I’ve checked the logs of the running processes on the community server I’m using. Chaotically, these logs are shared by about ten different projects. I have to admit, I like this - it makes visualizing the community server’s activity dead simple. It’s one server, it’s serving a dozen projects, and this is what its brain looks like.

But if you watch this long enough, you start to think, “there must be a better way.” Something along the lines of: I dunno, what if we just concatenated these lines of text into a file, or maybe separate files? That way they’d:

  • Persist
  • Be separated into meaningful streams of information

Since we could ostensibly format the text ourselves, we could give it any timestamp convention we want, any indentation, etc. Whatever is easiest for us to read, really.

And you start to think, well, if I don’t trust the persistence of the server, I could save it to a database. Or even to another server. Or I could even email things to myself - it’s logging, not a web protocol. The information should be structured and transmitted in the ways that are most valuable for debugging.

So you write something like this to get started:

image

And think, cool, now the next time my server crashes, I’ll know why.
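
For what it’s worth, a minimal version of that idea needs nothing beyond Node’s built-in fs module; the stream names and format below are just placeholders:

// logger.js - append timestamped lines to a per-stream log file
const fs = require('fs');
const path = require('path');

const LOG_DIR = path.join(__dirname, 'logs');
fs.mkdirSync(LOG_DIR, { recursive: true });

function log(stream, message) {
	const line = `[${new Date().toISOString()}] ${message}\n`;
	// One file per stream keeps projects (or concerns) separated and persistent
	fs.appendFileSync(path.join(LOG_DIR, `${stream}.log`), line);
}

// Usage
log('game-server', 'player abc123 connected');
log('game-server', 'unhandled rejection: lost connection to client');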

6) Unit Testing

Testing individual functions for performance seems like a good idea - one I have not yet tried. I would imagine the browser profiling tools could help with some of this, automated testing tools could too, and there may also be tools that allow this to be done inside an IDE. What I imagine is something that catalogues every method and records how long it takes to run on average over several runs.
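
As a starting point, something like the sketch below would cover the simplest case; benchmark is a hypothetical helper, and getState stands in for whatever method is under test:

// Run a function many times and report its average duration
function benchmark(name, fn, runs = 100) {
	const timings = [];
	for (let i = 0; i < runs; i++) {
		const start = performance.now();
		fn();
		timings.push(performance.now() - start);
	}
	const avg = timings.reduce((a, b) => a + b, 0) / runs;
	console.log(`${name}: ${avg.toFixed(3)} ms average over ${runs} runs`);
}

// Usage: benchmark('getState', () => getState());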

7) Architecture Audits

Originally, I sent the entire board state of my game both:

  • From the server to the client
  • From the client to server

It was a lot of network overhead. I changed this architecture to only send updates for individual entities. In the profiling section, I also mention an attempt at batching some of these updates (with less success so far). Both of these changes represent ways the interplay between client and server can be made more efficient.
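
In Socket.io terms, the difference looks roughly like this; getState and getDirtyEntities are placeholder names for my own game methods:

// Before: broadcast the entire board state to every client on every tick
io.emit('state', game.getState()); // potentially the whole world, every time

// After: broadcast only the entities that changed since the last tick
io.emit('updates', game.getDirtyEntities()); // a handful of small deltas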

It’s important to validate these things before enacting them. I’ll admit, I did this refactor on a hunch. My refactor saved about 14MB of unnecessary data being sent with each request. This is nice, but it took me about a week.

I paired this refactor with re-using rendered geometries and meshes on the frontend rather than tearing down every object in the scene and re-instantiating it.

Even together, the combined performance gains of these refactors are humble.

However, they are definitely a better design, and had I been more experienced at this interplay of a 3D-ish world being synchronized over websockets, I might have designed it from the beginning.

Some other architectural / data model changes I’d like to explore:

  • Locality: limiting updates to players within a certain range of those updates (sketched after this list)
  • A sparser representation of initial state
  • A sparser representation of all state, possibly using raw binary instead of unicode
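
The first of these might look something like the sketch below; RANGE, the player coordinates, and the update shape are all assumptions for illustration:

// Locality: only emit an update to players within RANGE of where it happened
const RANGE = 50; // hypothetical world units

function broadcastUpdate(update, players) {
	for (const player of players) {
		const dx = player.x - update.x;
		const dy = player.y - update.y;
		if (Math.hypot(dx, dy) <= RANGE) {
			player.socket.emit('update', update);
		}
	}
}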

What I’ve learned from these changes is that architectural refactors are costly in terms of time, and it’s prudent to profile prototypes of them first. Doing this research up front helps ensure that time is spent solving tangible performance issues.

8) Pairing

Sometimes the best way to break down a complex pipeline, application, or any interconnected set of processes is to walk someone else through it. In doing so, we often gain new perspectives on what we built, and the person we walk through it can almost always offer novel insights of their own.

I was lucky enough to pair with Robin Neufeld on this project. I was straightforward about what my optimization goal was:

allow about 10 people to join and play the game without experiencing noticeable lag

And that although I could think of many potential performance improvements, I didn’t know which one would be the most productive in achieving my goal. However, after walking her through all the tools I had at my disposal for assessing different techniques, I feel a lot closer to understanding how my goal could be achieved.

A list of the most promising optimizations I could implement

image

image

A scrappy list of why I might experience performance issues in certain situations

Robin took a very stress-free dive into the profiling tools and opened up my ability to interface with them. I was very driven and specific in how I wanted to use them; Robin was not, and consequently played around with them a lot more, discovering many of the features of these browser profiling tools that I have documented above.

Most importantly, Robin kept me on track, not allowing me to dive into rabbit holes that weren’t yet validated as worthwhile.

9) Comparisons of Environment

Some examples:

  • Different Deployments
  • Different Machines
  • Different Browsers
  • Different Node Versions

The bottom line is, consider the assumptions you are making about the universality of the environment you are running your code in and explore how it behaves in as many other environments as possible.

The local / remote comparison is one of the most important. I’ve become a big believer in deploying early, since a deployed environment, and the considerations that come with it, is often dramatically different from a local environment. Learning these lessons early in a project’s lifecycle can sometimes change the architecture of a project in ways that would be difficult to implement later on.

I am a full-stack web developer with no formal CS training. I started mostly with frontend JavaScript web frameworks, moved on to server-side development mostly in the highly abstract languages of JavaScript and Python, and eventually began doing DevOps almost exclusively with highly abstracted cloud service products. Only recently have I begun learning about the lower-level components that power these things: distributed computing, memory-safe languages, virtual memory, web protocols, and a slew of technologies closer to the actual hardware powering the abstractions I’ve grown so familiar with.

From this perspective, it’s a bit easier to ignore the differences between server setups. But I’m coming to realize that understanding the underlying hardware, or at least the architecture of the distributed or serverless service being used, is crucial to understanding what my performance expectations should be.

image

image

All of the deployment examples you’ve seen so far run off of a Raspberry Pi in the next room that is serving about fourteen other fun projects:

image

It’s a great solution because these projects are mostly pretty small in scale, and don’t usually deal with simultaneous traffic, as they are generally demo’d and tested at different times.

However, it’s obviously not the most robust server solution! So even after all this optimization work, my expectations can only be so high.

In the future, I really look forward to using these tools to understand the performance tradeoffs of many other types of deployment solutions, memory configurations, data structures, and architectures. I set out to make a fun game, never knowing that the performance hurdles I discovered would teach me far more about programming than the game logic or network protocols ever would.