One month into a research internship

I’m just over a month into my latest internship. I'm part of a research group at my university, developing and testing a new type of load balancer. I had my fair share of doubts when I started. I was finally in a place where I was able to ramp up quickly in an industry internship, so part of me felt like I should stick with the industry and get better at working in it instead of seemingly flitting around.

And then I figured, I’m still in school. This is the time to flit.

So this is a quick write-up of my experiences so far with research.

how I got started

At my last internship, I joined an internal group that met every week to discuss academic papers on databases and distributed systems. Through papers like Amazon’s on Dynamo, Yahoo's on YCSB, Google's on Spanner, and others (Chord, Gossip, Raft, Harvest & Yield, etc.), I got an introduction to a pretty cool area in computer science: systems research.

Systems research felt like the combination of a lot of things I like about computer science: concepts like operating systems and networks, but also methodologies like using careful definitions and formal (well, formalish) reasoning to provide strong guarantees about the systems that we design and use every day. A lot of these papers began with defining a problem, providing some background information (useful in learning fundamental concepts), and then elaborating on how the system was designed and tested. Like I said, it was the intersection of implementation and careful evaluation. I loved it.

what I like about it so far

(roughly in order of most to least)

The project itself. I think this is the newest problem I’ve worked on so far. A little scary, but mostly enjoyable to cover new ground every day and work on a problem that hasn’t been addressed before.

Testing/evaluation. Some of my work these days involves designing and justifying testing configurations for the system that we’re developing, which has turned out to be a surprising challenge. I like how much importance is assigned to this phase of system design in research. You would be hard-pressed to find a paper that talks about design choices and implementation without a subsequent section or two on evaluation and benchmarking. I’m enjoying the process of defining relevant metrics and designing testing infrastructure to measure them.

Getting historical context. When we have to make a decision (how do we implement this design? what parameters should we use for this experiment?), the first thing we do is ask ourselves “what have others done?” The purpose of this is less to copy them and more to understand the design and use cases of other, similar projects and the reasons they made the decisions they did. I enjoy this very much because I get more context on the present state of systems research. Also, the (relatively) greater precision and detail in papers makes me feel like the decisions I make now are less hand-wavy because I’m basing them on peer-reviewed, reproducible, well-documented work.

Different priorities. In the industry, a lot of projects are driven by increasing growth or profits. Sometimes projects involve addressing technical debt or minimizing cost, which are all very reasonable things to care about. In research, you define a problem and focus solely on solving it (in other words, you limit scope), which is sort of liberating.

Team size. Last summer, I worked at a company consisting of a few hundred engineers. Working there involved understanding how my team contributed to the engineering org as a whole and navigating a large codebase. Moreover, I’ve usually worked on projects that many other engineering teams had a stake in as well. I think this is fairly representative of most high-impact internships.

But this time, I'm working on a smaller codebase. The total number of people involved in my project (contributors, stakeholders, etc.) is 6, maybe 7. This means less communication overhead, a shorter ramp-up time, quicker feedback, and a faster pace of development.

research + industry

Obviously, academia and the industry have goals and interests that align to some extent, and the two depend on each other. I think that some of what I like about research can also be found in the industry if you join the right company and team, which is why I’m not fully sold on sticking with research for the long run. Yet. Sometimes, it does feel like my work isn’t as impactful because it has a long way to go before it’s published, and an even longer way to go before it’s used in the “real world” (data centres, in my case). But I think research is a first, often crucial step towards driving real change. It’s just that it’s such an early step, and seems so separate from the industry, that working in research feels like playing a long game.

what I’m working on

The five-word answer is “an RDMA based load balancer”! I’m planning to talk about it more fully in a future blog post (because I have to write a paper on it for school anyway), but if you’re interested in what Remote Direct Memory Access is, you can read about it here (and the references at the end).