Modernizing Legacy Code at The Graide Network

When I joined The Graide Network earlier this summer, I inherited a codebase that had been worked on by a couple of different contractors over the past year. Like many offshored projects, it was a mess: there was logic in the views, there were four different methods to update a user, it was running on an unsupported version of PHP, and of course there weren’t any tests.

Still, this was a challenge that I was ready to take on. The code was working (for the most part), and Blair and Liz had a clear vision of what they wanted their system to offer users, so we got to work fixing things.

While this process is still in progress, we’ve made considerable headway in the past couple of months. In the rest of this post, I’ll outline what we’ve done to modernize our legacy codebase at The Graide Network, and hopefully this will help other developers who have inherited outdated or untested code.

1. Learn about the domain

Before we started digging into the code, I had to learn about our business domain. I’m a big fan of Eric Evans’ idea of Domain-Driven Design, which holds that designing or improving a system starts with learning the language and functions of the business. For us, that meant sitting down with the founders. Even before I started working full time, I spent a lot of time discussing the application, the customers, and the users with the team. This was invaluable, and gave me enough context to start figuring out how the code worked and how it might need to work in the future.

2. Assess the current state of the application

I use a four-level software maturity model that was inspired by the Capability Maturity Model for software organizations. I wrote a more detailed post about this model here. When I started assessing The Graide Network’s application, we were firmly at level 1: functionality.

In other words, we couldn’t deploy updates quickly (every release meant SSH-ing into the server and pulling from the repository), new releases tended to be brittle (introducing a new feature almost always broke something else), and building new functionality on top of the current codebase was extremely difficult. Whole data models were stored as serialized PHP arrays in the database, so even the simplest questions about that data couldn’t be answered with a database query.
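To make the serialized-array problem concrete, here is a minimal sketch rather than our actual schema. It is written in Python with SQLite for brevity, JSON stands in for PHP’s serialize() format, and the users table, the courses column, and the sample names are all hypothetical. The point is that the database only sees an opaque string, so finding every teacher of one course means loading and decoding every row in application code.

```python
import json
import sqlite3

# Hypothetical table: each user's courses live in one serialized text column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, courses TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "Ada", json.dumps([{"title": "Algebra I"}, {"title": "Biology"}])),
        (2, "Ben", json.dumps([{"title": "Chemistry"}])),
        (3, "Cy", json.dumps([{"title": "Biology"}])),
    ],
)


def teachers_of(title):
    """Return the names of users teaching `title`.

    The database cannot filter on the blob's contents, so every row is
    fetched and decoded in application code.
    """
    found = []
    query = "SELECT name, courses FROM users ORDER BY id"
    for name, blob in conn.execute(query):
        if any(course["title"] == title for course in json.loads(blob)):
            found.append(name)
    return found


biology_teachers = teachers_of("Biology")
```

With three rows this is harmless, but the cost grows with every user, and nothing in the database can index the course data or keep it consistent.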

3. Establish the ability to make changes

“If it hurts, do it more often.” - FrequencyReducesDifficulty by Martin Fowler

Before we got to work writing any code, we had to be able to quickly deploy changes to the codebase. This is an essential ability at a startup where requirements change often and everyone wants to be agile, but you have to have a system in place in order to make automated deployments work. Here are some of the things that we had to do to automate deployments and bring us to technical maturity level 2:

  • Started using version control correctly with a good branching strategy.

  • Removed user uploaded files from version control.

  • Built a Codeship pipeline with a shell script to SSH into our server and deploy whenever changes were made to the master branch.

  • Set up a similar pipeline to pull code onto our development server whenever changes were made to the development branch.

  • Moved key environment variables to a .env file.
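As an illustration of the last item, here is a rough sketch of what reading a .env file involves. It is written in Python, and the variable names and values are made up; in a PHP project a library such as vlucas/phpdotenv usually fills this role, and this sketch is not meant to describe how that library works internally.

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(text, environ=None):
    """Copy parsed values into `environ` without overriding existing keys."""
    environ = os.environ if environ is None else environ
    for key, value in parse_env(text).items():
        environ.setdefault(key, value)
    return environ


# Hypothetical file contents; the real file stays out of version control.
SAMPLE = """
# database settings
DB_HOST=localhost
DB_PASSWORD="s3cret"
APP_ENV=development
"""

settings = load_env(SAMPLE, environ={"APP_ENV": "production"})
```

Letting values already present in the environment win is a design choice in this sketch: it lets a server override the file without anyone editing it.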

4. Start writing tests

Next, we wanted to ensure that we didn’t break everything that was already working. We picked a few key scenarios, then wrote some acceptance tests to verify that they worked. I have to give a big shout-out to my fellow engineer at The Graide Network, Zach Garwood, for introducing me to Mink. Mink is a BDD testing tool that allows us to make sure the whole flow from database to HTML works correctly. While I’d prefer to have unit-tested code as well, a handful of BDD tests is better than nothing.

5. Tackle the worst offenses with the highest business need

Probably the hardest thing to do in refactoring legacy code is deciding where to start. The business team had new features that they wanted to see by fall, there were high-priority security items that we needed to address, and some of the data models were so bad that there was no way we could change them.

I decided to look at the data models with the worst problems and the highest business need for improvement. For us that meant tackling the course and assignment portions of our application first. The original code didn’t properly link assignments to the course and/or section that the teacher had created. In fact, the previous developers had chosen to store courses in a serialized array in the users table, meaning that we couldn’t query assignments by course or even change their associations easily. We also couldn’t associate assignments with multiple sections in a course. On top of that, several new features that our customers had asked for relied on better data storage for courses and assignments, so addressing this part of the code had to be done first.
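For comparison, here is a minimal sketch of the kind of relational layout that removes those limits. It uses Python with SQLite, and the table names, columns, and sample rows are assumptions for illustration, not our production schema. A join table lets one assignment belong to several sections, and both lookups become ordinary queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE courses (
    id INTEGER PRIMARY KEY,
    teacher_id INTEGER NOT NULL,
    title TEXT NOT NULL
);
CREATE TABLE sections (
    id INTEGER PRIMARY KEY,
    course_id INTEGER NOT NULL REFERENCES courses(id),
    name TEXT NOT NULL
);
CREATE TABLE assignments (
    id INTEGER PRIMARY KEY,
    course_id INTEGER NOT NULL REFERENCES courses(id),
    title TEXT NOT NULL
);
-- Join table: one assignment can be given to many sections.
CREATE TABLE assignment_sections (
    assignment_id INTEGER NOT NULL REFERENCES assignments(id),
    section_id INTEGER NOT NULL REFERENCES sections(id),
    PRIMARY KEY (assignment_id, section_id)
);
INSERT INTO courses VALUES (1, 42, 'Biology');
INSERT INTO sections VALUES (1, 1, 'Period 1'), (2, 1, 'Period 3');
INSERT INTO assignments VALUES (1, 1, 'Lab report'), (2, 1, 'Cell essay');
INSERT INTO assignment_sections VALUES (1, 1), (1, 2), (2, 2);
""")


def assignments_for_course(course_id):
    """List assignment titles for a course with a single indexed query."""
    rows = conn.execute(
        "SELECT title FROM assignments WHERE course_id = ? ORDER BY title",
        (course_id,),
    )
    return [title for (title,) in rows]


def sections_for_assignment(assignment_id):
    """List the sections an assignment is attached to via the join table."""
    rows = conn.execute(
        """SELECT s.name FROM sections AS s
           JOIN assignment_sections AS link ON link.section_id = s.id
           WHERE link.assignment_id = ? ORDER BY s.name""",
        (assignment_id,),
    )
    return [name for (name,) in rows]
```

Changing an association is then a single insert or delete in assignment_sections rather than a rewrite of a serialized blob.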

6. Build clean components

Rebuilding within an existing system is hard. Most developers prefer to just throw out the old code and build the whole thing again from scratch, but my experience (and a lot of great mentors) has guided me away from this approach.

Instead, we took the two data models that were most poorly built in the legacy system and created two microservices (very simple REST APIs) to handle CRUD operations on these models. Then, we created a client that allowed our legacy application to read data from and write data to these new services. Now, instead of reading from and writing to the poorly designed database tables, the application was using our new APIs.
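The shape of such a client can be sketched as follows. This is an illustration in Python under stated assumptions, not our actual client: the /courses endpoint, the CourseClient and FakeTransport names, and the example base URL are all hypothetical. The transport is passed in as a plain callable so that the legacy code depends only on the client’s methods, and so the client can be exercised without a running service.

```python
import json


class CourseClient:
    """Thin wrapper the legacy app calls instead of touching old tables."""

    def __init__(self, base_url, send):
        self.base_url = base_url.rstrip("/")
        # send(method, url, body) -> (status_code, response_text)
        self.send = send

    def _request(self, method, path, payload=None):
        body = None if payload is None else json.dumps(payload)
        status, text = self.send(method, self.base_url + path, body)
        if status >= 400:
            raise RuntimeError(f"{method} {path} failed with status {status}")
        return json.loads(text) if text else None

    def create(self, course):
        return self._request("POST", "/courses", course)

    def get(self, course_id):
        return self._request("GET", f"/courses/{course_id}")


class FakeTransport:
    """In-memory stand-in for the HTTP layer, used only for this example."""

    def __init__(self):
        self.courses = {}

    def __call__(self, method, url, body):
        if method == "POST":
            course = dict(json.loads(body), id=len(self.courses) + 1)
            self.courses[course["id"]] = course
            return 201, json.dumps(course)
        # GET: the course id is the last path segment of the URL.
        course = self.courses.get(int(url.rsplit("/", 1)[1]))
        return (200, json.dumps(course)) if course else (404, "")


client = CourseClient("https://api.example.test/", FakeTransport())
created = client.create({"title": "Biology"})
fetched = client.get(created["id"])
```

In a real application the transport would be an HTTP library call. Keeping it behind one small interface means the legacy code never needs to know where the data actually lives.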

What I like about this approach is that it allows us to have a feeling of greenfielding our application without actually taking on the risk of starting at square one or maintaining two applications. The simple microservices will eventually take on an increasing amount of business logic from the legacy application, and the legacy application will eventually be completely replaced by a series of microservices.

I also like this approach because it allowed me and our other engineer (both new to this project) to gradually learn how the application actually worked. Rather than trying to dig in and rewrite a lot of code within a brittle system, we were able to create isolated components that we can drop into specific places in the existing codebase.

7. Map out the future

Now that our most mission-critical components have been cleaned up, we’ve been able to establish a priority list of new features and make plans for moving completely off our legacy system and into new, more modular and tested components. I’d put us somewhere between level 3 and 4 on the technical maturity scale at this point. We aren’t quite ready to run, but at least we’re crawling at a good pace.

The key is not to rush this kind of job. It’s tempting to think we can just rebuild the whole thing faster than we could refactor it, but I’ve never found that to be the case. It’s also tempting for the business to pressure developers to just pile new features on top of the legacy code, but that will introduce even more risk in the long run.

If you’re interested in learning more about updating legacy codebases, I’d recommend Paul Jones’ book, Modernizing Legacy Applications in PHP. If you have questions or want to dig into details on our process at The Graide Network, you can find me on Twitter.

This blog post was written by Karl Hughes, the Chief Technology Officer at The Graide Network.