How We Use GitHub To Build AppFog

When I joined AppFog back in February, it was my first time working for a company that used private GitHub repositories for source control. My previous gig used Bitbucket and Mercurial, which we had migrated to from self-hosted SVN, which we had migrated to from SourceGear Vault, which we had migrated to from Visual SourceSafe… etc. That’s how things go at a 12-year-old company, I guess. As a startup, AppFog could start off on the right foot and use GitHub from the get-go. Yippee!

I’ve used git and GitHub in open source projects for a while and know the typical workflow of forking and submitting pull requests. AppFog’s workflow is very similar. Here’s how we approach it.

Setting Up Your Personal Fork

The AppFog organization has a number of projects in it. Everyone in the organization has commit access and can create, destroy, or modify any of the repositories. We could just clone a repository, edit the code, commit, and push to master… but there’s a better way.

Our approach is for each developer to create a fork of the project they want to work on.

For example, suppose we have a repository in the AppFog org called yummy-sandwich. I’d head on over to the project page at https://github.com/appfog/yummy-sandwich and click the Fork button near the top of the page.

Once the “hardcore forking action” is over, I now have my own personal fork at https://github.com/thoward/yummy-sandwich. I can clone that to my dev machine:

git clone git@github.com:thoward/yummy-sandwich

You’ll notice that git clone adds an origin remote automatically, pointing at my fork. That’s great and all, but it doesn’t do much for me: I also need an easy way to pull changes from the main repository into my fork. To do that, I’ll hop into the new directory and add an upstream remote:

git remote add upstream git@github.com:appfog/yummy-sandwich
git fetch upstream

That’s the basic setup.
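To double-check the two remotes, git remote -v lists each remote twice, once for fetch and once for push. The Ruby sketch below parses that output and confirms that origin points at the personal fork and upstream points at the organization’s repository. The helper names are hypothetical, the check assumes the SSH URL form used in the commands above, and the sample text stands in for real git remote -v output. In practice you could capture that output with backticks.

```ruby
# Hypothetical helper: turn `git remote -v` output into { name => fetch URL }.
# Each remote is listed twice, once with "(fetch)" and once with "(push)".
def parse_remotes(remote_v_output)
  remotes = {}
  remote_v_output.each_line do |line|
    name, url, kind = line.split
    remotes[name] = url if kind == "(fetch)"
  end
  remotes
end

# True when origin is the personal fork and upstream is the organization repo,
# assuming the SSH URL form (git@github.com:owner/repo) used above.
def fork_setup_ok?(remotes, user:, org:, repo:)
  remotes["origin"] == "git@github.com:#{user}/#{repo}" &&
    remotes["upstream"] == "git@github.com:#{org}/#{repo}"
end

# Sample text standing in for `git remote -v` run inside the clone.
sample = <<~OUT
  origin\tgit@github.com:thoward/yummy-sandwich (fetch)
  origin\tgit@github.com:thoward/yummy-sandwich (push)
  upstream\tgit@github.com:appfog/yummy-sandwich (fetch)
  upstream\tgit@github.com:appfog/yummy-sandwich (push)
OUT

remotes = parse_remotes(sample)
puts fork_setup_ok?(remotes, user: "thoward", org: "appfog", repo: "yummy-sandwich")
```

If origin and upstream were accidentally swapped, the check would return false. That is the mistake that would send pushes to the shared repository instead of the fork.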

Modifying The Code

Now that I’ve got my personal fork configured, I can do whatever I’d like to it without disturbing our pristine shared repository… but to keep things organized and clear, I’m not just going to start tweaking bits and committing to master. Instead, I’ll work entirely on topic branches, regardless of how small or large a change is.

The following steps detail how to create a branch, update the code, push changes to the remote, and bring these changes to the attention of the person responsible for merging pull requests.

First, I’ll bring master up to date using the upstream remote I created.

# switch to master
git checkout master

# pull in latest code
git pull upstream master

This pulls all recently merged changes from the main appfog/yummy-sandwich repository into my local master branch. Next, I’ll create the branch I’m going to work on. Branch names follow a specific structure.

Branch Names

We break down the work we do into four distinct types of tasks: bugs, features, chores and hotfixes.

A bug is a normal bug fix of existing functionality.

A feature is new functionality.

A chore is something that adds business value, but doesn’t qualify as a feature (e.g. refactoring).

A hotfix is when we need to fix an immediate problem on a server (if you’re using hotfixes, you’re probably doing something wrong).

A branch name consists of the branch type, a brief underscore-separated description, and an optional GitHub Issue number, in the format type/a_brief_description_#, where # is the issue number.

Suppose I wanted to fix a bug where shared users aren’t able to log in to the site. After creating the GitHub Issue, I would name the branch bug/shared_users_cant_login_123, where 123 is the issue number.
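Because the convention is so regular, it can be built and checked mechanically. Here is a minimal Ruby sketch, assuming the four types listed above. The helper names and the slug rules (lowercase, apostrophes dropped, other runs of non-alphanumerics collapsed to a single underscore) are illustrative choices, not an AppFog tool.

```ruby
# Sketch of the type/a_brief_description_# branch naming convention.
BRANCH_TYPES = %w[bug feature chore hotfix].freeze

# Build a branch name from its parts; the issue number is optional.
def branch_name(type, description, issue = nil)
  raise ArgumentError, "unknown branch type: #{type}" unless BRANCH_TYPES.include?(type)

  slug = description.downcase.delete("'")
                    .gsub(/[^a-z0-9]+/, "_")
                    .gsub(/\A_+|_+\z/, "")
  issue ? "#{type}/#{slug}_#{issue}" : "#{type}/#{slug}"
end

# Parse a branch name back into its parts. A trailing all-digit segment is read
# as the issue number, so a description that itself ends in a number is
# ambiguous under this convention.
BRANCH_RE = %r{\A(#{BRANCH_TYPES.join('|')})/(\w+?)(?:_(\d+))?\z}

def parse_branch(name)
  m = BRANCH_RE.match(name)
  return nil unless m

  { type: m[1], description: m[2], issue: m[3]&.to_i }
end

puts branch_name("bug", "Shared users can't login", 123)
p parse_branch("bug/shared_users_cant_login_123")
```

A parser like this could back a pre-push hook that rejects branches whose names don’t fit the pattern. Whether that’s worth the friction is a team call.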

Creating the Topic Branch

To create the branch, run git checkout -b <branch name>, where <branch name> follows the type/description_id format just described. For example, for the hypothetical bug fix above, I’d run:

git checkout -b bug/shared_users_cant_login_123

Once the checkout command has created the branch, I follow the normal process of changing code, adding, and committing, and then wrap it up with:

git push origin HEAD

After that, use the GitHub web interface to select the branch from the Current Branch dropdown and issue a Pull Request back to the original project.

So, start to finish, here are the steps to get some work done…

Steps To Work On The Branch

  1. Checkout master: git checkout master
  2. Fetch updates from upstream: git fetch upstream
  3. Merge updates into master: git merge upstream/master
  4. Create the branch: git checkout -b bug/branch_description_123
  5. Make the code changes
  6. Stage changes for commit: git add <changed filenames>
  7. Commit changes: git commit -m "This is a descriptive commit message"
  8. Push changes to my personal fork: git push origin HEAD
  9. From GitHub, select the branch from the Current Branch dropdown and issue a Pull Request
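The scripted steps can be collected into two small helpers: one for the part before you edit code and one for the part after. This is a sketch, not an AppFog tool. The helper names are hypothetical, the helpers only build command strings and never run them, and Shellwords from Ruby’s standard library quotes the branch name, file names, and commit message for the shell.

```ruby
require "shellwords"

# Hypothetical helpers that build (but do not run) the git commands from the
# steps above. Step 5, making the code changes, happens between the two
# helpers; step 9, opening the pull request, happens on GitHub.

# Steps 1-4: sync master with upstream and create the topic branch.
def start_topic_branch(branch)
  [
    "git checkout master",
    "git fetch upstream",
    "git merge upstream/master",
    "git checkout -b #{Shellwords.escape(branch)}"
  ]
end

# Steps 6-8: stage, commit, and push the current branch to the personal fork.
def publish_topic_branch(files, message)
  [
    "git add #{files.map { |f| Shellwords.escape(f) }.join(' ')}",
    "git commit -m #{Shellwords.escape(message)}",
    "git push origin HEAD"
  ]
end

puts start_topic_branch("bug/shared_users_cant_login_123")
puts publish_topic_branch(["app/models/user.rb"], "Let shared users log in")
```

To execute them, each string could be passed to system, stopping at the first one that returns a falsy value. That way a failed merge in step 3 doesn’t carry on into creating the branch.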

Pull Request Guidelines

There are some basic guidelines around creating pull requests.

Just Say No To Auto-Merge

Don’t auto-merge the pull request through GitHub’s interface. Ever. Merging is the merge master’s job, and changes land on the qa branch first (see Merging, below).

Explain the Pull Request

Why is this pull request here? The branch name gives a brief description, but the pull request itself should have a slightly more detailed one. Leave a nice, clear description of what the branch is for and why. Reference any related issues. GitHub makes this super easy: type something like #420 and it will link the issue automatically.

Don’t Commit Gemfile.lock

This one is Ruby-specific. If you didn’t make changes to Gemfile, don’t commit Gemfile.lock.
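This rule is easy to check before committing. Here is a minimal sketch, assuming the project has a single Gemfile at the repository root and that you feed in the list of staged files (git diff --cached --name-only prints one path per line). The method name is hypothetical.

```ruby
# Hypothetical pre-commit style check: flag a staged Gemfile.lock when the
# Gemfile itself is not staged. Assumes one Gemfile at the repository root.
def stray_gemfile_lock?(staged_paths)
  staged_paths.include?("Gemfile.lock") && !staged_paths.include?("Gemfile")
end

# In a real repository the list could come from git, for example:
#   staged = `git diff --cached --name-only`.split("\n")
puts stray_gemfile_lock?(["app/models/user.rb", "Gemfile.lock"])
puts stray_gemfile_lock?(["Gemfile", "Gemfile.lock"])
```

If the check fires, git reset Gemfile.lock unstages the file without touching your working copy.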

Config Files

This one is Rails-specific but relates to how we use git. If the branch requires the creation of a new config file, say so in the description in big, bold text (a bit of hyperbole, but make it hard to miss).

Also do the following:

  1. Add the config file to .gitignore. We do not store our configs in source control.
  2. Provide an example of your config file, suitable for development, at config/myconfig.example.yml, where myconfig is your config’s name and yml can be whatever extension its format uses.

We have a rake task configs:copy that can find those example config files and copy them to the normal names. This is handy when developing locally. On the production servers, the config files are managed separately from source code and are linked into the config directory from another location.

Here’s the configs:copy rake task:

namespace :configs do
  desc "Copy config/*.example.* files to their real names for development."
  task :copy do
    require 'fileutils'
    # e.g. config/myconfig.example.yml => config/myconfig.yml
    examples = Dir["config/*.example.*"]
    examples.each do |example|
      extname = File.extname(example)                          # ".yml"
      realbase = File.basename(example, ".example#{extname}")  # "myconfig"
      realpath = File.join("config", "#{realbase}#{extname}")
      # Never overwrite an existing (possibly customized) config file.
      unless File.exist?(realpath)
        FileUtils.cp(example, realpath)
        puts "copied #{example} => #{realpath}"
      end
    end
  end
end

Merging

Once the pull request is made, it’s the merge master’s responsibility to accept or reject the changes. Merge master is a rotating role that involves looking over the pull requests, reviewing the code, and possibly asking the developer to make additional changes, add more unit tests, etc. This is our opportunity to discuss the changes in detail and introduce some standards and process for code quality.

When a pull request is merged, it goes onto a special qa branch in the main repository. There we can get all willy-nilly: try out the code, blow away the changes, or what have you. If everything looks good, we take the qa branch at a known-good state and merge it into the master branch of the main project. Our continuous integration system also watches the qa branch, so each update to it kicks off our continuous delivery pipeline.
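On the merge master’s machine, the qa-then-master flow might look something like the command lists below. This is a sketch under stated assumptions, not our actual tooling. It assumes the merge master’s clone has an upstream remote pointing at the main repository (as in the setup earlier) and a local qa branch, and that the contributor’s branch is pulled straight from their fork’s URL. The helper names are hypothetical. The commands themselves are ordinary git.

```ruby
# Hypothetical helpers listing the git commands a merge master might run.
# Assumes "upstream" points at the main repo and a local "qa" branch exists.

# Stage 1: bring a contributor's topic branch into qa for trial.
def merge_into_qa(contributor, repo, branch)
  fork_url = "git@github.com:#{contributor}/#{repo}"
  [
    "git checkout qa",
    "git pull upstream qa",                    # start from the shared qa state
    "git pull --no-ff #{fork_url} #{branch}",  # merge the PR branch with a merge commit
    "git push upstream qa"                     # CI watches qa, so this starts the pipeline
  ]
end

# Stage 2: once qa is in a known-good state, promote it to master.
def promote_qa_to_master
  [
    "git fetch upstream",
    "git checkout master",
    "git merge --ff-only upstream/master",     # sync local master with the main repo
    "git merge --no-ff upstream/qa",           # bring in the known-good qa state
    "git push upstream master"
  ]
end

puts merge_into_qa("thoward", "yummy-sandwich", "bug/shared_users_cant_login_123")
puts promote_qa_to_master
```

Using --no-ff keeps a merge commit for each pull request, so the history shows where each topic branch entered qa and master.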

Conclusion

At AppFog we make it a practice to continually review and improve our workflows when we’re working with GitHub (or any tools for that matter). We are always looking for ways to reduce complexity while still getting the same or even better results.

One of the things we’re currently considering is moving away from personal forks and instead just working with topic branches right in the main project. This should keep things just as organized but reduce the complexity of the process significantly. One radical member of our team suggests never using branches. So far, we smile and nod. He may convince us someday.

We’re also rethinking the merge master role, opting instead to distribute the responsibility for merging code across the team but with the axiom “never merge your own code”, ensuring that code review is baked into the process.

We’ll keep you posted!

Embarrassingly Cloudable

These days the world is abuzz with talk of cloud computing. Many of the more experienced developers out there chuckle at the term. No one seems to know exactly what cloud computing is, but everyone seems to be talking about it, offering services around it, or pitching a great new idea for how to revolutionize something with it.

Anyone who made it through the tech world of the nineties will be all too familiar with diagrams like the following:

[Figure: a typical nineties network diagram. Note the internet cloud.]

The cloud symbol, in PowerPoint diagrams everywhere, always meant “something complicated that just works”. It might be the internet. It might be the phone network. It might be some other opaque process that you just don’t care to explain in detail.

At its core “cloud” means “ambiguity”. That’s why it’s a cloud. Just like a foggy road at night: you can’t see through it and you can’t understand it. However, the more subtle meaning of the cloud icon is that it’s reliable. You don’t know how it works or why but you know that the arrow goes in one side and comes out the other and you can rely on that behaviour.

This is what we mean by cloud computing: it’s reliable and it’s a black box. It doesn’t matter how it works or what it does as long as it keeps working. That’s someone else’s problem. I’m sure they have detailed diagrams somewhere that explain how the cloud works (and those diagrams probably have OTHER clouds in them, for services they depend on that just work and that they don’t feel like explaining).

What else does it mean?

Cloud computing has also come to mean something more specific. It’s a collective term for a variety of services that meet modern computing needs, and there are more detailed terms for each kind. You may have heard of SaaS, aka “Software as a Service”. Then there’s IaaS, PaaS, and lots of other *aaS things out there. The key part is “as a service”: someone else does it for you so you don’t have to.

There’s storage in the cloud, music in the cloud, friends in the cloud, applications in the cloud. There are services out there, on the internet, that do things for you so you don’t have to. They organize and store your music and give you access to it from anywhere with an internet connection. They can even suggest new music you haven’t heard of. They can do that with movies too. Great! No need for CDs, video tapes, VCRs, etc. No more self-managed media access/devices. There are email services so you don’t have to run a local Exchange server. Yada yada yada.

Embarrassingly Parallel

A while back, the computing world was obsessed with parallel programming. Actually, we still are, but people aren’t talking about it quite as much as they were a couple of years ago. Why was it such a big deal then? We kind of hit a wall with Moore’s Law. Processors couldn’t get significantly faster, so code had to either get faster or get more parallel. Not everything can get faster or be parallelized, but usually things that can’t get faster can be parallelized. However, there were quite a few programming problems that could have been parallelized but weren’t. Those became known as “embarrassingly parallel” problems (note: this term has been in use for a LONG time, well before a few years ago). Embarrassing, IMO, because they are so well suited to parallelization that they should have been parallel from the beginning. The fact that they weren’t showed that we just didn’t care about efficiency, because Moore’s Law made it seem pointless to spend time on.

The same idea applies to cloud computing, and we can use the term “embarrassingly cloudable”.

What Are Embarrassingly Cloudable Problems?

Every product that exists and thrives in the internet ecosystem needs a host of services to play well in that environment. At the basic level, it needs a machine with an OS to run on, network access, possibly a database to store information in, and possibly a runtime environment like the JVM, CLR, or Ruby. Those are the core services offered by today’s IaaS and PaaS products. But we can take this a bit further… what about user and identity management? What about social networking? We want these things in every application on the internet, yet we end up rewriting them OVER and OVER again and never see the “economies of scale” that a centralized resource would provide.

A list of some common embarrassingly cloudable problems:

  • Operating Systems
  • Networking and Network Security
  • Configuration Management
  • Continuous Integration and Automated Deployment
  • Databases and File Storage
  • Logging
  • User and identity management and authentication
  • Social graphs and networking
  • Performance Management/Monitoring (thanks @bobuva)
  • Routing, proxying and caching
  • etc…

Take a moment and think through your current product/project. How much of what you spend time on could be moved into a “cloud-based service” (that is, something you didn’t write and don’t manage, but that fulfills its function reliably and in a generalized, multi-tenant manner)? What pieces cut across numerous applications? You might be surprised to find there’s already a service available to replace them for you.

Don’t repeat yourself, and don’t waste time managing things that aren’t part of your company’s core competency or aren’t relevant to your product’s primary differentiators. Don’t roll your own. If you do, you should be embarrassed.

NodePDX: The Tech Conference Not to Miss

Some Reflection On Node.js

Node.js is a strange beast. It’s a technology that might just be worth the hype surrounding it. In this day and age, when everyone expects ‘viral’ to be part of their marketing campaign, it’s easy to discount the hype when something actually does become popular quickly and gets a bunch of people talking about it. Especially when it’s something as unexpected as serious developers being outspoken about how amazing Javascript is.

It’s almost like you’re living inside of a bad joke… But I gotta tell you all, this is no joke. Node.js really is all that and it doesn’t matter what the business folks say, what the marketing folks say or what the jaded developer to your left snidely derides. This is serious technology and it’s relevant.

Why Portland?

So you all have seen Portlandia, right? Well, it’s true. We here in Portland are a bit off our rockers, and we’re passionate about a lot of things. We’re living proof that even though idealism may not apply globally, it can certainly live in a bubble and work here. As you might expect, we have a very idealistic software development community. This is the land of fancy-beer-drinking beardos who crank out wicked bleeding-edge open source code on their sticker-encrusted MacBooks and who look at you like you’re speaking a foreign language when you talk about LOB apps, COM+, or “enterprise solutions”. It’s only natural that Portland would become a center for innovation and adoption of emergent technologies. We may not have the vibrant start-up culture of SF or NYC (or do we?), but I guarantee you we care about technology more than your average bear.

NodePDX == Node.js + Portland

Portland was a natural home for the two main Javascript and Node.js conferences of 2011. Apparently the weather scared them away. While NodeConf will be returning this summer, NodeSummit took place in sunny, business-friendly San Francisco, and JSConf has chosen the dusty, dry environs of Scottsdale, Arizona this year (yep, PDX gets 384% of the rainfall, and we deal with… er, love it!).

Well, we still admire Javascript here in Portland. Business interest or no business interest, we want to share, talk and hack together as a community indulging in our excitement for what has already been called the “Technology of The Year” for 2012 (who decides these things anyway?).

That’s what NodePDX is all about. This is a grassroots, independent, non-commercial conference. That means that everything is donated. Everything, including the event venue (thanks NedSpace), the screen, the microphone (thanks PIE), the projector, the t-shirts, the on-site massage therapist (thanks AppFog), the logo (thanks Julie), and even the hosted open-bar after party (thanks New Relic)… oh, and all the speakers, and the volunteers, who are doing this out of their passion for legit technology, without even thinking about money (not that we have anything against money. We love money. It allows us to buy beer, laptops and food… Without those things we couldn’t hack on JS all day and night, duh).

So, put on your thinking cap (and if you’re not from here, grab your umbrella) and come join us for a couple days of raw geekery. I guarantee you’ll learn something and have a good time doing it.