CraftConf 15–Takeaways from “The Hidden Dimension of Refactoring”

One of the great things about Craft­Conf is the pletho­ra of big name tech speak­ers that the orga­niz­ers have man­aged to attract.

Michael Feath­ers is def­i­nite­ly one of the names that made me excit­ed to go to Craft­Conf.

 

Know your commits

We have a bias towards the current state of our code, but we have a lot more information than that. Over the history of its commits, a piece of code can vary greatly in size and complexity.

To help find the places where methods jump the shark, Michael created a tool called Method Shark, which you can get from GitHub. What he found is that refactoring habits are cultural: some places just don't refactor much, whilst others wait until a method hits some threshold.

 

We're at an interesting stage in our industry where we're doing lots of Big Data analysis on customer behaviour and usage trends, but we're not doing anywhere near enough for ourselves by analysing our commit history.

Those of you who read this blog reg­u­lar­ly would know that I’m a big fan of Adam Tornhill’s work in this area. It was great to see Michael give a shout out to Adam’s book at the end of the talk, which is avail­able on Ama­zon:

image

 

Richard Gabriel not­ed that biol­o­gists have a dif­fer­ent notion of mod­u­lar­i­ty to us:

“Biologists think of a module as representing a common origin more so than what we think of as modularity, which they call spatial compartmentalization”

- Richard Gabriel (Design Beyond Human Abil­i­ties)

So when biol­o­gists talk about mod­u­lar­i­ty, they’re con­cerned about evo­lu­tion and ances­try and not just the cur­rent state of things.

By recording and analysing interesting events about changes in our commits – event type (add/update), method body, name, etc. – we too are able to gain deeper insight into the evolution of our codebase.
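A minimal sketch of what one such recorded event could look like. The `MethodEvent` record and its field names are hypothetical and only mirror the attributes listed above; they are not taken from Michael's tooling.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MethodEvent:
    """One recorded change to a method (hypothetical schema)."""
    commit: str        # commit identifier
    when: datetime     # commit timestamp
    kind: str          # "add" or "update"
    class_name: str
    method_name: str
    body: str          # method body as of this commit

    @property
    def size(self) -> int:
        """Method size measured in non-blank lines."""
        return sum(1 for line in self.body.splitlines() if line.strip())


event = MethodEvent(
    commit="abc123",
    when=datetime(2015, 4, 23),
    kind="update",
    class_name="Order",
    method_name="total",
    body="def total\n  items.sum(&:price)\nend\n",
)
print(event.size)  # 3
```

Storing the full body with each event means size, and any other metric you care about, can be derived later without going back to the repository.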

 

Commit history analysis

Michael demonstrated a number of analyses that you might take inspiration from and apply to your own codebase.

 

Method Life­line

This maps the size of a par­tic­u­lar method in lines over its life­time.

image

From this pic­ture you can see that this par­tic­u­lar method has been grow­ing steadi­ly in size so it’s a good can­di­date for refac­tor­ing.

Method Ascend­ing

You can find meth­ods that have increased in size in con­sec­u­tive com­mits to iden­ti­fy meth­ods that are good can­di­dates for refac­tor­ing.


It might be that the way to refac­tor the method is not obvi­ous, so the deci­sion to refac­tor is delayed and that inde­ci­sion com­pounds. The result is that the method keeps grow­ing in size over time.

You can also apply method ascend­ing to a whole code­base. For instance, for the Rails project as of 2014 you might find:

[1827, 765, 88, 13, 2, 1]

i.e. 765 methods have grown in size in each of their last 2 consecutive commits, 88 methods have grown in size in each of their last 3 consecutive commits, and so on.
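A list like this could be computed along the following lines. This is only a sketch: it assumes you have already extracted, for each method, its size at every commit that touched it (the `history` mapping and the method names are made up), and it is not how Method Shark itself is implemented.

```python
from typing import Dict, List


def ascending_streak(sizes: List[int]) -> int:
    """Number of most recent consecutive commits in which the size grew."""
    streak = 0
    # Walk backwards over (previous, current) size pairs, newest first.
    for prev, curr in zip(reversed(sizes[:-1]), reversed(sizes[1:])):
        if curr <= prev:
            break
        streak += 1
    return streak


def ascending_histogram(history: Dict[str, List[int]]) -> List[int]:
    """Entry k-1 counts the methods that grew in each of their last k commits."""
    streaks = [ascending_streak(sizes) for sizes in history.values()]
    longest = max(streaks, default=0)
    return [sum(1 for s in streaks if s >= k) for k in range(1, longest + 1)]


# Hypothetical data: method -> size in lines at each commit that touched it.
history = {
    "Order#total": [3, 5, 8, 12],    # grew in its last 3 commits
    "Order#ship": [10, 9, 11],       # grew in its last commit only
    "Cart#add": [4, 4, 4],           # never grew
    "Cart#checkout": [7, 6, 9, 15],  # grew in its last 2 commits
}
print(ascending_histogram(history))  # [3, 2, 1]
```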

Trend­ing Meth­ods

You can use this analy­sis to find the most changed meth­ods in say, the last month.
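As a rough sketch, assuming the change events have been flattened into hypothetical `(method, timestamp)` pairs, counting the most changed methods in a recent window takes only a few lines:

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def trending_methods(
    changes: Iterable[Tuple[str, datetime]],
    now: datetime,
    days: int = 30,
    top: int = 3,
) -> List[Tuple[str, int]]:
    """Most frequently changed methods within the last `days` days."""
    cutoff = now - timedelta(days=days)
    counts = Counter(name for name, when in changes if when >= cutoff)
    return counts.most_common(top)


# Hypothetical change log: (method, time of the commit that changed it).
changes = [
    ("Order#total", datetime(2015, 4, 20)),
    ("Order#total", datetime(2015, 4, 10)),
    ("Cart#add", datetime(2015, 4, 15)),
    ("Order#total", datetime(2015, 1, 5)),
    ("Cart#add", datetime(2015, 2, 1)),
]
print(trending_methods(changes, now=datetime(2015, 4, 24)))
# [('Order#total', 2), ('Cart#add', 1)]
```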

Activ­i­ty List

Things that are changed a lot are probably worth investigating too. You can identify bouts of activity on a method over a given period. For example, a method that has existed for 18 months was changed during bouts of activity over a period of 9 months.

This helps to identify methods that you keep going back to, which is a sign that they may violate the single responsibility and/or open-closed principles.
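One way to make "bouts of activity" concrete is to group a method's change dates whenever the gap between consecutive changes is small. The 14-day gap below is an arbitrary assumption, as are the dates; the talk did not specify how bouts were delimited.

```python
from datetime import date, timedelta
from typing import Iterable, List, Tuple


def activity_bouts(
    dates: Iterable[date], max_gap_days: int = 14
) -> List[Tuple[date, date]]:
    """Group change dates into (start, end) bouts separated by longer gaps."""
    bouts: List[Tuple[date, date]] = []
    for d in sorted(dates):
        if bouts and d - bouts[-1][1] <= timedelta(days=max_gap_days):
            bouts[-1] = (bouts[-1][0], d)  # extend the current bout
        else:
            bouts.append((d, d))           # start a new bout
    return bouts


# Hypothetical change dates for a single method.
dates = [
    date(2014, 1, 3), date(2014, 1, 10),
    date(2014, 5, 2), date(2014, 5, 4),
    date(2014, 9, 20),
]
for start, end in activity_bouts(dates):
    print(start, "->", end)
```

A method with many separate bouts spread over a long period is one that people keep returning to.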

Sin­gle Respon­si­bil­i­ty Prin­ci­ple

If you find meth­ods that are changed togeth­er many times, then they might rep­re­sent respon­si­bil­i­ties with­in the class that we haven’t iden­ti­fied, and there­fore are good can­di­dates to be extract­ed into a sep­a­rate abstrac­tion.

Open/Closed Prin­ci­ple

Even when you’re work­ing with a large code­base, chances are lots of areas of code don’t change that much.

You can find a “closure date” for each class – the last time each class was modified. Don't concentrate your refactoring efforts on areas of code that have been closed for a long time.
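A small sketch of the closure date idea, assuming a hypothetical list of `(class, change date)` pairs pulled from the commit history. The class names and the cutoff are made up for illustration.

```python
from datetime import date
from typing import Dict, Iterable, List, Tuple


def closure_dates(changes: Iterable[Tuple[str, date]]) -> Dict[str, date]:
    """Map each class to the date of its most recent change."""
    closed: Dict[str, date] = {}
    for class_name, when in changes:
        if class_name not in closed or when > closed[class_name]:
            closed[class_name] = when
    return closed


def long_closed(changes: Iterable[Tuple[str, date]], cutoff: date) -> List[str]:
    """Classes last changed before `cutoff`, oldest closure first."""
    dates = closure_dates(changes)
    return sorted((c for c, d in dates.items() if d < cutoff), key=dates.get)


# Hypothetical change log.
changes = [
    ("Order", date(2013, 2, 1)),
    ("Order", date(2015, 3, 9)),
    ("Currency", date(2012, 6, 12)),
    ("TaxTable", date(2013, 11, 30)),
]
print(long_closed(changes, cutoff=date(2014, 1, 1)))  # ['Currency', 'TaxTable']
```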

Tem­po­ral Cor­re­la­tion of Class Changes

Find class­es that are changed togeth­er to find areas in the sys­tem that have tem­po­ral cou­pling through coin­ci­den­tal changes hap­pen­ing across dif­fer­ent areas.

These might be results of poor abstrac­tions, where respon­si­bil­i­ties haven’t been well encap­su­lat­ed into one class. They can be indi­ca­tors of areas that might need to be refac­tored or re-abstract­ed.
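Counting how often pairs of classes appear in the same commit is one simple way to surface this kind of coupling. The sketch below assumes each commit has been reduced to a list of changed class names (hypothetical data); the same counting works for methods within a class.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Tuple


def co_change_counts(commits: Iterable[Iterable[str]]) -> Counter:
    """Count how many commits changed each pair of classes together."""
    pairs: Counter = Counter()
    for changed in commits:
        # Sort so that (A, B) and (B, A) are counted as the same pair.
        for a, b in combinations(sorted(set(changed)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical history: the classes touched by each commit.
commits = [
    ["Order", "Invoice"],
    ["Order", "Invoice", "Cart"],
    ["Cart"],
    ["Invoice", "Order"],
]
pairs = co_change_counts(commits)
print(pairs.most_common(1))  # [(('Invoice', 'Order'), 3)]
```

Pairs with a high count relative to how often either class changes on its own are the ones worth a closer look.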

Active Set of Class­es

If you plot the no. of classes at a point in time (blue) vs the no. of classes that haven't changed again since (green):

image

the distance between the two lines is the number of classes that are still active at that point in time, i.e. the ones you will touch again later.

In the example above, you can see that a large number of classes were introduced in the middle of the graph, and gradually these classes became closed over time.

Look­ing at anoth­er exam­ple:

image

we can see that a lot of classes were introduced in two bursts of activity and it took a long time for them to become closed.
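The two lines in these graphs could be derived roughly as follows. This sketch assumes a hypothetical mapping from class name to the dates on which it was changed (the earliest being its introduction) and treats a class as closed at a given date if it has no later changes, which you can only know in hindsight.

```python
from datetime import date
from typing import Dict, List, Tuple


def active_set(changes: Dict[str, List[date]], at: date) -> Tuple[int, int]:
    """Return (classes existing at `at`, classes never changed after `at`)."""
    existing = [dates for dates in changes.values() if min(dates) <= at]
    closed = sum(1 for dates in existing if max(dates) <= at)
    return len(existing), closed


# Hypothetical change dates per class.
changes = {
    "Order": [date(2013, 1, 1), date(2013, 6, 1)],
    "Invoice": [date(2013, 3, 1), date(2014, 8, 1)],
    "Cart": [date(2014, 1, 1)],
}
total, closed = active_set(changes, at=date(2013, 12, 31))
print(total, closed, total - closed)  # 2 1 1
```

Evaluating `active_set` at regular intervals gives the two series to plot, and `total - closed` is the size of the active set at each point.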

Tur­bu­lence

This is a mea­sure of the no. of com­mits on a file vs its com­plex­i­ty.

image

From this graph, we can see that most of the files in this codebase have relatively few commits and low complexity.

Beware of things in the top-left and top-right quadrants of the graph. They are things that either:

  • gained a lot of complexity without a lot of commits (did one of your hotshots decide to go to work on that file?), or;
  • are complex and change a lot (is this code so complex that nobody wants to do anything about it, so the level of complexity just grows with each commit?)

Proac­tive refac­tor­ing can be a way to stop the com­plex­i­ty of these class­es from get­ting out of con­trol.
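To flag the files in those two quadrants automatically, something like the sketch below could work. It assumes commits on the x-axis and complexity on the y-axis, and that you already have a commit count and some complexity score per file; the thresholds, file names and scores are all invented for illustration.

```python
from typing import Dict, Tuple


def risky_files(
    stats: Dict[str, Tuple[int, int]],
    max_commits: int,
    max_complexity: int,
) -> Dict[str, str]:
    """Label files whose complexity exceeds the threshold by quadrant.

    `stats` maps file path -> (number of commits, complexity score).
    """
    result: Dict[str, str] = {}
    for path, (commits, complexity) in stats.items():
        if complexity <= max_complexity:
            continue  # bottom half of the graph: not a concern here
        result[path] = "top-right" if commits > max_commits else "top-left"
    return result


# Hypothetical per-file measurements.
stats = {
    "billing.rb": (3, 95),   # complex after very few commits
    "router.rb": (120, 80),  # complex and changed constantly
    "util.rb": (4, 6),
    "config.rb": (90, 5),
}
print(risky_files(stats, max_commits=50, max_complexity=40))
# {'billing.rb': 'top-left', 'router.rb': 'top-right'}
```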

Fre­quen­cy of Inter-Com­mit Inter­vals

You can also dis­cov­er the work­ing style of your devel­op­ers by look­ing at the fre­quen­cy of inter-com­mit inter­vals.

image

image
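A frequency distribution like this can be built from commit timestamps alone. The sketch below assumes a hypothetical list of one developer's commit times and buckets the gaps between consecutive commits by the hour; the bucket size is an arbitrary choice.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List


def inter_commit_minutes(timestamps: List[datetime]) -> List[float]:
    """Gaps between consecutive commits, in minutes."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() / 60 for a, b in zip(ordered, ordered[1:])]


def interval_histogram(minutes: List[float], bucket: int = 60) -> Dict[int, int]:
    """Count gaps per bucket; key 0 covers [0, bucket) minutes, and so on."""
    counts = Counter(int(m // bucket) for m in minutes)
    return dict(sorted(counts.items()))


# Hypothetical commit times for one developer on one day.
timestamps = [
    datetime(2015, 4, 23, 9, 0),
    datetime(2015, 4, 23, 9, 20),
    datetime(2015, 4, 23, 9, 50),
    datetime(2015, 4, 23, 12, 10),
    datetime(2015, 4, 23, 12, 15),
]
gaps = inter_commit_minutes(timestamps)
print(gaps)                      # [20.0, 30.0, 140.0, 5.0]
print(interval_histogram(gaps))  # {0: 3, 2: 1}
```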

Churn

Churn is the number of times things change. Take the graph below for the ruby-gems repository, where:

  • every point on the x-axis is a file;
  • and the y-axis is the number of times it has changed in the history of the codebase

image

When we think about the sin­gle respon­si­bil­i­ty prin­ci­ple and the open-closed prin­ci­ple, we think about class­es that are well abstract­ed and don’t have to change fre­quent­ly.

But every codebase Michael looked at has a similar graph to the one above, where there are areas of code that change very frequently and areas that hardly change at all.
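Churn per file is cheap to compute once each commit has been reduced to the list of files it touched. The sketch below uses made-up file names and commits, and sorts the result in descending order, which is what gives the graph its shape.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def churn(commits: Iterable[Iterable[str]]) -> List[Tuple[str, int]]:
    """Number of commits touching each file, most changed first."""
    counts = Counter(path for files in commits for path in set(files))
    return counts.most_common()


# Hypothetical history: the files touched by each commit.
commits = [
    ["lib/installer.rb", "lib/spec.rb"],
    ["lib/installer.rb"],
    ["lib/installer.rb", "lib/version.rb"],
    ["lib/spec.rb"],
]
for path, count in churn(commits):
    print(f"{count:3d}  {path}")
```

With this data `lib/installer.rb` comes out on top with 3 changes, followed by `lib/spec.rb` with 2 and `lib/version.rb` with 1.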

Michael then ref­er­enced Susanne Siverland’s work on “Where do you save most mon­ey on refac­tor?”. I couldn’t find the paper after­wards, but Michael sum­marised her find­ing as – “in gen­er­al, churn is a great indi­ca­tor, when you com­pare it with oth­er things such as com­plex­i­ty, in terms of the pay­off you get from refac­tor­ing, and in terms of reduc­ing defects.”

In hind­sight this is pret­ty obvi­ous – bugs are there because we put them there, and if we keep going back to the same areas of code then it prob­a­bly means they’re non-opti­mal and each time we change some­thing there’s a chance of us intro­duc­ing a new bug.

If you look at Adam Tornhill’s Code as Crime Scene talk, this is also the basis for his work on geo­graph­i­cal pro­fil­ing of code.

image

 

Finally, keep in mind that these analyses are intended to give you indicators of where you should be doing targeted refactoring, but at the end of the day they're still just indicators.

 

Links