dc.description.abstract | Comments are ubiquitous in source code of real software systems. Software developers
rely on them for comprehending and evolving software systems, i.e., to add new features and fix
bugs. They imbibe the practice of commenting code from their formative years. Therein lie the
critical questions for which scientific answers are largely and strikingly absent: “how prevalent
are good and bad comments in production code, e.g., open source software?” or “do developers
agree on their usefulness, e.g., with varied levels of experience?” Are these issues a matter of pure
vanity or do they bear substance, e.g., do comments associate, if at all, with software complexity
and quality, and to what extent, e.g., cohesion and coupling metrics? Answers to these questions
directly influence developer communication and productivity, and software cost, reliability, and
quality.
This work conducted a series of rigorous empirical studies in search of initial answers to
the above-stated questions. Both quantitative and qualitative investigations were performed. Our
results from six open source projects show that although 60% of source code comments are good,
40% are bad (as judged by supermajority agreement). Developers do not always agree
on whether a specific comment is good or bad. We also investigated the correlation between
object-oriented complexity, cohesion, and coupling metrics and source code comments. We
found evidence of increased levels of comments in low-quality and/or high-complexity code.
Future work entails the automatic classification of source code comments into a taxonomy
of good and bad comments, and the formulation of approaches to prevent and eliminate bad comments
(e.g., refactoring code that reeks of bad comments). | |