Software repositories such as SourceForge and GitHub contain an enormous corpus of software and information about software. Scientists and engineers alike are interested in analyzing this wealth of information, both out of curiosity and to test important research hypotheses. However, the current barrier to entry is prohibitive and the cost of such scientific experiments is great. Furthermore, these experiments are often irreproducible. This talk will describe our work on the Boa language and its data-intensive infrastructure. In a nutshell, Boa aims to be for open source-related research what Mathematica is for numerical computing, R is for statistical computing, and Verilog and VHDL are for hardware description. Our evaluation shows that Boa significantly decreases the burden on scientists and engineers analyzing the human and technical aspects of open source software development, allowing them to focus on the essential tasks of scientific research. This is collaborative work with Robert Dyer, Hoan Nguyen, and Tien Nguyen, all at Iowa State University.
- Mining software