I just handed in my PhD in computer science. Our department teaches "best practices", but adhering to them is hardly possible in research:
1) Requirements change constantly, since... it's research. We don't know exactly where we're going or what problems we'll encounter.
2) Buying faster hardware is usually an option.
3) Time spent on documentation, optimization or anything else that does not directly lead to results is directly detrimental to your progress. The published paper counts, nothing else. If a reviewer asks about reproducibility, just add a link to a git repository.
4) Most PhD students have never worked in industry and come straight from their Master's to the PhD, so there is no point at which they'd encounter the need to build scalable systems.
I guess No. 3 has the worst impact. I would love to improve my project's stability and reusability, but I would be shooting myself in the foot: it's not publishable, I can barely mention it in my thesis, and the professorship doesn't check.
Putting some effort into (3) can increase your citations (h-index). If people can’t use your software then they will just find some other method to benchmark against or build on.
Here you are not shortening your own time to get an article out, but shortening it for others, which will make your work more influential.
> 3) Time spent on documentation, optimization or anything else that does not directly lead to results is directly detrimental to your progress.
Here's where I disagree. It's detrimental in the short term, but to ensure reproducibility and future development speed you need to follow best practices. Good science requires good engineering practices.
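As a minimal sketch of what those practices can mean at the cheapest end (the experiment and numbers here are purely illustrative): seeding randomness explicitly and pinning a baseline result costs minutes, but it turns "it ran once on my machine" into something a rerun can verify.

```python
import random

def run_experiment(seed: int, n_samples: int = 1000) -> float:
    """Toy stand-in for an experiment: estimate the mean of U(0, 1)."""
    rng = random.Random(seed)  # local, seeded RNG instead of global state
    return sum(rng.random() for _ in range(n_samples)) / n_samples

# Smoke test: the same seed must reproduce the same result.
# If a refactor changes this, you know reproducibility broke.
assert run_experiment(seed=42) == run_experiment(seed=42)

# Sanity check: the estimate should land near the true mean of 0.5.
assert 0.4 < run_experiment(seed=42) < 0.6
```

Even this much, committed alongside the paper's repository link, gives a reviewer or successor something to run rather than just something to read.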
The point is, it's not prioritized since it's not rewarded. Grad students are incentivized to get their publications in and move on, not generate long-term stable engineering platforms for future generations.
An experimental research system does not have to be a complete practical system; it can focus on a few things to prove a point or support a scientific claim.
Indeed. It doesn't have to consistently work, be easy to modify, be efficient, be well documented, etc., and in general usually won't be since there is no reward for any of these. It just has to "prove a point" (read: provide sufficient support for the next published paper, with paper reviewers caring far more about the paper's text than any associated code or documentation).
Anyone who spends lots of time trying to make research-relevant code projects with solid architecture / a well designed API / tests / good documentation / etc. is doing it as a labor of love, with the extra work as volunteer effort. Very occasionally a particularly enlightened research group will devote grant money to directly funding this kind of work, but unfortunately academia by and large hasn't found a well organized way to support these (extremely valuable) contributions, and lots of these projects languish, or are never started, due to lack of support.
What's your point? In practice, people doing work on solid research infrastructure code don't get social or financial support, don't get tenure, often can't keep academic jobs, and typically end up giving up and switching to (highly paid and better respected) industry work. Sometimes that code ends up supported as someone's part-time hobby project. If you hunt around you can find this discussed repeatedly, sometimes bitterly. Some of the most important infrastructure projects end up abandoned, with no maintainers.
In practice, most research code (including supposedly reusable components) ends up getting written in a slipshod ad-hoc way by grad students with high turnover. It typically has poor choice of basic abstractions, poor documentation, limited testing, regressions from version to version, etc. Researchers make do with what they can, and mainly focus on their written (journal paper) output rather than the quality or project health of the code.
Never had a paper rejected for lack of reproducibility, though. And as long as I am working toward the PhD rather than a long-term career, it's still better to focus on the short term. I don't like it, but I feel that's where I ended up :(