One purpose of QA testing is compliance assurance: conformance with applicable policies, industry regulations, and laws. While devs are (usually) good at functional testing, QA (usually) does non-functional testing better. I have not known any devs who test for GDPR compliance, for example. (I am certain many devs do test for that; I'm just stating my personal experience.)
LLM oral exams can provide assessment in a student's native language. This can be very important in some scenarios!
Unlimited attempts won't work in the presented model. No matter how many cases you have, all will eventually find their way to the various cheating sites.
There is no silver bullet, no solution that works for all schools. Strategies that work well for M.I.T., with competitive enrollment and large budgets, won't work for a small community college in an agricultural state with large teaching loads per professor, no TAs, and about 15-25 hours a week of committee or other non-teaching work. That was my situation.
Teaching five courses across eight sections, with 20-30 students per section and 10-20 office hours every week (often more if the professor cared about the students), leaves little time for grading. In desperation I turned to weekly homework assignments, 4-6 programming projects, and multiple-choice exams (containing code and questions about it). Not ideal by any means, just the best I could do.
So I smile now (I'm retired) when I hear about professors with several TAs each, explaining how they do assessment of 36 students at a school with competitive enrollment.
> Someone changes code to check if the ResultSet is empty before further processing and a large number of your mock based tests break as the original test author will only have mocked enough of the class to support the current implementation.
So this change no longer allows an empty result set, something that was allowed previously. Isn't that exactly the sort of breaking change you want your regression tests to catch?
I used ResultSet because the comment above mentioned it. A clearer example of what I’m talking about: say you replace “x.size() > 0” with “!x.isEmpty()” when x is a mocked instance of class X.
If tests (authored by someone else) break, I now have to figure out whether the breakage is because not enough behavior was mocked or because I have inadvertently broken something. Maybe it’s actually important that the code avoid using “isEmpty”? Or do I just mock the isEmpty call and hope for the best? What if the existing mocked behavior for size() is non-trivial?
Typically you’re not dealing with something this obvious.
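For concreteness, here's a minimal sketch of that failure mode, assuming Mockito and JUnit 5 (hasWork() and the test are hypothetical, just to make the refactor visible):

    import static org.junit.jupiter.api.Assertions.assertTrue;
    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.when;

    import java.util.List;
    import org.junit.jupiter.api.Test;

    class HasWorkTest {
        // Hypothetical code under test.
        static boolean hasWork(List<String> x) {
            // Original: return x.size() > 0;
            return !x.isEmpty(); // the "harmless" refactor
        }

        @Test
        void hasWorkWhenNonEmpty() {
            @SuppressWarnings("unchecked")
            List<String> x = mock(List.class);
            when(x.size()).thenReturn(3); // written against the old implementation

            // Under Mockito's lenient defaults, the unstubbed isEmpty() quietly
            // returns false, so this passes -- for the wrong reason. Under strict
            // stubbing (MockitoExtension's default), the now-unused size() stub
            // fails the test with UnnecessaryStubbingException instead, even
            // though production behavior is unchanged.
            assertTrue(hasWork(x));
        }
    }

Either way, the person making the refactor now has to investigate a test they didn't write.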
What is the alternative? If you write a complete implementation of an interface for test purposes, can you actually be certain that your version of x.isEmpty() behaves like the actual method? If it has not been used before, can you trust that a green test is valid without manually checking it?
When I use mocking, I try to always use real objects as return values. So if I mock a repository method like userRepository.search(...), I return an actual list, not a mocked object. This has worked well for me. If I actually need to test the DB query itself, I use a real DB.
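Something like this, assuming Mockito (User and UserRepository are hypothetical stand-ins):

    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.when;

    import java.util.List;

    class RealReturnValuesExample {
        record User(String name) {}                 // hypothetical domain type
        interface UserRepository {                  // hypothetical repository
            List<User> search(String query);
        }

        static List<User> example() {
            UserRepository userRepository = mock(UserRepository.class);

            // The repository call is mocked, but the return value is a real list
            // of real objects, so everything downstream of the call (size(),
            // isEmpty(), iteration, ...) exercises actual behavior.
            when(userRepository.search("smith"))
                .thenReturn(List.of(new User("Alice Smith"), new User("Bob Smith")));

            return userRepository.search("smith");
        }
    }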
For example, one alternative is to let my IDE implement the interface (I don’t have to “write” a complete implementation), with default implementations that throw “not yet implemented” exceptions - which clearly indicates that the omitted behavior is not a deliberate part of the test.
Any “mocked” behavior involves writing normal, debuggable, idiomatic Java code - no need to learn or use a weird DSL to express the behavior of a method body. And it’s far easier to diagnose what’s going on, or what’s expected, while running the test - unlike the backwards mock approach, where failures are typically reported non-locally (the test completes and you get an unexpected-invocation or missing-invocation error: where, or what, should have made the invocation?).
My test implementation can evolve naturally - it’s all normal debuggable idiomatic Java.
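To make that concrete, a sketch of the style I mean (UserRepository and User are hypothetical; the IDE generates the skeleton, and I fill in only what the test needs):

    import java.util.List;

    class FakeExample {
        record User(String name) {}
        interface UserRepository {
            List<User> search(String query);
            void delete(User user);
        }

        // Hand-rolled fake: only the behavior the test deliberately needs is
        // implemented. Everything else throws, so an omitted method can never
        // silently "succeed".
        static class FakeUserRepository implements UserRepository {
            private final List<User> users;

            FakeUserRepository(List<User> users) {
                this.users = users;
            }

            @Override
            public List<User> search(String query) {
                // Deliberate test behavior, written as plain debuggable Java.
                return users.stream()
                            .filter(u -> u.name().contains(query))
                            .toList();
            }

            @Override
            public void delete(User user) {
                throw new UnsupportedOperationException("not yet implemented");
            }
        }
    }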
It doesn't have to be a breaking change -- an empty result set could still be allowed. It could simply be a perf improvement that avoids calling an expensive function with an empty result set, when it is known that the function is a no-op in this case.
If it's not a breaking change, why would a unit test fail as a result, whether or not mocks/fakes are used for the code not under test? Unit tests should test the contract of a unit of code. Testing implementation details is better handled with assertions, right?
If the code being mocked changes its invariants, the code under test that depends on them needs to be carefully re-examined. A failing unit test will alert one to that situation.
(I'm not being snarky, I don't understand your point and I want to.)
The problem occurs when the mock is incomplete. Suppose:
1. Initially codeUnderTest() calls a dependency's dep.getFoos() method, which returns a list of Foos. This method is expensive, even if there are no Foos to return.
2. Calling the real dep.getFoos() is awkward, so we mock it for tests.
3. Someone changes codeUnderTest() to first call dep.getNumberOfFoos(), which is always quick, and subsequently call dep.getFoos() only if the first method's return value is nonzero. This speeds up the common case in which there are no Foos to process.
4. The test breaks because dep.getNumberOfFoos() has not been mocked.
You could argue that the original test creator should have defensively also mocked dep.getNumberOfFoos() -- but this quickly becomes an argument that the complete functionality of dep should be mocked.
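Here is a minimal sketch of steps 1-4, assuming Mockito and JUnit 5 (Dep and codeUnderTest() are hypothetical):

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.when;

    import java.util.List;
    import org.junit.jupiter.api.Test;

    class FooTest {
        interface Dep {
            int getNumberOfFoos();   // cheap
            List<String> getFoos();  // expensive, even when there are no Foos
        }

        // Step 3's optimized code under test: skip the expensive call
        // when there is nothing to process.
        static int codeUnderTest(Dep dep) {
            if (dep.getNumberOfFoos() == 0) {
                return 0;
            }
            return dep.getFoos().size();
        }

        @Test
        void processesAllFoos() {
            Dep dep = mock(Dep.class);
            when(dep.getFoos()).thenReturn(List.of("a", "b")); // step 2's mock

            // Step 4: getNumberOfFoos() was never stubbed, and Mockito's default
            // answer for an unstubbed int is 0, so codeUnderTest() takes the
            // early-out path. The test fails even though the production change
            // is correct -- the mock is simply incomplete.
            assertEquals(2, codeUnderTest(dep));
        }
    }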
> We actually disable NTP entirely (run it once per day or at boot) to avoid clocks jumping while recording data.
This doesn't seem right to me. NTP with default settings should be monotonic, so no jumps. If you disable it, Linux enters 11-minute mode, IIRC, and that may not be monotonic.
Pedantically, a monotonic function need not have a constant first derivative. To take it further, in mathematics it is accepted for a monotonic function to have a countable number of discontinuities, though in the context of a digital clock that only increments in discrete steps, that has little bearing.
But that’s all beside the point, since most sane time-sync clients (regardless of protocol) generally handle small deviations (i.e., normal cases) by speeding up or slowing down the system clock, not jumping it (forward or backward).
You are correct: NTP prefers to jump first (if needed) and then slew afterwards (which is exactly what we want!), although it can jump again if the offset is too large.
In our case the jumps were because we also have PTP disciplining the same system clock. When you have both PTP and NTP fighting over the same clock, you will see jumping with the default settings.
For us it was easier to just do a one-time NTP sync at boot, and then sync the robot's local network with only PTP afterwards.
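Roughly this shape, assuming chrony and linuxptp (the interface name and pool are placeholders; exact service management varies by distro):

    # One-shot NTP sync at boot: set the system clock once, then exit.
    chronyd -q 'pool pool.ntp.org iburst'

    # From here on, only PTP disciplines the clocks on the robot's network.
    ptp4l -i eth0 &        # sync the NIC's hardware clock to the PTP master
    phc2sys -s eth0 -w &   # steer the system clock from the NIC's hardware clock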
> I live on a small farm with his wife, son, and two dogs.
Should you trust translations into English by someone who writes sentences like this? <joking>
In the movie When Harry Met Sally, Billy Crystal said hieroglyphics were actually a comic strip about a character named Sphinxy. Always hoped that was true.
Maybe a geekbench from yesteryear. Back in the mists of time it was apocryphally known as "eight megs and continually swapping". But I guess that's a couple of orders of magnitude out nowadays.
> Even when engineers get creative, there’s logic: a butterfly valve actually looks like butterfly wings. You can tell how the name relates to what it actually defines, and how it can be memorable.
Editor MACroS still has a logic. It isn't just random.
More to the point, what does a John Deere S7 600 do, or a 310 G-Tier, or a Z515E ZTrak? Emacs is an editor. That part is descriptive: an editor edits. The product name is not expected to describe what the product is. The general product category is what does that.
> The product name is not expected to describe what the product is.
There are some exceptions, but the agricultural machinery industry has actually gotten pretty good at making names useful, with reasonable consistency across brands. In the S7 600, the 600 tells you it is a class 6 combine, a value farmers understand as it pertains to the combine's capacity. For tractors, in the John Deere 8R 230 the 8 indicates a large row-crop frame and the 230 indicates a 230 HP engine. A New Holland T7.180 is, you guessed it, a medium row-crop frame with a 180 HP engine.
It may look like nothing to outsiders, but there is a lot of useful information encoded in there once you know what to look for.
Useful if you already know the basics of what it is. My point is that "S7 600" by itself doesn't tell you anything if you don't have some knowledge of the product already. The knowledge that it's a combine is separate. Similarly, "emacs" tells you nothing if you don't know it, but the generic term "editor" is descriptive.
Software doesn't generally encode product attributes into the name the way 230 means 230 horsepower and such, but that's because software doesn't really have things like that to put in the name in the first place. Most software doesn't have specific variants like that, and software that does is almost always differentiated on feature set rather than numbers.
Software often puts the version in the name. Which is the same as the S7 designation in the case of said combine. S7 is just a restyled S7x0 series combine, which was the successor to the S6x0 series.
It's not a perfect system. Before the S6x0 was the 9x70STS series, after the 9x60STS series, and the 9x50STS series. You can find a version number in there, albeit not a perfectly sequential one. Although that's nothing new. Windows 3.1 turned 3.11, 95, 98. iOS 17 turned 26. You get the picture.
Technically it is "combine". Originally it was known as a "combined harvester-thresher", which is maybe what you're thinking of, but that was soon shortened to "combine" and it has stuck ever since.
"Combine harvester" showed up in some places later where context was needed to figure out what "combine" means, but it was seemingly only for context. "Combined harvester-thresher harvester" is pointlessly redundant.
I'm conflicted because you're not entirely wrong (that it's not just the software industry), but the name is because the combine combines steps that used to be separate.
> The better peer reviews are also not this 'thorough' and no one expects reviewers to read or even check references.
Checking references can be useful when you are not familiar with the topic (but must review the paper anyway). In many conference proceedings I have reviewed for, many if not most citations to the author's prior work, or to that of their colleagues, were redacted to keep the author anonymous.
LLMs could be used to find prior work anyway, today.
The obvious solution is for half of the hardware to run on dark energy, counteracting the heat generated by the other half. Venture capitalists, use my gofundme site to give me the millions needed to research this, thanks.