tripple6's comments

tripple6 · 2025-12-28T21:29:14 1766957354

Mockito uses declarative matching style of specifying what should be mocked. You don't need to implement or even stub all of interface methods since Mockito can do it itself. It may be extremely concise. For example, interfaces may have tens methods or even more, but only one method is needed (say, java.sql.ResultSet). And finally probably the most important thing, interaction with mocks is recorded and then can be verified if certain methods were invoked with certain arguments.

derriz · 2025-12-28T23:59:25 1766966365

That’s the seductive power of mocking - you get a test up and running quickly. The benefit to the initial test writer is significant.

The cost is the pain - sometimes nightmarish - for other contributors to the code base since tests depending on mocking are far more brittle.

Someone changes code to check if the ResultSet is empty before further processing and a large number of your mock based tests break as the original test author will only have mocked enough of the class to support the current implementation.

Working on a 10+ year old code base, making a small simple safe change and then seeing a bunch of unit tests fail, my reaction is always “please let the failing tests not rely on mocks”.

wpollock · 2025-12-29T02:55:39 1766976939

> Someone changes code to check if the ResultSet is empty before further processing and a large number of your mock based tests break as the original test author will only have mocked enough of the class to support the current implementation.

So this change doesn't allow an empty result set, something that is no longer allowed by the new implementation but was allowed previously. Isn't that the sort of breaking change you want your regression tests to catch?

derriz · 2025-12-29T07:29:11 1766993351

I used ResultSet because the comment above mentioned it. A clearer example of what I’m talking about might be say you replace “x.size() > 0” with “!x.isEmpty()” when x is a mocked instance of class X.

If tests (authored by someone else) break, I now have to figure out whether the breakage is due to the fact that not enough behavior was mocked or whether I have inadvertently broken something. Maybe it’s actually important that code avoid using “isEmpty”? Or do I just mock the isEmpty call and hope for the best? What if the existing mocked behavior for size() is non-trivial?

Typically you’re not dealing with something as obvious.

throwaway7375 · 2025-12-29T09:39:14 1767001154

What is the alternative? If you write a complete implementation of an interface for test purposes, can you actually be certain that your version of x.isEmpty() behaves as the actual method? If it has not been used before, can you trust that a green test is valid without manually checking it?

When I use mocking, I try to always use real objects as return values. So if I mock a repository method, like userRepository.search(...) I would return an actual list and not a mocked object. This has worked well for me. If I actually need to test the db query itself, I use a real db

derriz · 2025-12-29T11:04:00 1767006240

The alternative to what? Using mocks?

For example, one alternative is to let my IDE implement the interface (I don’t have to “write” a complete implementation), where the default implementations throw “not yet implemented” type exceptions - which clearly indicate that the omitted behavior is not a deliberate part of the test.

Any “mocked” behavior involves writing normal debuggable idiomatic Java code - no need to learn or use a weird DSL to express the behavior of a method body. And it’s far easier to diagnose what’s going on or expected while running the test - instead of the backwards mock approach where failures are typically reported in a non-local manner (test completes and you get unexpected invocation or missing invocation error - where or what should have made the invocation?).

My test implementation can evolve naturally - it’s all normal debuggable idiomatic Java.

akoboldfrying · 2025-12-29T03:30:58 1766979058

It doesn't have to be a breaking change -- an empty result set could still be allowed. It could simply be a perf improvement that avoids calling an expensive function with an empty result set, when it is known that the function is a no-op in this case.

wpollock · 2025-12-29T03:41:26 1766979686

If it's not a breaking change, why would a unit test fail as a result, whether or not using mocks/fakes for the code not under test? Unit tests should test the contract of a unit of code. Testing implementation details is better handled with assertions, right?

If the code being mocked changes its invariants the code under test that depends on that needs to be carefully re-examined. A failing unit test will alert one to that situation.

(I'm not being snarky, I don't understand your point and I want to.)

akoboldfrying · 2025-12-29T11:36:01 1767008161

The problem occurs when the mock is incomplete. Suppose:

1. Initially codeUnderTest() calls a dependency's dep.getFoos() method, which returns a list of Foos. This method is expensive, even if there are no Foos to return.

2. Calling the real dep.getFoos() is awkward, so we mock it for tests.

3. Someone changes codeUnderTest() to first call dep.getNumberOfFoos(), which is always quick, and subsequently call dep.getFoos() only if the first method's return value is nonzero. This speeds up the common case in which there are no Foos to process.

4. The test breaks because dep.getNumberOfFoos() has not been mocked.

You could argue that the original test creator should have defensively also mocked dep.getNumberOfFoos() -- but this quickly becomes an argument that the complete functionality of dep should be mocked.

tripple6 · 2025-12-29T09:04:31 1766999071

Jumping ahead to the comments below: obviously, I mentioned `java.sql.ResultSet` only as an example of an extremely massive interface. But if someone starts building theories based on what is left unsaid in the example for those from outside the Java world, one could, for instance, assume that such brittle tests are simply poorly written, or that they fail to mitigate Mockito's default behavior.

In my view, one of the biggest mistakes when working with Mockito is relying on answers that return default values even when a method call has not been explicitly described, treating this as some kind of "default implementation". Instead, I prefer to explicitly forbid such behavior by throwing an `AssertionError` from the default answer. Then, if we really take "one method" literally, I explicitly state that `next()` must return `false`, clearly declaring my intent that I have implemented tests based on exactly this described behavior, which in practice most often boils down to a fluent-style list of explicitly expected interactions. Recording interactions is also critically important.

How many methods does `ResultSet` have today? 150? 200? As a Mockito user, I don't care.

tripple6 · 2025-02-12T08:17:31 1739348251

I've seen several posts on jj, but could anyone please tell what can't be done by git, or what is harder in git but super-easy in jj by providing the sequence of git commands and jj commands for comparison?

bjackman · 2025-02-12T14:39:46 1739371186

You can do everything in Git really. I don't see anything in jj that Git literally can't do, but as someone who spends a lot of time faffing around rebasing huge branches I can really see the appeal of something that does all the same stuff better.

If I mostly worked on a small-to-medium size project with close connections between developers, where mostly you just get your code merged pretty quickly or drop it, then I wouldn't see any value in it. But for Linux kernel work git can often be pretty tiresome, even if it never actually gets in the way.

I thought that nothing could beat Git until I tried Fig (Google's Mercurial thing). It ends up being awful coz it's so bloody slow, but it convinced me that a more advanced model of the history can potentially life easier more often than it makes it harder.

Fig's model differs from Git in totally different ways than jj's does but just abstractly it showed me that VCS could be meaningfully improved.

steveklabnik · 2025-02-12T17:05:46 1739379946

Yes, at the end of the day, there's nothing you can't do in git that you can do in jj. This is easy to demonstrate, since jj has a git backend. There are minor things about that that do show some things, like for example, change IDs are totally local, and not shared, since git doesn't have the notion of a change ID, but that's not what we're talking about really.

At the end of the day, every DVCS is ultimately "here is a repository that is a bunch of snapshots of your working directory and a graph of those snapshots" plus tools to work with the graph, including tools to speak to other repositories.

From any given snapshot A -> B, both git and jj can get you there. The question is, which tools are more effective for getting the work done that you want to do?

martinvonz · 2025-02-12T08:23:08 1739348588

To move the changes in file `foo` in the working copy into a past commit `X`:

`git commit --fixup=X foo; git stash; git rebase -i X^; git stash pop`

`jj squash --into X foo`

Aissen · 2025-02-12T11:03:37 1739358217

You can simplify this:

git commit --fixup X ; git rebase --interactive --autostash --autosquash X^

If you do that often, an alias might help; I have one for the second command above. You might want to look at git-fixup or git-absorb for automatically finding the "X" commit.

Aside: I really ought to try jj, it looks very promising.

steveklabnik · 2025-02-12T16:16:42 1739377002

jj has jj absorb already.

Aissen · 2025-02-12T16:52:01 1739379121

I heard it worked even better than git-absorb, is that true?

steveklabnik · 2025-02-12T16:58:19 1739379499

I have not used either, so I cannot answer that.

Martinussen · 2025-02-12T10:57:31 1739357851

That looks more like a git alias than a job for an entirely new tool, to me. How many of the core functions do you really need to cover before `jj` itself becomes redundant?

martinvonz · 2025-02-12T22:38:40 1739399920

I apologize if my sibling comment sounded harsh. I think you were saying that jj could be implemented as some Git aliases. Given the information available in this thread, that might seem reasonable. I didn't realize that this thread did not include a link to the project's docs. Sorry about that.

martinvonz · 2025-02-12T12:22:26 1739362946

I think you misunderstood. Did you see the list of features? My example is not the only thing jj does.

tripple6 · 2025-02-05T13:53:46 1738763626

I would never work with hg anymore since and I consider git much and much more flexible from both user and scripting perspective. Yes, git also suffers of command inconsistency, and unfortunately it seems to be never fixed.

> You can UNDO. Everything. That is the major thing

Everything is what? The things git cannot undo is removing untracked files (well they're untracked) on git-clean, or files added to the index for the first time and then reset. Maybe rebase in mid-rebase is a way to lose some changes so one has re-rebase again (it's really annoying, but reflog holds all rebase changes if I recall). I can't really see what you mean.

> You can switch from one `branch` to `another` and leave things incomplete, even conflicts, that is nice.

It's nice. I use git-worktree for multiple branches I work on, so my working copies may remain dirty even in conflict stage.

> The command made sense

It's nice.

> Nobody else knows you use `jj`, so no barrier to adoption.

It's really nice, but I'm not sure whether I understand it, but does it work as a front-end tool over git/other VCS?

> Rebases are not painful anymore. Stacking prs are finally nice to author, even of NOBODY ELSE KNOW IT!

I don't get it. What does make rebase hard? It's just re-applying a bunch of patches or possibly merges on top of the new base, and it doesn't even require interactive mode. Interactive mode makes magic I'm happy with. Seriously, what's wrong with it?

cube2222 · 2025-02-05T15:12:46 1738768366

Generally, most things jj does can be done in Git, it's just much more pleasant / seamless / consistent in jj - maybe with the exception of the "Working on Two Things at the Same Time" pattern[0], that one might be really hard to achieve in Git.

> but does it work as a front-end tool over git/other VCS?

citing from the article

> Before we dive in, one last thing you should take note of, is that most people use jj with its Git backend. You can use jj with your existing Git repos and reap its benefits in a way that is completely transparent to others you’re collaborating with. Effectively, you can treat it like a Git frontend.

> I don't get it. What does make rebase hard?

It's hard when you work on a set of stacked PRs and make frequent changes to arbitrary PRs in that stack, because every time you make a change, you have to manually rebase and push all of the other PRs. There's SaaS's specifically built and used for solving this problem in Git[1].

Maybe hard is the wrong word, it's just annoying and tedious. jj makes this completely seamless, as shown in the article[2].

[0]: https://kubamartin.com/posts/introduction-to-the-jujutsu-vcs...

[1]: https://graphite.dev/

[2]: https://kubamartin.com/posts/introduction-to-the-jujutsu-vcs...

tripple6 · 2025-02-05T16:04:20 1738771460

Regarding the rebase thing, I guess it's more a matter of habit. Stacked branches indeed may be tedious and may be annoying to rebase, agree. I implemented a shell script to rebase branch trees on top of a new base that would cover stacked branches as well. It covered all my needs like resolving conflicts, concluding or aborting the branch tree rebase and so on. Of course, it would be nice to have such a right-done thing right in git. But as of now, if I understand what jj is, it seems like I can shell-script some of its features myself. Still a happy git user.

martinvonz · 2025-02-05T15:59:02 1738771142

I think the best way to answer most of your questions is just for you to try it yourself (https://jj-vcs.github.io/jj/latest/install-and-setup/).

> Everything is what?

Not interactions affecting remotes (you can't unpush a commit). Configuration is also not tracked. Neither are ignored files.

You can undo changes in the working copy, rebases, splits, squashes, etc.

> The things git cannot undo is removing untracked files (well they're untracked) on git-clean, or files added to the index for the first time and then reset.

Files added to the index can be recovered at least, but it's not easy to find them. The bigger problem is if you have not added them to the index and run e.g. `git reset --hard`. You can't undo that (via Git, but maybe via your IDE or your file system).

It's also about ease of use. I know you can undo a lot of things by using Git's reflogs, but it's not easy to e.g. undo a `git rebase --update-refs` that update many branches.

> I use git-worktree for multiple branches I work on, so my working copies may remain dirty even in conflict stage.

Yes, Jujutsu also supports that (`jj workspace`), but IMO, it's nice to not have to use them just because your working copy is dirty.

> What does make rebase hard? It's just re-applying a bunch of patches or possibly merges on top of the new base, and it doesn't even require interactive mode.

I've worked on the `git rebase` itself, so it's not like I don't know how to use it. Some things that I think are wrong with it:

* Without `--update-refs`, it only rebases a single branch. That's basically never what I want when I have branches pointing to other rebased commits.

* It's not able to rebase more than one head (no trees of commits)

* Doesn't rebase merge commits properly. Instead, it redoes the merge, discarding any conflict resolutions. (Yes, I know about rerere.)

> Interactive mode makes magic I'm happy with. Seriously, what's wrong with it?

Mostly that it's so stateful. While doing an interactive rebase (e.g. editing a commit), you're in a state where you can't easily check out another commit, for example. If you run into conflicts, you have to finish resolving all conflicts in the stack of commits, or you have to abort the whole rebase (throwing away any previous conflict resolutions).

Again, it will be easier to understand if you just try using Jujutsu for a few days.

tripple6 · on Nov 17, 2024

Having a massive major feature done as a single commit is evil. Merging two branches may conclude combining a unit of work, a major feature, a minor feature with the main branch (of course once the topic branch is merged to the upstream, and never vice versa [rebase in git terminology]). This is logically "a big commit" constructed from a concrete amount of small commits. Additionally, having small atomic commits also makes reverting a commit a trivial operation regardless the branch the commit was introduced in. Bisecting a range of small commits also makes finding a bad commit easier.

tripple6 · on Nov 4, 2024

Sorry but I would never use this format for both manual or programmatic approach.

* I've tried to read the data this format describes without reading its documentation and I just failed: the format is amazingly counter-intuitive. I never had a readability and understanding issues with XML/HTML, JSON or even YAML (that I think is overly complicated) when I saw them for the first time.

* Terse does not mean cryptic. Basic notation is just weird: why would it need unbalanced the less-than symbol to open the array? Why `<&>` for delimiting elements? Why `<<$>` but not `<$>>` at least just to be more readable by human and look balanced? The syntax goes more weird for arrays containing objects: indents (okay to some extent), `<>` and `<&>` (`{` and `}`?).

* Auto-removing whitespaces may hurt. If the format offers this, would it also offer a heredoc-style text like `cat <<EOF` in Bash so that the formatting could be preserved as is? `xml:space` and JSON string literals were designed exactly for this. (upd: I just saw new symbol: `|`... Well, okay, but another special character now.)

* Native support for arrays. I mentioned a few above. `<<Faults$$>` and `<<$$>` -- guess what these two mean if you see this first time? You would never guess. It's an empty array and an empty element, you've just failed.

* Graphs... Another weird syntax comes into the room: `#id;` but `@id` (no semicolon?). Okay, these seem to be first-class ids and refs, not necessarily designed for graphs (I'm not sure if the `#ID;` and `@` would play perfect with any non-empty names.) But what does graphs make first-class citizens here and why? Graphs can be expressed, I believe, in any data/markup format/language and then processed with a particular application if graphs are needed. By the way, arrays and objects are not necessarily trees from the semantic point of view. More graph processing issues were mentioned in other comments to this topic. What about the first-class support for sets? I'm kidding

* Comments. Another symbol here to come: `%`. To be honest, I can't recall any instance I could see the percent sign elsewhere for this purpose. What if the comments would start with a well-known `#` at least with a space right after it so that it wouldn't be considered a "graph id" (or, don't get me wrong, with another `<`/`$` sequence)

* Just got to the Escaping section and now I see how the characters are escaped. Perhaps this is okay.

* Scalars. Crazy number formatting and locale issues are waiting. The never-on-keyboard infinity symbol would be great for APL, but why not just Inf(inity)? Whatever the scalar value is, no need to cover all existing primitive scalars -- just let them be processed by an application since all scalars are text semantically. Another crazy things: what does make UUIDs that special for this format?; why does make Base64 that special so that it has native support (would it support Base16 for human-readable message digests; or Base58 to remove visually lookalike Base64 characters)?

* CR/LF? I can understand its semantic purpose, but why not LF to make it even more "blazingly" fast? Say good-bye to UNIX users.

* The cognitive load for the markup syntax absolutely does not make it efficient in typing. Believe me, it does not.

What I would do, I would probably enhance the widely used formats, say make JSON, which I find almost perfect from the syntax point of view, not require quotes for object property names if the names would not contain special characters like `:` just like it goes in JavaScript. And perhaps make XML "v2" move away from SGML hence loosening its syntax to get rid of the closing tags with shorter notation, first-class array support and fixing syntax issues especially for CDATA and comments that can't support `--`. You would blame me, but I love XML the most: it just has the richest set of standardized amazing well-designed extensions to operate XML with regardless the heavy XML syntax.

P.S. How does it look like in the document it marks up is minified (e.g., no whitespaces)?

GeneThomas · on Nov 5, 2024

> `<<$>` but not `<$>>`

Having both begin and terminate arrays start with << is more consistent.

> `<>` and `<&>` (`{` and `}`?).

Using `{` and `}` would lead to more special characters.

> Auto-removing whitespaces may hurt.

It does not.

> Graphs... But what does graphs make first-class citizens here and why?

It is simpler to support graphs in the markup. The fact is that the data being serialized may be structured in a graph.

> CR/LF

It supports LF only ᴜɴɪx line ends as well as CR/LF internet line endings.

> Comments [...] To be honest, I can't recall any instance I could see the percent sign elsewhere for this purpose

LaTex and PostScript both use % for comments. # matches the usage in ᴄꜱꜱ and ʜᴛᴍʟ, relating to an id/page location.

> What if the comments would start with a well-known `#` at least with a space right after it so that it wouldn't be considered a "graph id"

Having a space after the # differentiate between and id and comment would be a mistake.

> Scalars. Crazy [...] UUIDs

The Formats section is to facilitate interoperability between implementations, e.g. if you are encoding a ɢᴜɪᴅ [easy to say] then format it this way.

> not make it efficient in typing.

It is more terse than ᴊꜱᴏɴ.

> XML "v2" ... first-class array support

Xᴇɴᴏɴ has first class array support, the xᴍʟ like syntax leads to the <empty-arrray$$> notation.

> P.S. How does it look like in the document it marks up is minified (e.g., no whitespaces)?

Good.

tripple6 · on Nov 5, 2024

I love this.

> Having both begin and terminate arrays start with << is more consistent.

It hides context for humans. I am a human and I love to see what opens and what closes the context. Why would `<` open an array if `[` is astonishingly wide-spread practice? Why would `<<` close it just because you think it is more consistent? What if open/close balance is also consistency, especially for nested arrays?

Also just think how many key strokes you'd save if you'd use `]` instead of [Shift]+`,` [Shift]+`,` [Shift]+`4` [Shift]+`.` if you declare it as readable text.

> Using `{` and `}` would lead to more special characters.

Agree. Too many now.

> It is simpler to support graphs in the markup. The fact is that the data being serialized may be structured in a graph.

I can't understand why you call it native graph support. The only thing it does is declaring an identified element and references to the element. I can't see how different is that comparing to XML or JSON that semantically "have graph support" just because they also can declare something considered ids and references to the identified element.

> LaTex and PostScript both use % for comments.

Yes, just learnt that from your comment and https://news.ycombinator.com/item?id=42047634 by zzo38computer. Thank you.

> # matches the usage in ᴄꜱꜱ and ʜᴛᴍʟ, relating to an id/page location.

No. The # symbol is overloaded: it may be a comment start, especially for line-oriented and human-readable text formats or scripts; CSS uses it for IDs; HTML has nothing to do with it since browsers only use # as a part of a URL to reference a particular identified element for navigation purposes only (it's called anchor in URL syntax; formerly web-browsers used <a name="anchor"> to navigate to a part of the page; as of now in the HTML5 world any `id` attribute is considered an anchor which I find a design flaw since ids are something to be used to identify hence any id from the document is exposed for navigation navigation purposes, but <a name="anchor"> is semantically something for navigation).

> Having a space after the # differentiate between and id and comment would be a mistake.

Of course it would in its current perspective if the id declaration is `#`. Don't know what `#<NON_WHITESPACE_CHAR>` would do if it's legal.

> The Formats section is to facilitate interoperability between implementations, e.g. if you are encoding a ɢᴜɪᴅ [easy to say] then format it this way.

I agree that it may look better for consistency purposes, but what interoperability is all that about? Why would formatting even affect it? From the consumer application point of view, it must be handled from its context defined by its purpose and semantic type. If my element/attribute is formally declared as a GUID, then why would I care that much if it's conventionally formatted? Would it be still a GUID if I encode it using Base64? The dashes in GUIDs are for humans only and they are optional, and the application knows it's a GUID to process it even leniently if it can. The same goes for ISBN/ISSN for books and magazines, card numbers, phone numbers, etc -- none of them require dashes or spaces or parentheses to be processed.

This is why "Real numbers *should be stored* with commas for readability." is just hilarious. Why should? May I use underscores or dots or spaces to group digits (seriously, why comma)? Can I group digits after the period? If I need integers, why are they also limited to 32 bits and 64 bits? How would I present an arbitrary precision integer or non-integer number (say, I want the Pi number 197 digits after the 3)? If ∞ is allowed, but no mention on +Inf and -Inf, can be 4.2957×10^24 used instead of 4.2957e24? May I just have simple `D+(\.D+)?` for everything I need for true interoperability?

I agree consistent formatting is really beautiful, but it must never be the key to process data.

> It is more terse than ᴊꜱᴏɴ.

Sorry, it's not.

> Good

Could you please provide an example of minified (a single line, no new lines) array of timestamps from your page?

UPD: I've just seen https://news.ycombinator.com/item?id=42038508 by Oras . Well, you know.

----

In short, too many whys, weird syntax and design decisions, so I cannot see anything that makes it a "better alternative" to XML, JSON, or YAML.

GeneThomas · on Nov 6, 2024

I don’t love this.

> I can't see how different is that comparing to XML or JSON that semantically "have graph support" just because they also can declare something considered ids and references to the identified element.

When serialising data with ᴊꜱᴏɴ one has to use special field names such as $id; hoping the programming language does not. It DOES have native graph support that xᴍʟ and ᴊꜱᴏɴ do not.

> # [..] it may be a comment start

No.

> but what interoperability is all that about?

Interoperability between implementations. If you were using Xᴇɴᴏɴ to communicate between two different languages, say the C# and a Python implementation, agreeing of what an integer IS is helpful. Both Xᴇɴᴏɴ libraries can provide support for encoding say ɢᴜɪᴅs. You have missed the point. A user is always free to encode data as arbitrary strings.

> commas [...] readability." is just hilarious. Why should?

Commas makes numbers faster to interpret. Something `ls` is missing. As I stated on another branch English is the global lingua franca so commas every three digits is the standard.

> ∞ is allowed, but no mention on +Inf and -Inf, can be 4.2957×10^24 ∞ is +Inf. 4.2957×10^24 is not the xᴇɴᴏɴ standard.

>> It is more terse than ᴊꜱᴏɴ.

>Sorry, it's not.

See https://news.ycombinator.com/item?id=42049033

<<Timestamps>2026-09-24T16\:45\:22.5383742<&>2026-10-04T18\:25\:12Z<&>2026-04-02<$>>

Better than ᴊꜱᴏɴ which does not do timestamps.

tripple6 · on Nov 6, 2024

> When serialising data with ᴊꜱᴏɴ one has to use special field names such as $id; hoping the programming language does not.

Unless a serialization/deserialization tool supports property name overriding which is trivial.

> It DOES have native graph support that xᴍʟ and ᴊꜱᴏɴ do not.

Again, how is this different from `xml:id` that is referenced from other XML document nodes and what makes it "native graph support"?

> Both Xᴇɴᴏɴ libraries can provide support for encoding say ɢᴜɪᴅs. > Better than ᴊꜱᴏɴ which does not do timestamps.

Better?

There is just no need. For what? These two can be controlled by optional schemas that may be extensible like types to validate in XML Schema or Relax NG. Schemas do not dictate format and you don't need your format to be a schema. I still can't get what makes timestamps (and GUIDs) so special so that they have special sections in your document.

I tend I think JSON also has a design flaw providing first-class support for booleans and numbers in terms of literals it took from JavaScript because the latter needs more complex syntax as a programming language. Ridiculously, XML seems to be perfect in this case unifying scalar values: whatever scalar it encodes, text representation can encode it in any efficient format regardless it is a boolean, number (integer, "real", complex, whatever special), a "human-text" string, timestamp or whatever else; HTML attribute values unlike XML don't even need to be quoted in some trivial cases and even may be omitted for boolean attributes. The application simply parses/decodes its data and manages how the data is deserialized. That's all it needs.

I would probably be happy if, say, there would be a format as simple/minimalistic as possible not even requiring delimiters like or quoted strings unless they are ambiguous. Say, `[foo 'bar baz' foo\ bar Zm9vYmFyCg== 2.415e10 ∞ +Inf -∞ -Infinity \[qux\] +1\ 123\ 456789 978012345678 {k1 v1 k2 v2} aa512e8ecf97445eac10cb5a5ea3ef63 c8a0ebbd 2026-09-24T16:45:22.5383742 P3Y6M4DT12H30M5S]` or similar, maybe with nodes metadata and comments support. The above dumb format covers arrays/lists/sets, strings `foo`, space-containing `bar baz`, `foo bar` strings in human and Base64 encoding, the `2.415e10` number from your document and both four infinity notations, a single string `[qux]` and not a nested array with a single element, a phone number (with space delimited country code, region code and local number), an ISBN, a simple map/object made of two pairs, a GUID, a CRC32 checksum, an ISO-8601 zoned date/time, and an ISO-8601 duration. What more scalar types it can be extended with? Since there is no type for scalars in this "format" does not dictate types or preferred scalar formats letting the application make decision how to interpret these on its own.

> Commas makes numbers faster to interpret. Something `ls` is missing. As I stated on another branch English is the global lingua franca so commas every three digits is the standard.

For whom? Humans? Why would data encoding obey region number|date/time notation standards at all? English, but US, UK, Canada, or any other English-speaking country? You've been told that in that thread too, especially if spaces or underscores are even more readable for monospace fonts. You don't need it.

> See https://news.ycombinator.com/item?id=42049033

Funny enough -- your format saves on key/value pairs syntax appealing to 4 vs 6 overhead (okay, cool), but your array elements delimited with `<&>`, and amazingly bad at keyboard typing ergonomics, loses to simple and regular JSON `,` syntax (3 vs 1 overhead). Isn't it blind or crazy?

GeneThomas · on Nov 7, 2024

> Again, how is this different from `xml:id`

It is a tidier solution.

> I still can't get what makes timestamps (and GUIDs) so special so that they have special sections in your document.

They are common in data.

> [...] boolean attributes

Separate attributes and sub elements is a mistake. One should be able to guess an ᴀᴘɪ.

> What more scalar types it can be extended with?

> letting the application make decision how to interpret these on its own.

That is laborours! A Xᴇɴᴏɴ library provides AsGuid, AsDateTime etc.. and serialization directly to/from those types.

>For whom? Humans?

Yes. Human have to read markup.

> Why would data encoding obey region number|date/time notation standards at all? English, but US, UK, Canada, or any other English-speaking country?

I repeat! READABILITY.

> Isn't it blind or crazy?

No, quite the opposite.

tripple6 · on Nov 7, 2024

> It is a tidier solution.

Based on special syntax. You're about to introduce node attributes.

> They are common in data.

I use tables everyday. May I have "first-class graph support" but for tabular data that is very common as well? I expected three or four times you eventually explain what makes the graph support and how it differs from declaring ids and refs in other formats you think are worse than yours. No answer.

> Separate attributes and sub elements is a mistake. One should be able to guess an ᴀᴘɪ.

For the first, I kind of agree that attributes and subnodes should be unified in favor of subnodes (which was sacrificed for markups like HTML for sane brevity sake). However attributes, your ids are, may be metadata for nodes of any kind. For the second, API for what? Document generating/parsing API? Validation API? Serialization/deserialization API? Enveloped application API? I guess, the latter for whatever reason dictated in your "standard" . In any case documentation, schemas, data validators and autocompletes are my best friends, no need to "guess".

> That is laborours! A Xᴇɴᴏɴ library provides AsGuid, AsDateTime etc.. and serialization directly to/from those types.

What you're mentioning is called serialization and deserialization, and these two be easily implemented once for "basic" types and extended at the application level for any kind of data, because an application decides what to do with data on its own, not the format the data is enveloped in. Serialization and deserialization don't exist from the format perspective which only defines the syntax way data is marked up in a document. So why would it care the formatting at all?

> Yes. Human have to read markup.

Format should not care too much.

> I repeat! READABILITY.

No yelling please. Regional formats are defined by countries, not languages you said elsewhere, just by definition, even if English is the lingua franca. Separate digits with underscores or spaces.

I'm very happy your "standard" neither recommend color highlighting for, say, numbers, nor even worse has special syntax for readability highlighting. Highlighting increases readability greatly as well, you know.

> No, quite the opposite.

6:4 but 1:3 is a great syntax win. Okay.

No any solid counter arguments from your side being blind for obvious design flaws of your so-called format "standard" only tells how you mixed up all concepts in a mess of crazy syntax markup, and scalar object formatting for scalars that only must be handled by applications while serialization and deserialization regardless the markup format "standard" recommends.

Good luck with your "standard" rightly criticized and rejected by others, but better just bury it not spending your life for nothing. Sincerely.

GeneThomas · on Nov 7, 2024

> You're about to introduce node attributes.

Yes, but limited to #id and :type.

> tabular data that is very common as well?

Xᴇɴᴏɴ has first class arrays also so tabular data could be stored as such.

> explain what makes the graph support and how it differs from declaring ids and refs in other formats you think are worse than yours. No answer.

It is built in!

> So why would it care the formatting at all?

FOR INTEROPERABILITY! That is different implementations of xᴇɴᴏɴ agree on what a ɢᴜɪᴅ or date looks like! Fʏɪ, with a good implementation of xᴇɴᴏɴ you just point the library at your data, sometimes augmented with some attributes, and you get cleanly formatted markup.

>>One should be able to guess an ᴀᴘɪ.

> For the second, API for what?

Say you are using an ᴀᴘɪ for information about a person and their is information about their height, in xᴇɴᴏɴ one knows there shall be a scalar called “Height”, in xᴍʟ it may be an attribute or a sub element.

>> Yes. Human have to read markup.

> Format should not care too much.

We are using text formats because they are READable to humans.

> Separate digits with underscores or spaces.

That is not standard anywhere.

> [...] color highlighting

Only the application knows if a scalar is a number or a string.

There are no obvious design flaws. Take xᴍʟ, add an array type and xᴇɴᴏɴ results.

We must be talking a cross purposes re formatting. [phew...] An application has an object called Person, and a field called Height with a type of double. C♯: Person fred = new Person { Height = 1.67 }; string xenon = XenonStart.Serialize("person", fred), results in the string "<person><Height=1.67><$>". A xᴇɴᴏɴ implementation in another language, say JavaScript can take that xᴇɴᴏɴ string and decode it into an object with a field called Height with a value that can be decoded .AsNumber into 1.67; because there is a standard for encoding a ɪᴇᴇᴇ 64 bit number/.net double/JavaScript number.

Xᴇɴᴏɴ has more benefits.

GeneThomas · on Nov 11, 2024

> * Native support for arrays. I mentioned a few above. `<<Faults$$>>` and `<<$$>>` -- guess what these two mean if you see this first time? You would never guess. It's an empty array and an empty element, you've just failed.

<< means it relates to starting an array, $>> means it is the end, $$ meaning something else — an empty array!

The xᴍʟ alternative is a bodge:

    public class PurchaseOrder
    {
        public Item[] ItemsOrders;
    }

    public class Item
    {
        public string ItemID;
        public decimal ItemPrice;
    }

serializes to:

    <PurchaseOrder>
        <ItemsOrders>
            <Item>
                <ItemID>aaa111</ItemID>
                <ItemPrice>34.22</ItemPrice>
            </Item>
            <Item>
                <ItemID>bbb222</ItemID>
                <ItemPrice>2.89</ItemPrice>
            </Item> 
        </ItemsOrders>
    </PurchaseOrder>

Where the array is marked up as two sub elements both called <Item>:

Xᴇɴᴏɴ has first class support for arrays:

  <PurchaseOrder>
      <<ItemsOrders>
          <ItemID=aaa111>
          <ItemPrice=34.22>
      <&>
          <ItemID=bbb222>
          <ItemPrice=2.89>
      <$>>
  <$>

The elements may be scalars so

  <PurchaseOrder>
      <<ItemsOrders>
      <$>>
  <$>

has an array with one item of the empty string. So a separate syntax for empty arrays is required!

  <PurchaseOrder>
      <<ItemsOrders$$>>
  <$>

zzo38computer · on Nov 5, 2024

> Comments. Another symbol here to come: `%`. To be honest, I can't recall any instance I could see the percent sign elsewhere for this purpose.

PostScript is one programming language that uses a percentage sign for comments. TeX and METAFONT also use a percentage sign for comments. There are others, too.

tripple6 · on Nov 4, 2024

Uhm... These scripts seem to be over-engineered if used in scripting, but let me review it. ':)

`git-amend`: A simple `git-commit` alias would support autocomplete and handle all `git-commit` options. Additionally, I would not add the `--no-edit` option by default as editing the commit message would be fine and quickly aborted (say in `nano`), but would introduce another `amend-no-edit` alias as it would be supported in autocomplete anyway.

`git-delete-gone-branches`: This seems to be handy for me. Might be a neat picking, but I would avoid using `awk` in this case in favor of `while IFS=$'\t' read -r ref_name marker ... < <(git for-each-ref --format='%(refname)%09%(upstream:track,nobracket)' ...)`. Additionally, it's a dangerous/destructive operation, it should have the `--force` option. Especially if it deletes a branch if it has commits that are not yet merged.

`git-dir`: The same: an alias is just fine. If used as a user command, sure.

`git-force-pull`: Seems to be fine. I would probably parse the list of tracked branches remotes to be passed to `git-fetch` and then process each with `git-for-each-ref` in a single loop.

`git-forward`: The script seems to execute excessive `git-pull` and `git-fetch` (I would prefer the latter unifying the commands in use), in terms of interacting with the remote repository that would probably need more visible progress verbosity to stderr, and then merely `git-merge` the current branch using the `--ff-only` option, but `git pull --ff-only` is okay too. Since bash supports arrays the script may construct multiple arguments to pass to `git-fetch` so it may fetch more in a single go (but, to be honest, I'm not sure if it would not cause the whole `git-fetch` run to fail if any of its refspec fails for whatever reason).

`git-gc-all`: I would go with an alias as it's clearly a user command, but yes, adding the `--force` would be more tricky perhaps requiring an environment variable like `FORCE` to handle the force flag (i.e. `FORCE= git-gc-all`). Not sure why the script checks whether the command runs in the git repository or a working directory. (Also, it would need the exact path for the script, otherwise it may run in as situation where it would trying to find the `git-in-repo` script in user's current directory.)

`git-in-repo`: Not sure if it's supposed to be used as a user command at all, not a scripting one, but if it's the latter case, git checks the repository directory itself if the given command works with the repository. (N.B. git quirk: deleting a remote repository ref, at least using `git pull -d ...`, requires any local git repo even if it's unrelated to the remote or does not have a remote repository registered in its remotes list...)

`git-is-branch-remote`: I can't pick of a scenario this would be handy for.

`git-is-head-detached`: Not sure if it's supposed to be used in scripts only. From the user perspective, `git-status` or configured `PS1` indicating if `HEAD` is detached would work.

`git-is-worktree-clean`: Another alias candidate if it's supposed to be a user command?

`git-legacy`: To be honest, I didn't figure out how it works and what it's supposed to do. ':)

`git-main-branch`: The script has currently the `origin` remote hardcoded. But I'm not sure if the concept of the main branch exists in git at all.

`git-mode-restore`: This script is crazy. ':) If I understand its purpose, can this be implemented using `git-diff-tree` and `git-update-index`?

`git-root`: Another alias candidate? Is Cygwin pain for certain commands? I also don't know how `cygpath` affects what the Cygwin users do.

`git-xlog`: This seems to be a good candidate as a `git-reflog`/`git-log` builtin, I guess. I have a script, similar to this one, that finds `TODO`-marks introduced (i.e. added lines only) at a specific revision using `git-diff-tree`, but yes it must be combined with `git-rev-list` to work like this one.

General stuff:

* The `USAGE` variable is unnecessarily evaluated everytime any script gets run and does not require to exist: the usage might be defined as a function to be invoked on demand and run external commands like `expand` on demand.

* The scripts may also construct arbitrary options in an array and inject the array to command, hence not requiring command duplicates with sightly another set of options.

* Some of commands from the toolset don't work when launched from the repository directory because they require to be installed first.

* Some of variables are not quoted and may cause unexpected results.

tripple6 · on Oct 9, 2024

Is there anything git-based like this having semantic web stuff support similar to Semantic MediaWiki?

tripple6 · on July 23, 2024

Totally agree. Most people DO look better with hair, not bald. Any tool that can visualize a non-bald person bald reveals how the person look may change from good to bad, and a lot of such persons are not that masculine or attractive when they're bald as they or anybody else might think. I could not cope with being bald, and the only thing that really helped me was FUE, nothing else. I'm really lucky to look younger than I might look in my 38, and I still have my hair on top of my head, not underneath being more attractive than having premature balding issues in 25-33.

tripple6 · on Nov 4, 2022

For Android I prefer Press (com.twentyfivesquares.press), a discontinued app I asked the creators to open the source code for a few years ago. Got ignored. Probably the only app known to me to categorize starred/read-later entries. Well, a couple of bugs (rare caching issues) and supporting already-dead services are not a big issue even after many years, however kind of reversing and modding it to make it support Inoreader or ttRSS would make me finally get rid of Feedly (seems still to support the original Google Reader API?) whose both web version and the app are you-know-what.