Jq is rounding 64-bit unsigned integers (2017) (github.com/stedolan)
23 points by codetrotter on June 1, 2021 | hide | past | favorite | 28 comments


"it's not a bug" is a really bizarre response.

"we know" is kind of ok, but the status of bug-hood is defined between coders and users, not solely by coders I think, and this breaks the POLA severely: People who depend on JQ don't expect this.


In this particular case, though, the spec explicitly addresses this [0]:

> Since software that implements IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide...

[0] https://datatracker.ietf.org/doc/html/rfc7159#section-6

Which means that if you are putting 64-bit integers into JSON and require every bit to be preserved, you are not actually creating JSON that is compatible with all consumers. For example, such JSON is not compatible with browsers. Here is what my Firefox's JavaScript console says:

    >> x = '{"id":675127116845989888,"id_str":"675127116845989888"}'
    <- "{\"id\":675127116845989888,\"id_str\":\"675127116845989888\"}"
    >> JSON.parse(x)
    <- Object { id: 675127116845989900, id_str: "675127116845989888" }
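For contrast, and purely as an added illustration (not part of the original comment): Python's json module parses JSON integer literals as arbitrary-precision ints, so the same document round-trips exactly:

```python
import json

doc = '{"id":675127116845989888,"id_str":"675127116845989888"}'
parsed = json.loads(doc)

# CPython ints are arbitrary precision, so no rounding occurs here,
# unlike JSON.parse in a browser
assert parsed["id"] == 675127116845989888
assert str(parsed["id"]) == parsed["id_str"]
```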

 
I'd say that jq acting the same way as browsers is pretty reasonable, no?


Since jq does something completely different from a browser, it would be reasonable for it to try harder in some respects. A tool that is supposed to pass certain data through unchanged should not change that data, even if we can expect that data to eventually be rounded by the eventual consumer at some later time.


Yeah, I think so. But I think people who are stuck in a 64-bit mindset forget that 128-bit values serialized as digit strings are common now.

I am more wrong than right: if you can point to a written spec saying "be not astonished" then POLA doesn't apply. And you did.


That is the problem with JSON. XML worked much better for integers, and when typed with XML Schema or XPath, XML even has an xs:unsignedLong integer type.

I wrote my own command-line XPath JSON-query tool and used bigdecimals for JSON numbers. If you do not do much math, bigdecimals are probably even faster than using floats, and much easier to implement. Converting a string to a double float is extraordinarily complex; I think it is one of the most difficult tasks in computing.
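The bigdecimal approach described above can be sketched with Python's stdlib (a hedged illustration, not the parent's actual tool): json.loads accepts parse_float and parse_int hooks, so every number can be routed through Decimal instead of float:

```python
import json
from decimal import Decimal

# Route every JSON number through Decimal so nothing is rounded
doc = '{"number": 288230376151711744, "ratio": 0.1}'
data = json.loads(doc, parse_float=Decimal, parse_int=Decimal)

assert data["number"] == Decimal("288230376151711744")
assert data["ratio"] == Decimal("0.1")  # exact, unlike the float 0.1
```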

Unfortunately, the W3C has since published a new XPath standard that requires JSON parsers to use doubles for numbers, so I changed my tool to use doubles as well. Now I am struggling with the string<->float conversion. The conversion in the standard library does not work properly. I just looked at another conversion library: 4000 lines of code, and after a two-hour investigation it turned out that it also does not work properly.
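One property a correct string<->double conversion must have, and a quick way to test any candidate library, is exact round-tripping; a small Python illustration (nothing here comes from the parent's tool):

```python
# Python's repr() emits the shortest decimal string that parses back
# to the exact same double -- the property a correct converter needs,
# even for extreme values like the smallest subnormal, 2**-1074.
values = [0.1, 1e308, 2**-1074, 288230376151711744.0]
for v in values:
    assert float(repr(v)) == v
```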


For whatever it's worth, on a somewhat-current Linux Mint, with the test from https://github.com/stedolan/jq/issues/1387:

System jq:

    $ jq --version
    jq-1.6
    $ echo '{"number":288230376151711744}' | jq '.number'
    288230376151711740
Fresh compile from source according to the build instructions at https://github.com/stedolan/jq:

    $ ./configure --with-oniguruma=builtin && make -j8
    $ ./jq --version
    jq-1.6-137-gd18b2d0-dirty
    $ echo '{"number":288230376151711744}' | ./jq '.number'
    288230376151711744
Alternatively:

    $ ./configure --with-oniguruma=builtin --enable-decnum=no && make -j8
    $ echo '{"number":288230376151711744}' | ./jq '.number'
    288230376151711740
So the basic bug is fixed; jq has included a bignum library for over two years. I don't know whether Mint (and thus presumably Ubuntu, and thus possibly Debian) ships an older version of jq or deliberately sets nonstandard, user-unfriendly flags, but I'm somewhat underwhelmed either way.


Ran across this today and spent a good 20 to 25 minutes trying to debug why my own code was failing, when it was jq that was rounding an ID in the JSON response.

Submitting this to let others that use jq beware of this :/


Thanks for submitting this interesting issue. Out of interest, what system are you on, and did jq come from a package manager? If you build it from source, it should work: https://news.ycombinator.com/item?id=27362060


macOS Big Sur on a MacBook Pro M1. Installed via the Homebrew package manager.


I've always (20+ years) known that JavaScript numbers were floats, so I've always assumed JSON numbers to be as well. Is this not the spec? I just store big integers in JSON strings.


JSON numbers are arbitrary precision, because the spec doesn't limit the precision in any way.


You are right, but as theamk pointed out in another comment, the spec does tell you to "expect no more precision or range than" IEEE 754 doubles provide [1]. So don't count on arbitrary precision...

[1] https://datatracker.ietf.org/doc/html/rfc7159#section-6


Those numbers are so big, I'm curious what your use case is for them.


The comment states that they're IDs.


They could happily exist as strings in JSON if they're just IDs.


I've always found the directive "unless you do math on it, it's a string" to be useful when working with json.
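Applying that directive in Python (a trivial sketch, with a made-up record): serialize the ID as a string, and every consumer, including browsers and jq, sees the same value:

```python
import json

# Ship large IDs as strings; convert back only where math is needed
record = {"id_str": str(675127116845989888)}
decoded = json.loads(json.dumps(record))

assert decoded["id_str"] == "675127116845989888"
assert int(decoded["id_str"]) == 675127116845989888
```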


Storing Twitter IDs, of course!


partitioning using prefixes?


I really like the idea of just treating numbers as a higher-level data type instead of ints, floats, signed, unsigned, or whatever. I wish most high-level languages kept floats as an implementation detail and defaulted to decimals.

Though I don't see a way around it with a serialization format like JSON that is meant to work across languages. I've done limited work with static languages, but from what I remember it would be a nightmare to have a variable that could be one of many types; I'm thinking overloading, or interfaces, or something. Could someone familiar with, say, Go or C# explain how they would handle that?

Edit:

Writing that reminded me of something. If you are using a 64-bit unsigned integer and you need to convert it to JSON for public use, please do not just give an object with a high and a low value in hex. If you decide to do this anyway, but also ship an official Python SDK, at least do the conversion in the SDK. I'm looking at you, F5.

I should have never needed to write this function to read memory usage stats, but it was kind of fun figuring it out.

    def ulong64_to_int(ulong64):
        high = ulong64.get('high')
        low = ulong64.get('low')
        return int('{0:032b}{1:032b}'.format(high & 0xffffffff, low & 0xffffffff), 2)


> Could someone familiar with like Go or C#, or whatever explain how they would handle that?

Visitor pattern. A really shitty, but workable, implementation of tagged sum types.


Why not just?

    def ulong64_to_int(ulong64):
        return (int(ulong64['high']) << 32) + int(ulong64['low'])


Honestly, because I always forget that bit shifting exists in Python. I did originally have it as a lambda, but broke out the variables when I made it a normal function while explaining how it worked to someone who needed it.

Though the point I was trying to make is that if you're going to be sharing data, you should serialize it in a way that works for multiple languages, probably a string in this case. Or at the very least, if you're going to provide a client library for a language, you should make that library present the data in a way that makes sense for that language.


Also note that there is another response with a bit more details in it in a different (not closed) ticket from 2018:

https://github.com/stedolan/jq/issues/1741#issuecomment-4306...


Even processors don't always actually have the capability of doing math on 64 bit numbers, in general, because they're nonsensically large.


That’s… not true at all. You’re telling me that x86-64 and AArch64 don’t work on 64 bit words natively (despite being 64 bit architectures)?

In fact, x86-64 has 256 bit data paths internally in some places.


Check this out:

https://superuser.com/questions/168114/how-much-memory-can-a...

There’s no reason to actually support 64 bit lines.


Address lines are not the same as data lines. Your original post mentioned “math”, which, to me, implied data, not addresses. Processors absolutely can do 64 bit math (integer and floating point; some can even do multiple at a time), but you are correct that 64 physical address line processors don’t exist…yet. It is entirely possible we’ll see servers with that many sometime in the next few decades.


You are correct, I should have been more clear. Point is that 64 bits is not always as it seems, and few people seem to know about the address line limitations.



