nsypteras · 2026-01-15T01:47:34 1768441654

nsypteras · 2025-12-15T15:39:35 1765813175

Analyzing frontier LLM performance on my favorite daily puzzle game (https://www.nicksypteras.com/blog/cbs-benchmark.html) Next step is to assess how well the LLMs can create their own new, logically satisfiable puzzles in the same style. Then I'll have them battle it out, with one model creating a puzzle and the other attempting to solve it!

slig · 2025-12-15T20:32:05 1765830725

Thanks for sharing! I want to have some sort of agentic "helper" to my new puzzles website [1], and I've learned some tips from your post/code, thank you!

Have you given any thought to how to create the puzzles? Do you think it'd be possible to create them using LLMs?

[1]: https://www.puzzleship.com

nsypteras · 2025-11-21T16:25:34 1763742334

Congrats on launching! One immediate thought is that people will always be wary of running LLM-generated code on their machines even if it's sandboxed. Is one of the future business cases for this to host a remote execution environment that pctx can call out to rather than running the code locally?

CuriouslyC · 2025-11-21T17:15:06 1763745306

I don't see a reason to be nervous about running AI on a local system if it's VM encapsulated with cgroups.

pmkelly4444 · 2025-11-21T16:34:42 1763742882

yes! coming soon

nsypteras · 2025-11-14T17:35:02 1763141702

Ya interesting thought - would be fascinating if generating games w/solutions is part of the training data pipeline. There's been previous work done on testing LLMs on logic puzzles[1][2][3] so they could possibly be building off those ideas to improve performance.

[1] https://huggingface.co/papers/2504.00043 [2] https://huggingface.co/blog/yuchenlin/zebra-logic [3] https://arxiv.org/pdf/2403.12094

nsypteras · 2025-11-06T21:11:05 1762463465

I'm impressed it recommended so many books I've already read and liked! I have a big reading backlog but once it's whittled down I will likely come back to this. One feature request would be to also show a "why this is recommended" for each recommendation so I can further narrow down the list for what I'm looking for.

nsypteras · 2025-07-24T15:24:36 1753370676

"Counter Chinese Influence in International Governance Bodies" and grouping them in with US "adversaries" and "rivals" is quite undiplomatic language to throw in under the "Lead in International AI Diplomacy and Security" section. Diplomacy with China should be an important part of this initiative but will inevitably be bungled.

adestefan · 2025-07-24T15:38:41 1753371521

The language lets you get around a bunch of pesky laws by declaring it a "national defense emergency."

mkolodny · 2025-07-24T16:18:47 1753373927

Even if it’s not perfect, I’m happy to see there’s a focus on AI Security. NIST has been a reliable producer of quality international standards for cybersecurity. Hopefully this action plan will lead to similarly high quality recommendations for AI Security.

shortrounddev2 · 2025-07-24T15:27:21 1753370841

China is an adversary of the West, and leading in international security means posing a challenge (or, in an ideal world, a better alternative) to Chinese influence on the international stage.

mensetmanusman · 2025-07-24T16:16:44 1753373804

It’s necessary to put pressure on trying to prevent a Taiwan invasion.

nsypteras · 2025-07-22T15:29:48 1753198188

1984: U.S. withdraws. 2003: U.S. rejoins. 2011: U.S. stops paying dues after Palestine joins. 2017: U.S. announces withdrawal (effective end of 2018). 2023: U.S. rejoins, pledges to repay dues. 2025: U.S. announces withdrawal.

Seems to be a revolving door

rjzzleep · 2025-07-22T16:07:59 1753200479

They're getting ready to bomb Iran's UNESCO sites. They did bomb several UNESCO sites in Yugoslavia and other places while they left. Their boy Grossi also told the whole world that there is a big target on a UNESCO site a short while back.

selimthegrim · 2025-07-22T16:26:42 1753201602

Which site in Yugoslavia did they bomb?

dmix · 2025-07-22T17:30:58 1753205458

NATO bombings damaged a Kosovo (post Yugoslavia) church in 1999 that was later added to UNESCO in 2006

https://en.wikipedia.org/wiki/Gra%C4%8Danica_Monastery

scantron4 · 2025-07-23T15:19:00 1753283940

So it's a time-traveling crime?

whynotmaybe · 2025-07-22T17:36:09 1753205769

History mismatch/Mandela effect? Some of the bombed sites were already known as culturally significant but not recognized by UNESCO yet, like Novi Sad, which became a UNESCO creative city in 2023.

dmix · 2025-07-22T17:56:19 1753206979

UNESCO Creative cities are very different from UNESCO world heritage sites.

rs186 · 2025-07-22T15:52:40 1753199560

Makes me wonder if officials at UNESCO even care about the decision. "Oh that again?" Probably already used to this.

rgblambda · 2025-07-22T17:31:26 1753205486

Similar to the Israeli ambassador being recalled from Dublin. They mean it as a big dramatic statement but they've done it that many times it's lost all significance.

She only gets reinstated again for the purpose of making another dramatic exit.

lawlessone · 2025-07-22T17:50:26 1753206626

They always send their most incompetent ambassadors to Dublin, ones that put their foot in their own mouth.

rgblambda · 2025-07-22T18:25:47 1753208747

I suppose looking at it from the Israeli government's perspective, Ireland is a very safe place for Israelis and Jewish people in general, but the public and government are vocal on Israel's actions and there's no defence/intelligence links between the two countries. Trade links are on the European level.

There'll never be a reason for them to send a skilled diplomat, so may as well send a shit stirrer who's only good for causing controversy.

lawlessone · 2025-07-22T19:17:59 1753211879

When you put it that way, it's pretty logical.

SllX · 2025-07-22T20:04:53 1753214693

They’re never happy about the loss of money. For UN institutions, the US usually contributes a theoretical cap of about 22% but in real terms I think it’s more like a quarter of their annual budget or a little over in some cases. When we’re not paying, that’s a lot of money that UNESCO isn’t getting.

overfeed · 2025-07-22T22:31:57 1753223517

Predictably, if/when China becomes the premier funder of UN organizations, there will be a lot of grousing about it by US politicians. The amount of soft power being trashed is astounding.

SllX · 2025-07-22T22:47:13 1753224433

We’re the ones seeking to cap our contributions. The formula currently doesn’t allow for any one country to pay more than 22% with America the only one actually paying that much, save for the institutions we’ve cut off. For UN peacekeeping we’re actually assessed at 27% but Congress capped that to 25% back in 1993.

https://betterworldcampaign.org/us-funding-for-the-un/un-bud...

If any other country wants to step in and fill the gap, I don’t think Congress will care.

overfeed · 2025-08-03T08:44:37 1754210677

> If any other country wants to step in and fill the gap, I don’t think Congress will care

"Countering the PRC Malign Influence Fund Authorization Act of 2023[1]" says otherwise.

1. https://www.congress.gov/bill/118th-congress/house-bill/1157...

SllX · 2025-08-04T08:29:49 1754296189

All of our foreign policy prior to January 20th 2025 is in a state of flux. Officially, Congress cares, but the first 7 or so months of this year have been enlightening in a strange way, and with our President taking the lead, there is a strong possibility that Congress will not care if the possibility of the PRC paying more comes up in any policy discussions.

ashoeafoot · 2025-07-22T23:04:34 1753225474

Eh, China finances a ton of members, who better vote in line, as debtors should.

bad_haircut72 · 2025-07-22T15:50:07 1753199407

If you abandon it completely something else might rise up - but funding/participating only up to a point, it works to suppress it - see Ukraine aid policies as well.

Tostino · 2025-07-22T17:54:23 1753206863

Look at the years, and see how they match up with the administration in power...

yencabulator · 2025-07-23T21:07:09 1753304829

  1984 withdraw Reagan
  2003 rejoin   Bush
  2011 protest  Obama (forced by law)
  2017 withdraw Trump
  2023 rejoin   Biden
  2025 withdraw Trump

Kinda tracks, except for the Bush one.

paulddraper · 2025-07-22T18:03:37 1753207417

Tbf, if you remove the Biden 2023 pledge, the rest makes sense:

In the two decades between 1984 and 2003, UNESCO implemented a number of reforms in management+transparency+politicization, and the U.S. returned.

Then Palestine was admitted, and the U.S. left.

DSingularity · 2025-07-22T15:47:44 1753199264

Cycle of politicians appeasing their genocidal masters until the government starts to realize what that means exactly, at which point we pull back to humanity.

nsypteras · 2025-07-20T14:47:55 1753022875

Same here! Kiwix comes in clutch on flights. I've used it so many times to get background knowledge on topics mid-read. Plus free and open source. Such a great service.

anupulu · 2025-07-21T21:42:31 1753134151

Yes! I’ve used it on flights and long train rides (and generally when travelling) when the network connection might be a bit patchy.

nsypteras · 2025-07-20T14:35:52 1753022152

I think that would be one of the success cases described in the article because HITL is an integral part of good customer support chatbots. Support chats can be escalated to a human whenever the agent is unable to provide a satisfactory answer to the user.
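To make the escalation idea concrete, here's a rough sketch of how that routing could look. Everything in it is an assumption for illustration (the `AgentReply` type, the `route` function, the 0.7 threshold, and the idea that the agent reports a confidence score); none of it comes from the article.

```python
from dataclasses import dataclass

# Hypothetical cutoff below which the bot hands off; not from the article.
ESCALATION_THRESHOLD = 0.7


@dataclass
class AgentReply:
    text: str
    confidence: float  # assumed self-reported score in [0, 1]


def route(reply: AgentReply, user_satisfied: bool = True) -> str:
    """Return "agent" to send the bot's answer, or "human" to escalate."""
    # Escalate when the agent is unsure or the user signals the answer
    # did not help; otherwise let the agent's reply go through.
    if reply.confidence < ESCALATION_THRESHOLD or not user_satisfied:
        return "human"
    return "agent"
```

A confident answer stays with the agent, while a low-confidence answer or an unhappy user gets routed to a person, which is the human-in-the-loop part.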

nsypteras · 2025-07-08T14:50:51 1751986251

> The transportation agency has spent years looking for an innovative way to allow passengers to move faster through the security checkpoints.

I think the writer had some fun with this one