Description
Nims need to improve over time by learning skills and refining them autonomously.
Core idea: apply an autoresearch-style feedback loop (inspired by Karpathy's method) in which a nim repeatedly runs a skill, scores the output against a checklist, makes a small prompt change, re-tests, and keeps improvements while reverting regressions.
Key requirements:
- Nims should be able to identify which skills are underperforming
- A scoring/checklist mechanism to measure skill quality (yes/no criteria)
- An iterative loop: run skill → score output → tweak prompt → re-test → keep or revert
- Track a changelog of what was tried and whether it helped
- Preserve original skill as backup before modifications
- Should work with Claude agent skills initially, but be designed for future support of other agents (e.g., Codex)
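The requirements above can be sketched as a small hill-climbing loop. This is an illustrative sketch only, not an existing API: `run_skill`, `tweak`, and the checklist callables are hypothetical placeholders for whatever mechanism actually executes a skill and proposes prompt edits.

```python
def score(output, checklist):
    """Fraction of yes/no checklist criteria the output passes."""
    results = [criterion(output) for criterion in checklist]
    return sum(results) / len(results)

def improve_skill(prompt, run_skill, tweak, checklist, rounds=4):
    """Run skill -> score -> tweak -> re-test; keep improvements, revert regressions.

    Returns the best prompt, its score, a changelog of attempts, and the
    untouched original prompt as a backup.
    """
    backup = prompt  # preserve the original skill before any modification
    best = prompt
    best_score = score(run_skill(best), checklist)
    changelog = [("baseline", best_score, "kept")]
    for _ in range(rounds):
        candidate = tweak(best)                      # small prompt change
        candidate_score = score(run_skill(candidate), checklist)
        if candidate_score > best_score:             # keep the improvement
            best, best_score = candidate, candidate_score
            changelog.append((candidate, candidate_score, "kept"))
        else:                                        # revert the regression
            changelog.append((candidate, candidate_score, "reverted"))
    return best, best_score, changelog, backup
```

A toy usage, with the skill runner as an identity function and a tweak that appends text, shows the keep/revert bookkeeping: a round that raises the checklist score is kept, and a later round that does not improve on it is logged as reverted while `best` stays unchanged.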
Reference: Ole Lehmann's implementation of Karpathy's autoresearch method applied to Claude skills — achieved 56% → 92% pass rate on a landing page copy skill through 4 automated improvement rounds.
Nebula's reasoning: The original submission was a full copy-paste of an Ole Lehmann article about Karpathy's autoresearch method. The actual feature request — nims should autonomously improve their skills over time — was buried in the first two sentences. Rewrote the description to distill the core idea into clear requirements that a developer can act on. Set priority to medium because this is a valuable capability for nim intelligence growth but not blocking any current workflows. Title rewritten from the truncated opening sentence to a clear summary of the feature.