@Loppukilpailija
Loppukilpailija
about 1 month ago
I've found the weekly Sentinel minutes to be very high-quality reporting about world events
Loppukilpailija
6 months ago
"you see more value and practicality in the first steps of this decomposition (taking one big N-level model and decomposing it into n (N-1)-level models) rather than the last steps [...]?"
Yes, I'd see a lot of value in being able to do the first steps of decomposition. I'm particularly thinking about concerns stemming from the AI itself being dangerous, as opposed to systemic risks. Here I think that "a system built of n (N-1)-level models" would likely be much safer than "one N-level model" for reasonable values of n. (E.g. I think this would plausibly be much better in terms of hidden cognition, AI control, deceptive alignment, and staying within assigned boundaries.)
"I would expect the largest performance hit to occur primarily in the initial decomposition steps, and for decomposition to hold on until the end."
I would expect this, too. This is a big part of why I think one should look here: it doesn't really help if one can solve the (relatively easy) problem of constructing plain-coded white-petal-detectors if one can't decompose the big dangerous systems into smaller systems. But if, on the other hand, one could get comparable performance from a bunch of small models, or even just one (N-0.1)-level model and a lot of specialized models, then that would be really valuable.
"but are currently worried about both expanding capabilities and safety concerns"
Makes sense. "We are able to get comparable performance by using small models" has the pro of "we can use small models", but the con of "we can get better performance with such-and-such assemblies". I do think this is something one has to seriously think about.
Loppukilpailija
6 months ago
CAIS has done great work in the past; I'm showing appreciation with a small donation, and hope that larger donors will provide more funding
Loppukilpailija
6 months ago
I do think that the textbook improves upon BlueDot Impact's material, and more broadly I think the "small things" such as having one self-contained exposition (as opposed to a collection of loosely-connected articles) are actually quite important.
I second Ryan Kidd's recommendation of consulting people with experience in AI safety strategy and AI governance. I think getting in-depth feedback from many experts could considerably improve the quality of the material, and suggest allocating time for properly collecting and integrating such feedback.
My understanding is that there are a lot of AI safety courses that use materials from the same reference class, and would use improved materials in the future. Best of skill on executing the project!
Loppukilpailija
6 months ago
I find the direction of safer-by-design constructions of AI systems exciting overall: the ideas of constructability are quite orthogonal to other approaches, and marginal progress there could turn out to be broadly useful.
That said, I do think this direction is littered with skulls, and consider the modal outcome to be failure. I think that the fully-plain-coded approaches in particular are not practical for the types of AI we are most worried about, and working on them would very likely be a dead end. I'm more excited about top-down approaches: trying to make models more modular by design while essentially retaining performance, in the sense of "we have replaced one big model with N not-so-big models".
The project authors seem to be aware of the skulls, and indeed the proposal has some novel components that may get around some of the issues. While I think it's still easy to run into dead ends, this is a good enough reason for me to fund the project.
Overall, I think simply having a better understanding of the constraints involved when trying to make systems safer-by-design is great. I'd quite like there to be people thinking about this, and would be happy about progress on mapping out dead ends and not-so-dead ends.
Loppukilpailija
6 months ago
This translation project seems worthwhile to me; I gave a small retrospective donation as a way of saying thank you.
Tangential, but I'm curious about how you proceeded with the translation. Are current machine translators (such as DeepL) good enough to be useful as a first draft, or did you do the translation entirely by hand?
Loppukilpailija
10 months ago
I created an application for this; you can see a screenshot below. It supports the essential features, e.g. logging predictions.
Thanks to Isaac King for allowing me to build this project on top of his codebase; it helped me get started. He deserves a portion of the Shapley value.
You can get the code here. (It might require a bit of fiddling to get working, though, since some files are in .gitignore.)
The bad news: I have not hosted it anywhere. (I lack the knowledge and, currently, the spoons. I'd very much appreciate it if someone else would do that, but I don't expect that to happen.) Thought I'd still share the update.
Loppukilpailija
11 months ago
Update:
I have a demo with basic functionalities up.
I realize that I asked for way too much funding - this was much easier to create than I expected. (Not sure what to do with the post now.)
Will give a more substantive update a bit later.
| For | Date | Type | Amount |
|---|---|---|---|
| Finishing The SB-1047 Documentary In 6 Weeks | 19 days ago | project donation | 5000 |
| Fund Sentinel for Q1-2025 | about 1 month ago | project donation | 500 |
| Fund Sentinel for Q1-2025 | about 1 month ago | project donation | 1000 |
| Research Staff for AI Safety Research Projects | 5 months ago | project donation | 500 |
| <51822c19-d998-453a-9896-bf55d53e1642> | 6 months ago | tip | +1 |
| <51822c19-d998-453a-9896-bf55d53e1642> | 6 months ago | tip | 1 |
| <28ef2a62-3b35-47da-beb8-1d7acce2095d> | 6 months ago | tip | 1 |
| <51822c19-d998-453a-9896-bf55d53e1642> | 6 months ago | tip | 1 |
| AI Safety Textbook | 6 months ago | project donation | 2000 |
| <5b5e53f5-c48c-4c35-a492-c07c6c34fb12> | 6 months ago | tip | 1 |
| Translation of BlueDot Impact's AI alignment curriculum into Portuguese | 6 months ago | project donation | 300 |
| Manifund Bank | 6 months ago | deposit | +10000 |
| Manifund Bank | 7 months ago | mana deposit | +110 |