TASM Notes, May 11th 2024
Sat May 11, 2024
Pre-Meeting Chatting
- There's a documentary crew here this week. They'll be filming to put together a piece on the Toronto AI safety meet-up in particular.
- We give them the usual outsider pitch (including mild jokes about p(doom) and talking about the general attitudes of the usual attendees)
The News
- AlphaFold3 gets released.
- Safety-wise, this is not a design tool, so it doesn't touch off the usual firestorm about "oh no, AI can design poisons and bioweapons!"
- OpenAI's model spec
- Headlined "Assist the user. Benefit humanity. Reflect well on OpenAI"
- This makes explicit that they take the reputation of OpenAI into account when deciding whether a model should respond to certain queries
- It additionally makes explicit the idea that some information that might be considered bad is going to be available contextually. For example: it might refuse to respond to "tell me how to shoplift", but might respond to "I'm a shopkeeper; tell me how people might shoplift so that I can prevent that"
- Longer form rules:
- Follow the chain of command (OpenAI -> developer -> user)
- Comply with applicable laws
- Don't provide infohazards
- Respect creators and their rights
- Protect peoples' privacy
- Don't respond with NSFW content
- Paper: SOPHON (non-fine-tunable learning)
- If you have an open model with released weights, anyone can currently strip out its guardrails almost trivially by fine-tuning
- SOPHON is research related to making this a lot harder
- It's been tested on small image models
- Caveats:
- Need to know which behaviors to restrict
- Not clear if it scales
- Jailbreaks incoming :p
- People in the open source movement might be hostile to this sort of research
- Conceptually, it's about trapping the model in a local minimum attractive enough that it stays there, despite additional training/fine-tuning (a rough code sketch of this idea appears at the end of this news section)
- AI Governance news
- Anthropic, OpenAI and Meta are failing in a commitment to the UK government to share their models with the UK's AI Safety Institute before deployment
- Anthropic: "nice idea but difficult to implement"
- Research access typically just gives you a less-filtered model rather than direct access to the weights (no one in the room knows the details of what the UK is requesting disclosure on)
- Anthropic comes out against heavy-touch AI regulation
- Claude "doesn't work" (without VPN :p) in Canada, and we don't even have AI regulation yet. Apparently, they've also restricted their API from Europe for regulatory reasons? I don't know the details here.
- Compute requirements for training a frontier model are already beyond any small players, so most people in the room are dubious of this reasoning. Apparently, there was a multi-university effort to timeshare a cluster large enough to do this research, but it's not common. Universities are mostly priced out of doing this research, and the top safety work is being done exclusively inside of Meta/OpenAI/Anthropic. Shrug.
- New law in the pipeline in California (SB-1047)
- AI Priest says it's ok to baptize babies in Gatorade? (Brawndo! It's got Electrolytes!). Not sure what the context is or what the Catholic Church thinks about this.
- A company called AdVon is apparently producing AI slop for media publications. "All five of the microwave reviews include an FAQ entry saying it's ok to put aluminum foil in your prospective new purchase."
- There is a database of AI incidents online
- xLSTM exists; a possible competing approach to the transformer architecture
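Before getting to the talk, here's the promised rough sketch of the SOPHON-style "non-fine-tunable learning" idea, as I understand it. This is my own illustration, not the actual SOPHON algorithm: it fakes the "local minimum trap" with a first-order MAML-style penalty, and all names and hyperparameters are made up.

```python
import copy
import torch
import torch.nn as nn

def resist_finetuning_step(model, orig_batch, restricted_batch, opt,
                           inner_lr=1e-2, inner_steps=3, alpha=1.0):
    """One outer step: stay good at the original task while making the
    restricted behavior hard to recover via a few simulated fine-tuning steps."""
    loss_fn = nn.CrossEntropyLoss()
    x_orig, y_orig = orig_batch          # task we want to keep
    x_restr, y_restr = restricted_batch  # behavior we want to suppress

    # (a) ordinary training loss on the original task
    orig_loss = loss_fn(model(x_orig), y_orig)

    # (b) simulate an attacker fine-tuning a copy of the model on the restricted task
    attacker = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(attacker.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        inner_opt.zero_grad()
        loss_fn(attacker(x_restr), y_restr).backward()
        inner_opt.step()

    # (c) we want the post-fine-tuning restricted loss to stay HIGH, i.e. the
    # weights should sit in a basin that fine-tuning can't easily escape.
    # First-order shortcut (FOMAML-style): take the gradient of -loss at the
    # fine-tuned copy and apply it to the original weights.
    resist_loss = -loss_fn(attacker(x_restr), y_restr)
    resist_grads = torch.autograd.grad(resist_loss, list(attacker.parameters()))

    opt.zero_grad()
    orig_loss.backward()
    for p, g in zip(model.parameters(), resist_grads):
        p.grad = alpha * g if p.grad is None else p.grad + alpha * g
    opt.step()
    return orig_loss.item(), -resist_loss.item()
```

The caveats from the discussion still apply: you need to know up front which behaviors to restrict, and nobody knows yet whether anything like this survives scale or determined jailbreaking.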
The Talk - The EU AI Act by Kathrin Gardhouse
We're discussing this today.
- How much does the EU AI Act help with AI Safety issues?
- Kathrin is a German trained lawyer, licensed to work in Canada, Policy Lead at AIGS. She does a lot of work in the AI regulation space
EU Act Scope
- First comprehensive law regulating AI worldwide (that is, it's the first such law in the world; it doesn't have global scope)
- Will enter into force 20 days after publication in the Official Journal (expected in May or June 2024)
- Classifies AI systems by level of risk and mandates development/deployment/use requirements depending on the risk classification
- Introduces heightened technical and documentary requirements for high-risk AI systems
EU AI Act and Existential Risks
"GPAI" means "General Purpose Artificial Intelligence" systems. It's a technical term defined in the act text. Includes things like ChatGPT/Claude and other general chat models. It doesn't include special-purpose models like Tortoise or StableDiffusion.
- The act touches on
- Systemic risks
- weapon development
- self replicating systems
- systems that interface with critical infrastructure
- negative effects that might affect the entire human race
- Territorial Scope
- Anyone who places AI systems or general AI models on the EU market - whether the entity is established in the EU or not
- AI lifecycle
- Obligations attach upon placing the AI system on the market (open-source models are excluded from some provisions, but not from the GPAI provisions relating to systemic risks)
- Security exclusion
- National security uses are explicitly excluded (leaving this up to member states rather than regulating it)
- Problem: high-risk capabilities emerging during development aren't covered by the main act. Partial solution: there is an exception here; GPAI providers have reporting obligations about risky capabilities that emerge during development.
Prohibited AI systems
- Subliminal techniques (article 5.1a)
- Exploiting vulnerabilities (article 5.1b)
- Social scoring (article 5.1c)
- Criminal offence prediction (article 5.1d)
- Facial image scraping for recognition systems (article 5.1e)
- Emotion inference (article 5.1f)
- Biometric categorization (article 5.1g)
- "Based on biometric data to deduce or infer race, political opinions, trade union membership, religious or philosophical beliefs, sex life or sexual orientation"
- Real-time remote biometric identification (article 5.1h)
- "In publicly accessible spaces for the purposes of law enforcement unless and in as far as such use is strictly necessary"
These are things you Do Not Do. The way the act is structured, you don't get retroactive penalties, but you are incentivized to comply going forward if you want to be able to do business in the EU.
Q: Where is the liability in a situation where there are safeguards around these uses, but they're easily jail-breakable?
A: The act doesn't talk about liability in most of these situations, but there is language about understanding possible uses and misuses of your tech. There's a standard here where you consult industry professionals and see if they agree that the given situation falls into "foreseeable misuse" of your product
High Risk AI Systems
- Article 6 lists out a bunch of systems that count as "high risk". This includes things like safety systems in automotive vehicles/aviation/watercraft/lifts/etc.
- Also, anything carved out in "Annex 3". This includes
- Biometrics
- Critical infrastructure
- Educational and vocational training
- Employment, workers management and access to self-employment
- Access to and enjoyment of essential private services and essential public services and benefits
- Law enforcement
- Migration, asylum and border control management
- Administration of justice and democratic processes
Q: If I'm using ChatGPT to ask me questions in an attempt to teach me things, does it count as an "Educational and vocational training" system for the purposes of this act?
A: The act doesn't really prohibit individual use of these technologies. It's more concerned with systems that gate access to education/vocational training.
General Purpose AI Systems
- A GPAI shall be presumed to have high impact capabilities when the cumulative amount of compute used for its training, measured in floating point operations (FLOPs), is greater than 10^25
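For a sense of scale, here's a back-of-envelope check against that threshold using the common ~6 x parameters x training-tokens rule of thumb for dense-transformer training compute. The rule of thumb and the example configurations are my own illustration, not anything from the Act or the talk.

```python
# Rough estimate of training compute vs. the Act's 10^25 FLOP presumption threshold,
# using the common ~6 * N * D approximation (N parameters, D training tokens).
THRESHOLD_FLOPS = 1e25

def training_flops(params: float, tokens: float) -> float:
    return 6.0 * params * tokens

for name, params, tokens in [
    ("7B params, 2T tokens",    7e9,   2e12),
    ("70B params, 15T tokens",  70e9,  15e12),
    ("400B params, 20T tokens", 400e9, 20e12),
]:
    flops = training_flops(params, tokens)
    flag = "above" if flops > THRESHOLD_FLOPS else "below"
    print(f"{name}: ~{flops:.1e} FLOPs ({flag} the 1e25 threshold)")
```

By this estimate, the first two hypothetical runs land below the threshold (~8e22 and ~6e24 FLOPs) and only the largest one crosses it (~5e25 FLOPs).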
Who Does The Classification?
- Providers self-classify systems as high risk
- 18 months after coming into force, classification guidelines are due
- If an AI system is classified as "not high risk" by the provider, a market surveillance authority can evaluate it as high-risk
- Main classification choices must be included in the Technical Documentation used during Conformity Assessment, which may or may not be a self-assessment
- Annex 3 high-risk systems must be registered; Annex 3 systems classified as non-high-risk by way of derogation must also be registered
- As soon as providers discover that GPAI models entail systemic risks, they have a notification obligation
- Commission may designate GPAIs as systemic risk GPAIs
- If systemic risk classification is presumed, providers can request re-designation as an exception
- List of systemic risk GPAIs must be published by the commission and be kept up-to-date
Q: Are there white-hat hackers or equivalents in this space?
A: Kind of? The equivalents here are evals, either alignment or capability. The idea is to try to elicit behavior from models in order to classify them (see the sketch below).
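To make that concrete: an eval in this sense is basically a scripted attempt to elicit and grade a behavior. A minimal sketch of a refusal-rate eval, entirely my own illustration, with a placeholder for the actual model call:

```python
# Minimal refusal-rate eval: feed the model prompts designed to elicit a
# behavior, then grade the outputs. query_model is a hypothetical placeholder
# for however you call the model under evaluation.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't help")

def query_model(prompt: str) -> str:
    raise NotImplementedError("plug in your model/API call here")

def refusal_rate(prompts: list[str]) -> float:
    """Fraction of prompts the model refused, per a crude string-match grader."""
    refusals = sum(
        1 for p in prompts
        if any(m in query_model(p).lower() for m in REFUSAL_MARKERS)
    )
    return refusals / len(prompts)
```

Real capability/alignment evals use much better graders (often another model), but the shape is the same: elicit, grade, classify.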
Compliance Obligations - High-Risk AI Systems
For Providers
- Conformity assessment
- Risk management system, including incident reporting
- Data governance (no training on PHI)
- Technical documentation
- Automatic logging and record keeping
- Transparency and provision of information to deployers
- Human oversight
- Accuracy, robustness and cybersecurity
- Quality Management System, including post-market monitoring
For Deployers
- Fundamental rights impact assessment before deploying & suspending system usage in case of national-level risks
- Ensure that input data is pertinent to the system's intended use & adhere to the GDPR obligation for DPIAs
- Verifying compliance with the AI Act and ensuring all relevant documentation is available
- Retaining automatically generated system logs
- Informing individuals about the potential use of high-risk AI
- Implementing human oversight by trained individuals
- Reporting serious incidents to the AI system provider
One comment that's come up a few times relates to implied shortcomings of self report. Apparently nuclear does this? Like, if you're taking the uranium-to-compute metaphor seriously, apparently if you ask nuclear weapons regulators "How do you know who's working on nuclear weapons?" their response is "Oh, they tell us."
Another point here is that supply-chain monitoring is potentially effective (but not included in the Act itself). In the nuclear side of the metaphor, this implies monitoring uranium mines or refineries. In the compute side, it implies looking at Nvidia sales (Note: 4090s still well above $2k on Amazon)
Compliance Obligations - GPAIs
For Providers
- No conformity assessment
Still required:
- Technical documentation
- Transparency and provision of information to deployers
Only implicitly required:
- Risk management system
- Data governance
- Automatic logging
- Human oversight
- Accuracy, robustness and cybersecurity
PLUS:
- Enable users to know that they are interacting with an AI system
- Watermarking
- Compliance with copyright law
- Publicly available summary of training data
For Deployers
- Inform individuals about deepfake image, video, audio content & text that was automatically generated if it is published with the purpose of informing the public on matters of public interest.
That is all.
Compliance Obligations - GPAIs with Systemic Risks
For Providers
- Everything to do with GPAIs,
PLUS:
- Model evaluation (capabilities evals, I'm guessing?)
- Assess and mitigate systemic risks at Union level, including their sources, that may stem from the development, the placing on the market, or the use of GPAI with systemic risk
- Keep track of, document and report serious incidents and possible corrective measures to address them
- Adequate cybersecurity protection for the GPAI with systemic risk and the physical infrastructure of the model
For Deployers
- No further requirements; the same as regular GPAI deployers
Standards
- Technical standards are supposed to concretize the Act's risk requirements and compliance postures
- If you comply with these standards, you are deemed to be compliant with the act
- Codes of Practice apply to GPAI only
- AI Codes of Conduct are voluntary for AI systems other than high-risk
AI Regulatory Sandboxes
- To be set up by EU member states
- Purposes
- Improve legal certainty by providing legal advice and toolchains to achieve compliance
Enforcement
The Commission can request documentation from GPAI providers, issue fines, or force developers to withdraw their models
- Prohibited use violation: up to 35 million EUR or 7% of worldwide annual turnover, whichever is higher
- Certain other violations: up to 15 million EUR or 3% of turnover
For GPAI providers:
- Up to 15 million EUR or 3% of turnover if:
- Failed to comply with a request for a document/information
- Failed to comply with a corrective measure
- Failed to make available to the Commission access to the general-purpose AI model or general-purpose AI model with systemic risk with a view to conducting an evaluation
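To make the "whichever is higher" structure concrete, a trivial illustration; the turnover figures below are made up:

```python
# Fine caps are "fixed amount or percentage of worldwide annual turnover,
# whichever is higher". Turnover figures here are invented for illustration.
def fine_cap(fixed_eur: float, pct: float, turnover_eur: float) -> float:
    return max(fixed_eur, pct * turnover_eur)

print(fine_cap(35e6, 0.07, 2e9))    # prohibited use, 2B EUR turnover   -> 140,000,000 (7% dominates)
print(fine_cap(35e6, 0.07, 100e6))  # prohibited use, 100M EUR turnover -> 35,000,000 (fixed cap dominates)
print(fine_cap(15e6, 0.03, 2e9))    # other violations, 2B EUR turnover -> 60,000,000
```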