AI terminology
Once you understand a handful of industry terms and how they connect, the concept of safety risks from advanced artificial intelligence systems is easy to grasp:
AI agents: AIs that can take action, rather than being limited to generating responses to a question or prompt. For example, a customer service AI agent would be able to investigate shipment questions and issue refunds without a human representative’s involvement, as in the sketch below.
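Here is a minimal, hypothetical sketch of the idea in Python. The tool names (look_up_shipment, issue_refund) and the decision format are invented for illustration; they are not any real product’s API.

```python
# A toy AI agent step. The tools below are pretend stand-ins for the real
# systems (shipping databases, payment processors) an agent might call.

def look_up_shipment(order_id: str) -> str:
    """Pretend tool: returns a shipment status for an order."""
    return f"Order {order_id}: delayed in transit"

def issue_refund(order_id: str) -> str:
    """Pretend tool: issues a refund for an order."""
    return f"Refund issued for order {order_id}"

TOOLS = {"look_up_shipment": look_up_shipment, "issue_refund": issue_refund}

def agent_step(decision: dict) -> str:
    # The key difference from a plain chatbot: the model's output is
    # interpreted as an action to execute, not just text to display.
    return TOOLS[decision["tool"]](decision["order_id"])

# Suppose the model decided a refund is warranted:
print(agent_step({"tool": "issue_refund", "order_id": "A1234"}))
```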
AI alignment (also called technical alignment): AI alignment is the research field focused on making sure advanced AI systems are designed to be beneficial and act in accordance with human values and intentions. The core challenge is ensuring that powerful AI pursues the goals we want, rather than unintended, potentially harmful ones. To date, researchers have not made significant progress in “solving alignment,” fueling AI risk concerns among many inside the industry and among the general public.
AI safety guardrails: A technical system “wrapped around” an AI model to prevent it from misbehaving or producing disallowed “bad outputs.” While generally effective, guardrails have not proven reliable 100% of the time in any chatbot to date. This is part of why so many people inside and outside the industry are concerned about understanding and controlling models as they advance. A toy example of the idea appears below.
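The following Python sketch shows a rule-based output filter wrapped around a model’s response. Real guardrails are far more sophisticated (and, as noted above, still imperfect); the blocked-phrase list and function name here are purely illustrative.

```python
# A toy output guardrail: intercept the model's response before the user
# sees it. Real systems use classifiers, not simple phrase lists.

BLOCKED_PHRASES = ["build a weapon", "steal a password"]

def guarded_reply(model_output: str) -> str:
    lowered = model_output.lower()
    # If the response matches a disallowed pattern, replace it entirely.
    if any(phrase in lowered for phrase in BLOCKED_PHRASES):
        return "Sorry, I can't help with that."
    return model_output

print(guarded_reply("Here is how to steal a password: ..."))  # blocked
print(guarded_reply("Here is a cookie recipe: ..."))          # allowed
```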
AI safety research: A sub-area of AI research that, instead of focusing on capabilities advancements, studies how to ensure that advanced artificial intelligence systems are robust, reliable, and consistently safe for humanity. It focuses on preventing potential catastrophic risks, such as a model acting unpredictably or pursuing unintended, harmful goals, whether by accident or by design. It covers both technical alignment work and broader governance efforts to manage AI development responsibly.
Artificial General Intelligence (also called AGI): AGI is a contentious term that is defined differently by different groups. For our context, it means “an AI system that is capable of doing or learning anything a human could do or learn,” generally in a virtual/computer rather than physical/robotics environment. It is generally seen as “the Big Thing”: a system whose capabilities face no more meaningful limits than those that already apply to humans.
Artificial Super Intelligence (also called ASI): ASI is a hypothetical intelligence capable of outperforming and outsmarting all of humanity put together. If AGI is human-level, ASI is an intelligence dramatically superior to human intellect, one that can teach and improve itself without human oversight or intervention. Achieving ASI is an explicit goal of several major AI companies.
Chatbots: Generative AI consumer products built on Large Language Models (LLMs), which include ChatGPT from OpenAI, Gemini from Google DeepMind, Claude from Anthropic, Microsoft Copilot, DeepSeek, Perplexity, Mistral, Llama from Meta, and more.
Fast take-off (also called FOOM and the intelligence explosion): This describes a scenario where the transition from human-level intelligence (AGI) to Artificial Super Intelligence (ASI) happens extremely quickly, perhaps in days or months. This rapid, uncontrollable increase in intelligence is driven by recursive self-improvement. The intelligence explosion would be a sudden, dramatic, and possibly irreversible change in the future of humanity.
Generative AI: A type of artificial intelligence that can produce content in response to a question or prompt. It learns patterns from existing data, like text, images, or music, and uses those patterns to generate new, realistic results (also called outputs). Examples include models that can write stories, create artwork, predict business trends, identify medical anomalies, or even generate computer code.
Hallucination: When an AI model makes up information that isn’t true, misquotes sources while chatting with a human user, or behaves in unexplainable, unexpected ways.
Large Language Models (LLMs): A type of AI designed and trained to predict human language. These models are trained on large amounts of text and images from books, websites, and other sources, along with settings and instructions that shape how they respond. From this process, models develop the ability to generate natural-language responses based on patterns learned during their training and fine-tuning stages.
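At their core, LLMs are trained to predict the next word (technically, the next token). Here is a toy illustration of that task in Python; the hard-coded probability table stands in for the billions of learned weights a real model uses.

```python
# A toy next-word predictor. A real LLM computes probabilities over tens
# of thousands of tokens; this lookup table is purely illustrative.

NEXT_WORD_PROBS = {
    "the cat sat on the": {"mat": 0.62, "sofa": 0.21, "roof": 0.17},
}

def predict_next_word(prompt: str) -> str:
    probs = NEXT_WORD_PROBS[prompt]
    return max(probs, key=probs.get)  # pick the most likely next word

print(predict_next_word("the cat sat on the"))  # -> "mat"
```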
Neural networks: Synthetic computational systems inspired by the human brain's structure. They consist of layers of interconnected "nodes" or "neurons" that process information. Neural networks learn by adjusting the strength of these connections as they are fed vast amounts of data.
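Below is a minimal sketch of this idea in Python: a single “neuron” combines its inputs through weighted connections, and “learning” means nudging those weights to reduce the error against a target. The input values, target, and learning rate are arbitrary, chosen only for illustration.

```python
# One "neuron": a weighted sum of inputs plus a bias.
def neuron(inputs, weights, bias):
    return sum(x * w for x, w in zip(inputs, weights)) + bias

inputs, weights, bias = [1.0, 0.5], [0.2, -0.4], 0.1
target, lr = 1.0, 0.1  # desired output and learning rate

for step in range(3):
    output = neuron(inputs, weights, bias)
    error = output - target
    # "Learning": adjust each connection strength to shrink the error.
    weights = [w - lr * error * x for w, x in zip(weights, inputs)]
    bias -= lr * error
    print(f"step {step}: output={output:.3f}")  # creeps toward the target
```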
Recursive self-improvement (also called RSI): This is a scenario where an AI is able to rapidly and repeatedly enhance its own cognitive capabilities. A slightly smarter AI could potentially redesign its own code to become even smarter, and then repeat the process countless times. This could lead to a rapid increase in a model’s intelligence. Once achieved, the AI could share this new level of intelligence with systems around the world, instantly upgrading other instances of itself.
Reinforcement learning: A machine learning method in which an AI model learns through trial and error in an interactive environment. The model performs actions and receives “rewards” for choices that indicate the desired, correct answer or “thought” progression. It incurs “penalties” for responses that deviate from the correct direction or answer. Over time, the model develops a strategy (a “policy”) to maximize its cumulative rewards. Historically, humans have been kept in the loop for part of this process, called Reinforcement Learning from Human Feedback (RLHF). However, as models advance and are trained on synthetic data, humans are needed far less. A toy version of the reward loop appears below.
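This Python sketch shows the trial-and-error loop in miniature: the “agent” tries actions, observes rewards, and gradually prefers the action that pays off. The two-action environment and its reward values are invented for illustration.

```python
import random

rewards = {"left": 0.0, "right": 1.0}  # "right" is the correct choice
value = {"left": 0.0, "right": 0.0}    # the agent's learned estimates
lr = 0.2

for _ in range(100):
    # Explore occasionally; otherwise exploit the best-known action
    # (this rule is the agent's "policy").
    if random.random() < 0.1:
        action = random.choice(["left", "right"])
    else:
        action = max(value, key=value.get)
    # Trial and error: act, observe the reward, update the estimate.
    value[action] += lr * (rewards[action] - value[action])

print(value)  # "right" ends up with the higher learned value
```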
Synthetic data: Artificially generated training data that mimics real-world outputs such as books, newspapers, internet comments, academic papers, music, art, movies, technical manuals, and more. It isn’t collected from actual events, people, or authored content. Instead, it’s created using algorithms, simulations, or other AI models. It’s used to train subsequent AI systems.
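As a toy Python example, synthetic training text can be produced by a program rather than collected from real people or events. The templates and product names here are invented for illustration; real synthetic data is usually generated by another AI model rather than simple templates.

```python
import random

PRODUCTS = ["lamp", "kettle", "backpack"]
TEMPLATES = [
    "The {p} arrived quickly and works great.",
    "Disappointed with the {p}; it broke after a week.",
]

def synthetic_review() -> str:
    # No human ever wrote these "reviews"; a program assembled them.
    return random.choice(TEMPLATES).format(p=random.choice(PRODUCTS))

# These generated examples could then be used to train another model.
for _ in range(3):
    print(synthetic_review())
```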
Training data: AI models are trained on existing data in our world, including emails, social media, online content, and even print books that are digitized. Training data can also include anything a person has published or shared online, including blog posts, music, research papers, personal photos, social media interactions, and more.
Training weights: Training weights are the learned strengths of the “neural connections” that steer a neural network’s behavior, encoding all the knowledge and capabilities the model gained during training. That’s why you’ll often hear about a model’s “training weights” being closely guarded by AI companies like OpenAI and Anthropic.
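To make the idea concrete, here is a toy Python illustration of why weights are treated as the crown jewels: the learned numbers are, in effect, the model itself. The file name and values are invented for illustration.

```python
import json

# Numbers "learned" during training; a real model has billions of these.
weights = {"layer1": [0.29, -0.355], "bias": [0.19]}

with open("model_weights.json", "w") as f:
    json.dump(weights, f)  # this file holds the toy model's entire "knowledge"

with open("model_weights.json") as f:
    restored = json.load(f)  # anyone holding the file can recreate the model
```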