
AI Ethics & AI Safety: Key Issues, AGI (Artificial General Intelligence), and Current AI Models’ Challenges



AI has evolved rapidly in recent years, transforming various industries by enhancing efficiency and creating new opportunities. In this context, it is crucial to understand AI Ethics, which guides the responsible development and use of artificial intelligence technologies. By prioritizing these ethical considerations, we can ensure that AI benefits society while mitigating the potential risks associated with its adoption.

A summary of this post is provided below for your convenience!

Model & System Cards:

  • Model Cards: Provide details on AI models, including intended use, performance metrics, evaluation data, and ethical considerations.
  • System Cards: Explain the operation of combined AI systems, acknowledging that these systems can evolve.

AI Safety, Ethics, and Machine Morality:

  • AI Safety: Focuses on preventing accidents or misuse of AI systems, including existential risks.
  • AI Ethics: Centers on responsible AI use, with attention to current issues like bias.
  • Machine Morality: Investigates the moral behavior of AI systems.

Critical Issues in AI Ethics & Safety:

  • Bias Mitigation: Tackling inherent biases in AI training data.
  • Model Monitoring: Includes uncertainty estimation and detecting out-of-distribution (OOD) inputs or malicious use.
  • Model Robustness: Ensures resilience to adversarial or degraded inputs, distribution shifts, and dataset bias.
  • AI Alignment: Aims to align AI with intended ethical guidelines to prevent misuse, including research in reinforcement learning from human feedback (RLHF).
  • Weaponization of AI: Discusses AI applications in autonomous weapons and cybersecurity.

Social and Economic Impact:

  • Social Manipulation: Concerns over AI-generated media and misinformation.
  • Job Automation: Predicted impacts on employment and potential solutions like universal basic income (UBI).
  • AI in Media: Effects of generative AI on creative industries and related ethical concerns.

AGI (Artificial General Intelligence):

  • Definition: AGI would have human-level intellectual capabilities and is differentiated from today’s “narrow AI.”
  • Existential Risks: Discusses potential global threats AGI poses if not correctly aligned, including the “paperclip maximizer” scenario from Nick Bostrom’s thought experiment.

Challenges for Current AI Models:

  • Limitations like prompt sensitivity, planning deficiencies, lack of long-term memory, and issues with transparency and interpretability in large language models (e.g., GPT-4).


Model & System Cards

AI Model Cards

Image Source: Google Model Cards



  • Model details: license, training methods
  • Intended Use: May also specify out-of-scope use cases.
  • Metrics: Performance evaluation information
  • Training Data: Often omitted
  • Quantitative Analyses: Results on evaluation data
  • Ethical considerations
  • Caveats and recommendations
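
As a rough illustration, the fields above can be thought of as structured data. The sketch below is entirely hypothetical; the names and values are illustrative and not taken from any real card:

```python
# A minimal, hypothetical model card represented as structured data.
# Field names mirror the sections listed above; all values are illustrative.
model_card = {
    "model_details": {
        "name": "example-classifier-v1",   # hypothetical model name
        "license": "Apache-2.0",
        "training_method": "supervised fine-tuning",
    },
    "intended_use": {
        "in_scope": ["English text classification"],
        "out_of_scope": ["medical or legal decision-making"],
    },
    "metrics": ["accuracy", "F1"],
    "training_data": "often omitted by model providers",
    "quantitative_analyses": {"test_accuracy": 0.91},  # illustrative number
    "ethical_considerations": "may reflect biases in the training data",
    "caveats_and_recommendations": "re-evaluate before new deployments",
}

print(model_card["intended_use"]["out_of_scope"])
```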

AI System Cards

Image Source: Meta Blog - System Cards



  • A system card explains how AI models, and potentially non-AI components, work together to accomplish a task.
  • System cards aren’t definitive since AI systems can evolve.
    – Model cards assume the system is frozen in time.

Red Teaming

  • Companies often employ “red teams” to assess the limitations and risks of an AI system.
  • Red teams may be internal, external, or both.
  • For example, GPT-4V and GPT-4 used external red teams.

GPT-4V System Cards

  • Explains how the system was evaluated and its limitations.
  • However, it omits what the model was trained on and other training details.

Image Source: GPT-4V System Card from OpenAI





AI Ethics

  • How can we determine if these systems are fair and appropriate?
  • Definition

    • AI Safety: An interdisciplinary field that prevents accidents, misuse, or harmful consequences due to AI systems. The community is very concerned about the existential risk of future AI.
    • AI Ethics: Moral principles and techniques intended to inform AI's development and responsible use. Much of the focus is on bias in AI and the accountable usage of near-term systems.
    • Machine Morality: Research concerning the design of AIs that behave morally.
  • However, there is a lot of overlap among the three definitions, since different communities with different focuses each want their own term for what they do. In this lecture, the concepts are used together.

Model Monitoring

  • Estimating Uncertainty
    – Models are often overconfident and poorly calibrated.
  • Anomaly / Out-of-Distribution (OOD) Detection
    – Determining when the model receives an input it cannot handle
  • Detecting malicious use of an AI system
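
A common baseline that touches both uncertainty estimation and OOD detection is the maximum softmax probability (MSP): unusually low confidence can flag inputs the model may not handle. Below is a minimal NumPy sketch, with an assumed threshold:

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def msp_score(logits):
    """Maximum softmax probability: a simple confidence/OOD baseline."""
    return softmax(logits).max()

# Hypothetical logits for a confident and an ambiguous input.
confident = np.array([8.0, 0.5, -1.0])
uncertain = np.array([0.9, 1.0, 0.8])

THRESHOLD = 0.5  # assumed cutoff; in practice tuned on held-out data
for logits in (confident, uncertain):
    score = msp_score(logits)
    flag = "OK" if score >= THRESHOLD else "possible OOD / low confidence"
    print(f"MSP = {score:.2f} -> {flag}")
```

Because models are often overconfident and poorly calibrated, MSP is only a weak baseline; calibration methods and dedicated OOD detectors improve on it.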

Model Robustness

  • Concerned with measuring how robust models are to different inputs.

    • Adversarial inputs
    • Degraded inputs
    • Distribution shift
  • Adversarial Robustness

    • Adding small amounts of targeted noise to neural network inputs can greatly shift their outputs.
      • Whitebox attacks: The attacker can differentiate through the network to craft modified inputs.
      • Blackbox attacks: The attacker can’t differentiate through the network.
    • Many security researchers are working in this space to develop new attacks and defenses (see the FGSM sketch after this list).
  • Distribution Shift

    • Measuring the performance of a system on test sets that differ in distribution from the training data.
    • In the real world, users will apply a system to data that may differ from the training data.
    • Closely related to issues with dataset bias.
    • Researchers have found that current techniques in supervised learning provide essentially no robustness to natural distribution shifts.
    • Training on more data helps, but there are diminishing returns.
  • Long-Tailed Datasets

    • The real world is long-tailed.
    • Low-shot learning benchmarks somewhat miss the point in practice.
    • We often have a lot of data for some tasks and very little for others.
  • Dataset Bias and Covariates

    • Deep learning struggles to learn the correct abstractions and behavior when there is dataset bias.
    • Dataset bias is very difficult to track down.
    • Enumerating all covariates to stratify performance against and to test robustness is hard.
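
As promised above, here is a minimal sketch of the fast gradient sign method (FGSM), a classic whitebox attack: it differentiates the loss with respect to the input and steps in the sign of that gradient. PyTorch is assumed, and the model, inputs, and epsilon below are all placeholders:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Fast Gradient Sign Method: a classic whitebox adversarial attack.

    Perturbs input x in the direction that increases the loss,
    bounded by epsilon in the L-infinity norm.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step in the sign of the input gradient; clamp to a valid image range.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

# Usage with a placeholder model and random "images":
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
x = torch.rand(4, 1, 28, 28)      # placeholder inputs in [0, 1]
y = torch.randint(0, 10, (4,))    # placeholder labels
x_adv = fgsm_attack(model, x, y, epsilon=0.03)
print((x_adv - x).abs().max().item())  # perturbation stays within epsilon
```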

Social Manipulation

  • AI-generated text, images, videos, and audio will make it hard to determine the truth.
  • Social media platforms are already being inundated with fake accounts and AI-powered bots that spread misinformation, amplify certain voices, or create the illusion of widespread support for an idea.

AI Alignment

  • Alignment research aims at methods to steer systems toward the creator’s intended goals, preferences, and ethical principles.
  • This includes research in reinforcement learning from human feedback (RLHF).
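
At the heart of RLHF is a reward model trained on human preference pairs. A common choice is a Bradley-Terry style loss that pushes the reward of the human-preferred response above the rejected one; the sketch below uses placeholder reward values:

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen, reward_rejected):
    """Bradley-Terry style pairwise loss used to train RLHF reward models.

    Minimizing -log(sigmoid(r_chosen - r_rejected)) pushes the reward
    of the human-preferred response above the rejected one.
    """
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Placeholder scalar rewards a reward model might assign to response pairs.
r_chosen = torch.tensor([1.2, 0.3])
r_rejected = torch.tensor([0.4, 0.9])
print(preference_loss(r_chosen, r_rejected))  # lower when chosen > rejected
```

The trained reward model is then used as the optimization target for a policy, typically with a reinforcement learning algorithm plus a penalty for drifting too far from the original model.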


Artificial General Intelligence (AGI)

Artificial General Intelligence (or Strong AI) is an AI that can understand or learn any intellectual task that a human can.

  • Enabling us to have agents that can do many tasks for us.
    – Robots
    – Personal assistants
    – Scientific discovery

  • What is Intelligence?
    • A very general mental capability that, among other things, involves the ability to:
      – Reason
      – Plan
      – Solve problems
      – Think abstractly
      – Comprehend complex ideas
      – Learn quickly and learn from experience.
    • LLMs can’t learn from experience beyond their context length.
  • Problem They Identify
    • GPT models cannot have an inner dialogue. They often have sufficient information to answer a question, but only if prompted “correctly.”
    • The model has to come up with the answer in essentially a single pass of a feedforward architecture, and it can’t implement a for-loop to determine how many steps to run itself.
    • They hypothesize that this could be addressed with some mechanism for an “inner dialogue.”
      – It does not have a working memory beyond concatenating the next token to its subsequent input (illustrated in the sketch below).
      – They argue that prompting it to “think step by step” is insufficient.
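
To make the working-memory point concrete, here is a toy sketch of autoregressive generation. The function generate_step is a stand-in for the model’s fixed-depth forward pass, not any real API; the only state carried between steps is the growing token sequence itself:

```python
def generate_step(tokens):
    # Stand-in for one fixed-depth forward pass of the model:
    # it sees only the token sequence and emits the next token.
    return f"tok{len(tokens)}"

def generate(prompt_tokens, n_steps):
    tokens = list(prompt_tokens)
    for _ in range(n_steps):
        # The model's only "working memory" is the concatenated sequence;
        # it cannot internally decide to spend more computation on a step.
        tokens.append(generate_step(tokens))
    return tokens

print(generate(["Q:", "What", "is", "2+2?"], n_steps=3))
```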

  • How Do So Many Capabilities Emerge from Autoregressive Training?
    • Hypothesis: Large amounts of diverse data force the network to learn generic and useful “neural circuits” while the size of the models provide enough redundancy and diversity for these circuits to specialize.

  • Existential Risk from AGI
    • The hypothesis is that AGI has some non-zero chance of leading to human extinction or an irreversible global catastrophe.
    • Humans dominate other species because the human brain has capabilities they lack; if AI were to surpass us, it could be difficult or impossible to control.


