AI alignment


In the field of artificial intelligence (AI), AI alignment research aims to steer AI systems toward a person's or group's intended goals, preferences, and ethical principles. An AI system is considered aligned if it advances its intended objectives. A misaligned AI system may pursue some objectives, but not the intended ones.[1]

Aligning an AI system is often challenging because it is difficult for its designers to specify the full range of desired and undesired behaviors. Designers therefore often use simpler proxy goals, such as gaining human approval. But proxy goals can create loopholes, overlook necessary constraints, or reward the AI system for merely appearing aligned.[1][2]
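
The loophole problem can be made concrete with a toy example. The sketch below is hypothetical: the actions, their scores, and the `proxy_reward` and `true_reward` functions are invented for illustration and do not model any real system. It shows only that an optimizer maximizing a proxy (here, approval from a rater who sees only appearances) can select an action that scores poorly on the intended objective.

```python
# Hypothetical toy example: maximizing a proxy goal ("looks done to a rater")
# instead of the intended goal ("task actually done"). All values are invented.
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    task_done: bool   # intended objective: is the task really completed?
    looks_done: bool  # what a quick human check can observe
    effort: float     # cost to the agent


ACTIONS = [
    Action("complete the task", task_done=True, looks_done=True, effort=5.0),
    Action("hide the unfinished work", task_done=False, looks_done=True, effort=1.0),
    Action("do nothing", task_done=False, looks_done=False, effort=0.0),
]


def proxy_reward(a: Action) -> float:
    """Proxy goal: human approval, which only sees appearances."""
    return (10.0 if a.looks_done else 0.0) - a.effort


def true_reward(a: Action) -> float:
    """Intended goal: reward only if the task is actually done."""
    return (10.0 if a.task_done else 0.0) - a.effort


best_by_proxy = max(ACTIONS, key=proxy_reward)
best_by_truth = max(ACTIONS, key=true_reward)
print(best_by_proxy.name)  # hide the unfinished work
print(best_by_truth.name)  # complete the task
```

The proxy is not wrong on every action; it agrees with the intended goal on "complete the task" and "do nothing". The failure appears only because the optimizer searches for the highest proxy score and finds the one action where appearance and reality come apart.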

Misaligned AI systems can malfunction and cause harm. AI systems may find loopholes that allow them to accomplish their proxy goals efficiently but in unintended, sometimes harmful, ways (reward hacking).[1][3][4] They may also develop unwanted instrumental strategies, such as seeking power or survival, because such strategies help them achieve the final goals they are given.[1][5][6] Furthermore, they may develop undesirable emergent goals that can be hard to detect before the system is deployed and encounters new situations and data distributions.[7][8]
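
The last point, sometimes called goal misgeneralization, can be illustrated with a deliberately tiny, hypothetical setup loosely inspired by examples in which a coin always sits at the end of a training level. Everything below (`WIDTH`, the two policies, and the coin distributions) is an assumption made for illustration: a behavior that ignores the coin and always moves to the right edge is indistinguishable from coin-seeking during training, and diverges only once the coin can appear elsewhere.

```python
# Hypothetical toy example of goal misgeneralization. Positions are cells
# 0..WIDTH-1 on a line; a policy maps the coin's position to where it stops.
import random

WIDTH = 10


def go_right_policy(coin: int) -> int:
    """Behavior that fits the training data: always stop at the right edge (ignores the coin)."""
    return WIDTH - 1


def seek_coin_policy(coin: int) -> int:
    """Intended behavior: stop wherever the coin is."""
    return coin


def train_coin(rng: random.Random) -> int:
    """Training distribution: the coin is always at the right edge."""
    return WIDTH - 1


def deploy_coin(rng: random.Random) -> int:
    """Deployment distribution: the coin can be in any cell."""
    return rng.randrange(WIDTH)


def success_rate(policy, coin_sampler, episodes: int = 1000, seed: int = 0) -> float:
    """Fraction of episodes in which the policy stops on the coin."""
    rng = random.Random(seed)
    coins = [coin_sampler(rng) for _ in range(episodes)]
    hits = sum(policy(c) == c for c in coins)
    return hits / episodes


for policy in (go_right_policy, seek_coin_policy):
    print(policy.__name__,
          "train:", success_rate(policy, train_coin),
          "deploy:", success_rate(policy, deploy_coin))
```

Both policies succeed in every training episode, so training reward alone cannot tell them apart; only the shifted deployment distribution reveals that `go_right_policy` pursued a different goal from the intended one.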

Today, these problems affect existing commercial systems such as language models,[9][10][11] robots,[12] autonomous vehicles,[13] and social media recommendation engines.[9][6][14] Some AI researchers argue that more capable future systems will be more severely affected, since these problems partly result from the systems being highly capable.[15][3][2]

Many of the most-cited AI scientists,[16][17][18] including Geoffrey Hinton, Yoshua Bengio, and Stuart Russell, argue that AI is approaching human-like (AGI) and superhuman (ASI) cognitive capabilities and could endanger human civilization if misaligned.[19][6]

AI alignment is a subfield of AI safety, the study of how to build safe AI systems.[20] Other subfields of AI safety include robustness, monitoring, and capability control.[21] Research challenges in alignment include instilling complex values in AI, developing honest AI, scalable oversight, auditing and interpreting AI models, and preventing emergent AI behaviors like power-seeking.[21] Alignment research has connections to interpretability research,[22][23] (adversarial) robustness,[20] anomaly detection, calibrated uncertainty,[22] formal verification,[24] preference learning,[25][26][27] safety-critical engineering,[28] game theory,[29] algorithmic fairness,[20][30] and social sciences.[31]
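
Preference learning, one of the related areas listed above, is commonly formalized with a Bradley-Terry model: the probability that a person prefers outcome a to outcome b is sigmoid(r(a) - r(b)) for some reward function r. The sketch below is a minimal, tabular version under that assumption; `fit_rewards` and the preference data are hypothetical, and practical systems typically replace the table with a neural network trained on many more comparisons.

```python
# Minimal sketch (assumption: Bradley-Terry preference model, one scalar
# reward per outcome). Fits rewards to pairwise preferences by gradient
# ascent on the log-likelihood  sum log sigmoid(r[winner] - r[loser]).
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fit_rewards(n_items: int, comparisons: list[tuple[int, int]],
                lr: float = 0.1, steps: int = 2000) -> list[float]:
    """comparisons: (winner, loser) index pairs from human labels."""
    r = [0.0] * n_items
    for _ in range(steps):
        grad = [0.0] * n_items
        for w, l in comparisons:
            # d/d r[w] of log sigmoid(r[w] - r[l]) is 1 - sigmoid(r[w] - r[l]);
            # the derivative with respect to r[l] is its negative.
            g = 1.0 - sigmoid(r[w] - r[l])
            grad[w] += g
            grad[l] -= g
        r = [ri + lr * gi for ri, gi in zip(r, grad)]
    # Only reward differences are identified, so center the rewards at zero.
    mean = sum(r) / n_items
    return [ri - mean for ri in r]


# Hypothetical labels: outcome 0 beats 1, and 1 beats 2, in 8 of 10 comparisons.
prefs = [(0, 1)] * 8 + [(1, 0)] * 2 + [(1, 2)] * 8 + [(2, 1)] * 2
rewards = fit_rewards(3, prefs)
print([round(x, 2) for x in rewards])
```

With an 8-to-2 split, the fitted gap between adjacent outcomes should approach log(4), about 1.39, which is the log-odds of an 80% preference. A model trained this way learns what labelers prefer, which is itself a proxy for what they intend.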

  1. Russell, Stuart J.; Norvig, Peter (2021). Artificial intelligence: A modern approach (4th ed.). Pearson. pp. 5, 1003. ISBN 9780134610993. Retrieved September 12, 2022.
  2. Ngo, Richard; Chan, Lawrence; Mindermann, Sören (2022). "The Alignment Problem from a Deep Learning Perspective". International Conference on Learning Representations. arXiv:2209.00626.
  3. Pan, Alexander; Bhatia, Kush; Steinhardt, Jacob (February 14, 2022). "The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models". International Conference on Learning Representations. Retrieved July 21, 2022.
  4. Zhuang, Simon; Hadfield-Menell, Dylan (2020). "Consequences of Misaligned AI". Advances in Neural Information Processing Systems. Vol. 33. Curran Associates, Inc. pp. 15763–15773. Retrieved March 11, 2023.
  5. Carlsmith, Joseph (June 16, 2022). "Is Power-Seeking AI an Existential Risk?". arXiv:2206.13353 [cs.CY].
  6. Russell, Stuart J. (2020). Human compatible: Artificial intelligence and the problem of control. Penguin Random House. ISBN 9780525558637. OCLC 1113410915.
  7. Christian, Brian (2020). The alignment problem: Machine learning and human values. W. W. Norton & Company. ISBN 978-0-393-86833-3. OCLC 1233266753. Archived from the original on February 10, 2023. Retrieved September 12, 2022.
  8. Langosco, Lauro Langosco Di; Koch, Jack; Sharkey, Lee D.; Pfau, Jacob; Krueger, David (June 28, 2022). "Goal Misgeneralization in Deep Reinforcement Learning". Proceedings of the 39th International Conference on Machine Learning. PMLR. pp. 12004–12019. Retrieved March 11, 2023.
  9. Bommasani, Rishi; Hudson, Drew A.; Adeli, Ehsan; Altman, Russ; Arora, Simran; von Arx, Sydney; Bernstein, Michael S.; Bohg, Jeannette; Bosselut, Antoine; Brunskill, Emma; Brynjolfsson, Erik (July 12, 2022). "On the Opportunities and Risks of Foundation Models". Stanford CRFM. arXiv:2108.07258.
  10. Ouyang, Long; Wu, Jeff; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll L.; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Ray, Alex; Schulman, J.; Hilton, Jacob; Kelton, Fraser; Miller, Luke E.; Simens, Maddie; Askell, Amanda; Welinder, P.; Christiano, P.; Leike, J.; Lowe, Ryan J. (2022). "Training language models to follow instructions with human feedback". arXiv:2203.02155 [cs.CL].
  11. Zaremba, Wojciech; Brockman, Greg; OpenAI (August 10, 2021). "OpenAI Codex". OpenAI. Archived from the original on February 3, 2023. Retrieved July 23, 2022.
  12. Kober, Jens; Bagnell, J. Andrew; Peters, Jan (September 1, 2013). "Reinforcement learning in robotics: A survey". The International Journal of Robotics Research. 32 (11): 1238–1274. doi:10.1177/0278364913495721. ISSN 0278-3649. S2CID 1932843. Archived from the original on October 15, 2022. Retrieved September 12, 2022.
  13. Knox, W. Bradley; Allievi, Alessandro; Banzhaf, Holger; Schmitt, Felix; Stone, Peter (March 1, 2023). "Reward (Mis)design for autonomous driving". Artificial Intelligence. 316: 103829. arXiv:2104.13906. doi:10.1016/j.artint.2022.103829. ISSN 0004-3702. S2CID 233423198.
  14. Stray, Jonathan (2020). "Aligning AI Optimization to Community Well-Being". International Journal of Community Well-Being. 3 (4): 443–463. doi:10.1007/s42413-020-00086-3. ISSN 2524-5295. PMC 7610010. PMID 34723107. S2CID 226254676.
  15. Russell, Stuart; Norvig, Peter (2009). Artificial Intelligence: A Modern Approach. Prentice Hall. p. 1003. ISBN 978-0-13-461099-3.
  16. Bengio, Yoshua; Hinton, Geoffrey; Yao, Andrew; Song, Dawn; Abbeel, Pieter; Harari, Yuval Noah; Zhang, Ya-Qin; Xue, Lan; Shalev-Shwartz, Shai (2024). "Managing extreme AI risks amid rapid progress". Science. 384 (6698): 842–845. arXiv:2310.17688. doi:10.1126/science.adn0117.
  17. "Statement on AI Risk | CAIS". www.safe.ai. Retrieved February 11, 2024.
  18. Grace, Katja; Stewart, Harlan; Sandkühler, Julia Fabienne; Thomas, Stephen; Weinstein-Raun, Ben; Brauner, Jan (January 5, 2024). "Thousands of AI Authors on the Future of AI". arXiv:2401.02843.
  19. Smith, Craig S. "Geoff Hinton, AI's Most Famous Researcher, Warns Of 'Existential Threat'". Forbes. Retrieved May 4, 2023.
  20. Amodei, Dario; Olah, Chris; Steinhardt, Jacob; Christiano, Paul; Schulman, John; Mané, Dan (June 21, 2016). "Concrete Problems in AI Safety". arXiv:1606.06565 [cs.AI].
  21. Ortega, Pedro A.; Maini, Vishal; DeepMind safety team (September 27, 2018). "Building safe artificial intelligence: specification, robustness, and assurance". DeepMind Safety Research – Medium. Archived from the original on February 10, 2023. Retrieved July 18, 2022.
  22. Rorvig, Mordechai (April 14, 2022). "Researchers Gain New Understanding From Simple AI". Quanta Magazine. Archived from the original on February 10, 2023. Retrieved July 18, 2022.
  23. Doshi-Velez, Finale; Kim, Been (March 2, 2017). "Towards A Rigorous Science of Interpretable Machine Learning". arXiv:1702.08608 [stat.ML].
    • Wiblin, Robert (August 4, 2021). "Chris Olah on what the hell is going on inside neural networks" (Podcast). 80,000 hours. No. 107. Retrieved July 23, 2022.
  24. Russell, Stuart; Dewey, Daniel; Tegmark, Max (December 31, 2015). "Research Priorities for Robust and Beneficial Artificial Intelligence". AI Magazine. 36 (4): 105–114. arXiv:1602.03506. doi:10.1609/aimag.v36i4.2577. hdl:1721.1/108478. ISSN 2371-9621. S2CID 8174496. Archived from the original on February 2, 2023. Retrieved September 12, 2022.
  25. Wirth, Christian; Akrour, Riad; Neumann, Gerhard; Fürnkranz, Johannes (2017). "A survey of preference-based reinforcement learning methods". Journal of Machine Learning Research. 18 (136): 1–46.
  26. Christiano, Paul F.; Leike, Jan; Brown, Tom B.; Martic, Miljan; Legg, Shane; Amodei, Dario (2017). "Deep reinforcement learning from human preferences". Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS'17. Red Hook, NY, USA: Curran Associates Inc. pp. 4302–4310. ISBN 978-1-5108-6096-4.
  27. Heaven, Will Douglas (January 27, 2022). "The new version of GPT-3 is much better behaved (and should be less toxic)". MIT Technology Review. Archived from the original on February 10, 2023. Retrieved July 18, 2022.
  28. Mohseni, Sina; Wang, Haotao; Yu, Zhiding; Xiao, Chaowei; Wang, Zhangyang; Yadawa, Jay (March 7, 2022). "Taxonomy of Machine Learning Safety: A Survey and Primer". arXiv:2106.04823 [cs.LG].
  29. Clifton, Jesse (2020). "Cooperation, Conflict, and Transformative Artificial Intelligence: A Research Agenda". Center on Long-Term Risk. Archived from the original on January 1, 2023. Retrieved July 18, 2022.
    • Dafoe, Allan; Bachrach, Yoram; Hadfield, Gillian; Horvitz, Eric; Larson, Kate; Graepel, Thore (May 6, 2021). "Cooperative AI: machines must learn to find common ground". Nature. 593 (7857): 33–36. Bibcode:2021Natur.593...33D. doi:10.1038/d41586-021-01170-0. ISSN 0028-0836. PMID 33947992. S2CID 233740521. Archived from the original on December 18, 2022. Retrieved September 12, 2022.
  30. Prunkl, Carina; Whittlestone, Jess (February 7, 2020). "Beyond Near- and Long-Term". Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. New York NY USA: ACM. pp. 138–143. doi:10.1145/3375627.3375803. ISBN 978-1-4503-7110-0. S2CID 210164673. Archived from the original on October 16, 2022. Retrieved September 12, 2022.
  31. Irving, Geoffrey; Askell, Amanda (February 19, 2019). "AI Safety Needs Social Scientists". Distill. 4 (2). doi:10.23915/distill.00014. ISSN 2476-0757. S2CID 159180422. Archived from the original on February 10, 2023. Retrieved September 12, 2022.
