A framework to understand our reliance on AI in decision-making.
― 6 min read
This paper examines prompt injections and their implications for AI models.
― 3 min read
Examining how summarization models reflect bias in political opinions.
― 7 min read
This study analyzes how LLMs can forecast AI's possible harms.
― 7 min read
Examining harm amplification in text-to-image models and its societal impact.
― 6 min read
New framework helps generative models forget sensitive data while maintaining performance.
― 8 min read
This study investigates jailbreaking attacks on multimodal large language models.
― 6 min read
Investigating security risks and detection methods for diffusion models.
― 6 min read
Examining how machine learning perpetuates gender biases and the emotional harm they cause.
― 5 min read
Examining the relationship between data protection laws and machine learning practices.
― 6 min read
Exploring methods to safeguard individual data in an information-driven world.
― 5 min read
Exploring how friction can enhance user experiences in AI.
― 10 min read
An analysis of the qualities and challenges of language model explanations.
― 5 min read
Examining the limitations of LLMs in understanding and retaining temporal information.
― 4 min read
New watermarking methods improve text variety and detection in machine-generated content.
― 7 min read
An analysis of how attention is captured and its impact on society.
― 8 min read
This article examines the dangers of harmful fine-tuning in language models.
― 7 min read
New methods secure data in AI systems while keeping computations effective.
― 5 min read
A method to remove unwanted skills from language models while keeping essential functions intact.
― 6 min read
A new benchmark aims to measure and mitigate AI-related dangers.
― 5 min read
A framework to evaluate biases in recommendations generated by large language models.
― 5 min read
Methods to minimize bias in large language models for fairer outcomes.
― 7 min read
This paper analyzes gender bias in large language models and proposes measurement methods.
― 7 min read
Evaluating how biases in language models affect real-world applications.
― 5 min read
New model creates lifelike images from identity features using machine learning.
― 4 min read
Exploring the key traits and challenges of developing reliable AI systems.
― 5 min read
A study of techniques used to bypass safety measures in AI language models.
― 8 min read
A study that measures political bias in large language models through stance and framing.
― 7 min read
A closer look at sparse feature circuits in language models and their implications.
― 9 min read
Exploring the importance of understandable reasoning in AI predictions.
― 6 min read
A framework to improve the safety and reliability of large language models.
― 7 min read
Exploring the role of ethics in language translation technology.
― 5 min read
Examining how machine unlearning can expose sensitive data.
― 8 min read
ALERT benchmark assesses safety risks in language models to improve their responses.
― 4 min read
A new tool for safer and more responsible image generation.
― 7 min read
Research investigates biases in Turkish language models and their societal impacts.
― 8 min read
Learn how Context Steering enhances language model responses through adaptable context use.
― 8 min read
Recent research challenges the simplicity of the Knowledge Neuron Thesis in language models.
― 10 min read
Research tackles privacy concerns in language models through innovative unlearning methods.
― 6 min read
Research reveals race- and gender-based bias in AI hiring tools.
― 6 min read