惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

Recent Commits to openclaw:main
Recent Commits to openclaw:main
SecWiki News
SecWiki News
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
O
OpenAI News
N
News and Events Feed by Topic
Application and Cybersecurity Blog
Application and Cybersecurity Blog
Hacker News - Newest:
Hacker News - Newest: "LLM"
cs.CL updates on arXiv.org
cs.CL updates on arXiv.org
P
Privacy International News Feed
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
W
WeLiveSecurity
Google Online Security Blog
Google Online Security Blog
I
Intezer
Forbes - Security
Forbes - Security
Latest news
Latest news
E
Exploit-DB.com RSS Feed
Simon Willison's Weblog
Simon Willison's Weblog
S
Schneier on Security
www.infosecurity-magazine.com
www.infosecurity-magazine.com
Help Net Security
Help Net Security
D
Darknet – Hacking Tools, Hacker News & Cyber Security
Know Your Adversary
Know Your Adversary
H
Heimdal Security Blog
H
Hacker News: Front Page
TaoSecurity Blog
TaoSecurity Blog
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
T
Tor Project blog
K
Kaspersky official blog
The Hacker News
The Hacker News
T
Threatpost
Scott Helme
Scott Helme
N
News and Events Feed by Topic
Security Latest
Security Latest
L
LINUX DO - 热门话题
S
Security @ Cisco Blogs
Project Zero
Project Zero
P
Proofpoint News Feed
L
Lohrmann on Cybersecurity
T
Threat Research - Cisco Blogs
T
Tenable Blog
L
LINUX DO - 最新话题
Hacker News: Ask HN
Hacker News: Ask HN
A
Arctic Wolf
P
Privacy & Cybersecurity Law Blog
C
Cisco Blogs
Hugging Face - Blog
Hugging Face - Blog
N
News | PayPal Newsroom
The Last Watchdog
The Last Watchdog
T
Troy Hunt's Blog
NISL@THU
NISL@THU

cs.CL updates on arXiv.org

Toward Generalized Cross-Lingual Hateful Language Detection with Web-Scale Data and Ensemble LLM Annotations Self-Calibrating Language Models via Test-Time Discriminative Distillation HumorGen: Cognitive Synergy for Humor Generation in Large Language Models via Persona-Based Distillation Claim2Vec: Embedding Fact-Check Claims for Multilingual Similarity and Clustering Spoiler Alert: Narrative Forecasting as a Metric for Tension in LLM Storytelling Human vs. Machine Deception: Distinguishing AI-Generated and Human-Written Fake News Using Ensemble Learning Weird Generalization is Weirdly Brittle Mirroring Minds: Asymmetric Linguistic Accommodation and Diagnostic Identity in ADHD and Autism Reddit Communities Why Supervised Fine-Tuning Fails to Learn: A Systematic Study of Incomplete Learning in Large Language Models Reason Only When Needed: Efficient Generative Reward Modeling via Model-Internal Uncertainty SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models Who Wrote This Line? Evaluating the Detection of LLM-Generated Classical Chinese Poetry Training-Free Cross-Lingual Dysarthria Severity Assessment via Phonological Subspace Analysis in Self-Supervised Speech Representations Simulating Organized Group Behavior: New Framework, Benchmark, and Analysis ODUTQA-MDC: A Task for Open-Domain Underspecified Tabular QA with Multi-turn Dialogue-based Clarification Nationality encoding in language model hidden states: Probing culturally differentiated representations in persona-conditioned academic text Relational Probing: LM-to-Graph Adaptation for Financial Prediction CodeComp: Structural KV Cache Compression for Agentic Coding FAITH: Factuality Alignment through Integrating Trustworthiness and Honestness Comparative Analysis of Large Language Models in Healthcare Adaptive Multi-Expert Reasoning via Difficulty-Aware Routing and Uncertainty-Guided Aggregation A Structured Clustering Approach for Inducing Media Narratives NameBERT: Scaling Name-Based Nationality Classification with LLM-Augmented Open Academic Data LASQ: A Low-resource Aspect-based Sentiment Quadruple Extraction Dataset BLUEmed: Retrieval-Augmented Multi-Agent Debate for Clinical Error Detection Turing or Cantor: That is the Question NOSE: Neural Olfactory-Semantic Embedding with Tri-Modal Orthogonal Contrastive Learning Instruction Data Selection via Answer Divergence EviCare: Enhancing Diagnosis Prediction with Deep Model-Guided Evidence for In-Context Reasoning Dynamic Adaptive Attention and Supervised Contrastive Learning: A Novel Hybrid Framework for Text Sentiment Classification Structure-Grounded Knowledge Retrieval via Code Dependencies for Multi-Step Data Reasoning Knowing What to Stress: A Discourse-Conditioned Text-to-Speech Benchmark HeceTokenizer: A Syllable-Based Tokenization Approach for Turkish Retrieval BlasBench: An Open Benchmark for Irish Speech Recognition OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language Environment Simulation How Robust Are Large Language Models for Clinical Numeracy? An Empirical Study on Numerical Reasoning Abilities in Clinical Contexts Evaluating Memory Capability in Continuous Lifelog Scenario Polyglot Teachers: Evaluating Language Models for Multilingual Synthetic Data Generation Hidden Measurement Error in LLM Pipelines Distorts Annotation, Evaluation, and Benchmarking LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling ProGAL-VLA: Grounded Alignment through Prospective Reasoning in Vision-Language-Action Models Reproduction Beyond Benchmarks: ConstBERT and ColBERT-v2 Across Backends and Query Distributions Hijacking Text Heritage: Hiding the Human Signature through Homoglyphic Substitution SpectralLoRA: Is Low-Frequency Structure Sufficient for LoRA Adaptation? A Spectral Analysis of Weight Updates Revisiting Compositionality in Dual-Encoder Vision-Language Models: The Role of Inference What Factors Affect LLMs and RLLMs in Financial Question Answering? KCS: Diversify Multi-hop Question Generation with Knowledge Composition Sampling Preference Learning Unlocks LLMs' Psycho-Counseling Skills Aligning What LLMs Do and Say: Towards Self-Consistent Explanations StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs Beyond Black-Box Interventions: Latent Probing for Faithful Retrieval-Augmented Generation GenProve: Learning to Generate Text with Fine-Grained Provenance ChemPro: A Progressive Chemistry Benchmark for Large Language Models C-ReD: A Comprehensive Chinese Benchmark for AI-Generated Text Detection Derived from Real-World Prompts Evaluating Cooperation in LLM Social Groups through Elected Leadership A Triadic Suffix Tokenization Scheme for Numerical Reasoning Policy Split: Incentivizing Dual-Mode Exploration in LLM Reinforcement with Dual-Mode Entropy Regularization METER: Evaluating Multi-Level Contextual Causal Reasoning in Large Language Models Anthropogenic Regional Adaptation in Multimodal Vision-Language Model METRO: Towards Strategy Induction from Expert Dialogue Transcripts for Non-collaborative Dialogues Towards Proactive Information Probing: Customer Service Chatbots Harvesting Value from Conversation A Systematic Analysis of the Impact of Persona Steering on LLM Capabilities Linear Representations of Hierarchical Concepts in Language Models TInR: Exploring Tool-Internalized Reasoning in Large Language Models Teaching Language Models How to Code Like Learners: Conversational Serialization for Student Simulation SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting Computational Lesions in Multilingual Language Models Separate Shared and Language-specific Brain Alignment Bridging Linguistic Gaps: Cross-Lingual Mapping in Pre-Training and Dataset for Enhanced Multilingual LLM Performance Early Decisions Matter: Proximity Bias and Initial Trajectory Shaping in Non-Autoregressive Diffusion Language Models LLMs Should Incorporate Explicit Mechanisms for Human Empathy ReFEree: Reference-Free and Fine-Grained Method for Evaluating Factual Consistency in Real-World Code Summarization From Query to Counsel: Structured Reasoning with a Multi-Agent Framework and Dataset for Legal Consultation CodaRAG: Connecting the Dots with Associativity Inspired by Complementary Learning The Amazing Agent Race: Strong Tool Users, Weak Navigators Think in Sentences: Explicit Sentence Boundaries Enhance Language Model's Capabilities CircuitSynth: Reliable Synthetic Data Generation ASPIRin: Action Space Projection for Interactivity-Optimized Reinforcement Learning in Full-Duplex Speech Language Models Computational Implementation of a Model of Category-Theoretic Metaphor Comprehension CoSToM:Causal-oriented Steering for Intrinsic Theory-of-Mind Alignment in Large Language Models FinTrace: Holistic Trajectory-Level Evaluation of LLM Tool Calling for Long-Horizon Financial Tasks Cross-Cultural Value Awareness in Large Vision-Language Models Should We be Pedantic About Reasoning Errors in Machine Translation? Instructing LLMs to Negotiate using Reinforcement Learning with Verifiable Rewards COMPOSITE-Stem GIANTS: Generative Insight Anticipation from Scientific Literature Regime-Conditional Retrieval: Theory and a Transferable Router for Two-Hop QA ASTRA: Adaptive Semantic Tree Reasoning Architecture for Complex Table Question Answering Generating High Quality Synthetic Data for Dutch Medical Conversations Generative UI: LLMs are Effective UI Generators LETGAMES: An LLM-Powered Gamified Approach to Cognitive Training for Patients with Cognitive Impairment Seven simple steps for log analysis in AI systems H-AdminSim: A Multi-Agent Simulator for Realistic Hospital Administrative Workflows with FHIR Integration LABBench2: An Improved Benchmark for AI Systems Performing Biology Research Beyond RAG for Agent Memory: Retrieval by Decoupling and Aggregation MCERF: Advancing Multimodal LLM Evaluation of Engineering Documentation with Enhanced Retrieval Reasoning Models Will Sometimes Lie About Their Reasoning Disco-RAG: Discourse-Aware Retrieval-Augmented Generation Think Parallax: Solving Multi-Hop Problems via Multi-View Knowledge-Graph-Based Retrieval-Augmented Generation FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models Echoes of Automation: The Increasing Use of LLMs in Newsmaking
Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation
Jizhi Zhang, Keqin Bao, Yang Zhang, Wenjie Wang, Fuli Feng, Xian · 2023-05-13 · via cs.CL updates on arXiv.org

The remarkable achievements of Large Language Models (LLMs) have led to the emergence of a novel recommendation paradigm -- Recommendation via LLM (RecLLM). Nevertheless, it is important to note that LLMs may contain social prejudices, and therefore, the fairness of recommendations made by RecLLM requires further investigation. To avoid the potential risks of RecLLM, it is imperative to evaluate the fairness of RecLLM with respect to various sensitive attributes on the user side. Due to the differences between the RecLLM paradigm and the traditional recommendation paradigm, it is problematic to directly use the fairness benchmark of traditional recommendation. To address the dilemma, we propose a novel benchmark called Fairness of Recommendation via LLM (FaiRLLM). This benchmark comprises carefully crafted metrics and a dataset that accounts for eight sensitive attributes1 in two recommendation scenarios: music and movies. By utilizing our FaiRLLM benchmark, we conducted an evaluation of ChatGPT and discovered that it still exhibits unfairness to some sensitive attributes when generating recommendations. Our code and dataset can be found at https://github.com/jizhi-zhang/FaiRLLM.