Extraction of Product Specifications from the Web -- Going Beyond Tables and Lists
Govind Krishnan Gangadhar, Ashish Kulkarni · 2022-01-09 · via cs.IR updates on arXiv.org

E-commerce product pages on the web often present product specification data in structured tabular blocks. Extracting these product attribute-value specifications has benefited applications such as product catalogue curation, search, and question answering. However, different websites render these blocks with a wide variety of HTML elements (such as <table>, <ul>, <div>, <span>, and <dl>), which makes their automatic extraction a challenge. Most current research has focused on extracting product specifications from tables and lists and therefore suffers from low recall when applied in a large-scale extraction setting. In this paper, we present a product specification extraction approach that goes beyond tables and lists and generalizes across the diverse HTML elements used for rendering specification blocks. Using a combination of hand-coded features and deep-learned spatial and token features, we first identify the specification blocks on a product page. We then extract the product attribute-value pairs from these blocks, following an approach inspired by wrapper induction. We created a labeled dataset of product specifications extracted from 14,111 diverse specification blocks taken from a range of product websites. Our experiments show the efficacy of our approach compared to current specification extraction models and support our claim about its applicability to large-scale product specification extraction.
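
To make the extraction step concrete, here is a minimal Python sketch of tag-agnostic attribute-value extraction. It is not the paper's method: it uses no learned features and no wrapper induction, only three hand-written structural rules. It also assumes the input is already an identified specification block. The names `Node`, `TreeBuilder`, `extract_pairs`, `extract_specs`, and the `SAMPLE` markup are all hypothetical, introduced only for illustration.

```python
from html.parser import HTMLParser


class Node:
    """Minimal DOM node: children are Node objects or text strings."""
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a small tree from HTML using only the standard library."""
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        if tag not in self.VOID:
            self.cur = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        n = self.cur
        while n is not self.root and n.tag != tag:
            n = n.parent
        if n is not self.root:
            self.cur = n.parent

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.cur.children.append(text)


def texts(node):
    """Text fragments under node, in document order."""
    out = []
    for c in node.children:
        out.extend([c] if isinstance(c, str) else texts(c))
    return out


def clean(s):
    return s.strip().rstrip(":").strip()


def extract_pairs(node):
    """Apply three structural rules top-down, ignoring tag names where possible."""
    frags = texts(node)
    if not frags:
        return []
    # Rule 1: an element holding exactly two text fragments is a label/value row.
    if len(frags) == 2:
        return [(clean(frags[0]), clean(frags[1]))]
    # Rule 2: a single fragment of the form "label: value".
    if len(frags) == 1:
        key, sep, value = frags[0].partition(":")
        ok = sep and key.strip() and value.strip()
        return [(clean(key), clean(value))] if ok else []
    # Rule 3: children alternate between two tags (e.g. dt/dd), one fragment each.
    kids = node.children
    if (len(kids) >= 2 and len(kids) % 2 == 0
            and all(isinstance(k, Node) for k in kids)
            and kids[0].tag != kids[1].tag
            and all(k.tag == kids[i % 2].tag and len(texts(k)) == 1
                    for i, k in enumerate(kids))):
        return [(clean(texts(a)[0]), clean(texts(b)[0]))
                for a, b in zip(kids[::2], kids[1::2])]
    # Otherwise descend into child elements.
    pairs = []
    for k in kids:
        if isinstance(k, Node):
            pairs.extend(extract_pairs(k))
    return pairs


def extract_specs(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return extract_pairs(builder.root)


# Hypothetical specification blocks rendered with four different element types.
SAMPLE = """
<table><tr><th>Brand</th><td>Acme</td></tr><tr><th>Weight</th><td>2 kg</td></tr></table>
<dl><dt>Color</dt><dd>Black</dd><dt>Material</dt><dd>Steel</dd></dl>
<ul><li>Warranty: 2 years</li><li><b>Origin</b> <span>Japan</span></li></ul>
<div class="specs"><div><span>Voltage</span><span>230 V</span></div><div><span>Power</span><span>50 W</span></div></div>
"""

if __name__ == "__main__":
    for attribute, value in extract_specs(SAMPLE):
        print(f"{attribute} = {value}")
```

A table-only or list-only extractor would miss the `<dl>` and nested `<div>` blocks in this sample, which is the recall gap the abstract describes. Rules this simple would misfire on arbitrary page content, which is one reason a separate block-identification step matters before extraction.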