Privacy-Preserving Record Linkage for Cardinality Counting
Nan Wu, Dinusha Vatsalan, Mohamed Ali Kaafar, Sanath Kumar Rames · 2023-01-09 · via cs.LG updates on arXiv.org

Several applications require counting the number of distinct items in the data, which is known as the cardinality counting problem. Examples include health applications, such as counting rare-disease patients to support adequate awareness and funding and counting the cases of a new disease for outbreak detection; marketing applications, such as counting the visibility reached for a new product; and cybersecurity applications, such as tracking the number of unique views of social media posts. However, the data needed for counting is often personal and sensitive, and needs to be processed using privacy-preserving techniques. Data quality issues across databases, such as typos, errors and variations, pose additional challenges for accurate cardinality estimation. While privacy-preserving cardinality counting has gained much attention in recent times and a few privacy-preserving algorithms have been developed for cardinality estimation, no work has so far been done on privacy-preserving cardinality counting using record linkage techniques with fuzzy matching and provable privacy guarantees. We propose a novel privacy-preserving record linkage algorithm that uses unsupervised clustering techniques to link and count the cardinality of individuals in multiple datasets without compromising their privacy or identity. In addition, existing Elbow methods, which take the optimal number of clusters as the cardinality, are far from accurate because they do not take into account the purity and completeness of the generated clusters. We therefore propose a novel method to find the optimal number of clusters in unsupervised learning. Our experimental results on real and synthetic datasets are highly promising: the method achieves a significantly smaller error rate, of less than 0.1 with a privacy budget ε = 1.0, than the state-of-the-art fuzzy matching and clustering method.
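To make the idea of counting distinct individuals despite typos concrete, here is a minimal sketch in Python. It is not the authors' algorithm and it provides no privacy protection: it compares plaintext strings with `difflib.SequenceMatcher`, merges records whose similarity reaches an assumed threshold (single-linkage clustering via union-find), and reports the number of clusters as the cardinality estimate. The function names, the sample records and the `0.8` threshold are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def estimate_cardinality(records: list[str], threshold: float = 0.8) -> int:
    """Count clusters of records linked by fuzzy matching.

    Two records are linked when their similarity is >= threshold.
    Linked records are merged transitively (single linkage), and the
    number of resulting clusters is returned as the cardinality.
    NOTE: plaintext comparison only; no privacy guarantee is provided.
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent pointers to the cluster root, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison is O(n^2); acceptable for a small illustration.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                parent[find(i)] = find(j)

    return len({find(i) for i in range(len(records))})


# Hypothetical records pooled from several databases.
records = ["Alice Smith", "Alice Smyth", "Bob Jones", "bob jones", "Carol White"]

exact_count = len(set(records))               # exact matching treats variants as distinct
fuzzy_count = estimate_cardinality(records)   # fuzzy matching merges the variants
print(exact_count, fuzzy_count)
```

Exact matching treats every spelling variant as a separate individual, while the fuzzy clustering merges the variants into one cluster each. In a privacy-preserving setting the strings would first be encoded and perturbed, and the threshold would no longer be chosen by hand; the sketch leaves both aspects out.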