10 big data challenges and how to address them
2026-05-23 · via WhatIs

A well-executed big data strategy helps enterprises improve operational performance, optimize marketing campaigns and prioritize product development plans. But data leaders face various challenges in advancing big data initiatives from boardroom discussions to successful deployments.

Data teams must work with IT to build an infrastructure that collects diverse data from numerous sources and makes it available for use in analytics and AI applications. They also need to ensure big data systems meet performance, scalability and timeliness requirements, with high data quality and strong data governance controls -- while also controlling implementation costs.

Perhaps most importantly, data leaders must engage with business executives to determine how big data can benefit the organization and align the strategy with key business goals and priorities.

Here is a closer look at 10 common big data challenges, along with advice on overcoming them.

1. Managing large volumes of data

Big data typically involves large volumes of data from disparate systems, applications and external sources. It also usually includes a mix of structured, unstructured and semistructured data, which is often created or updated at a fast pace. Managing this combination of volume, variety and velocity -- the traditional 3 V's of big data -- is inherently complicated.

That starts with extracting and consolidating relevant data from all the different sources -- CRM and ERP systems, website and application logs, sensors, social networks and more -- into a unified big data architecture. Such architectures have commonly been built on data lakes, scalable platforms that store diverse types of data. But Donald Farmer, principal at consulting firm TreeHive Strategy, said many data lakes are more like swamps, with sprawling data sets that are difficult to track and manage effectively.

Farmer added that newer data lakehouse platforms help ease those issues by combining the scalability and storage flexibility of data lakes with the more rigorous data management functions of traditional data warehouses. For example, he said Apache Iceberg and other open table formats provide transactional consistency and data versioning in data lakehouses, enabling data management teams to maintain audit trails and modify schemas without disrupting analytics and AI applications.
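To make snapshot-based versioning concrete, here is a minimal Python sketch of the idea behind open table formats: every write commits a new immutable snapshot, so older versions stay readable for audits while the schema evolves. The `VersionedTable` class and its methods are hypothetical illustrations, not the Apache Iceberg API; real table formats track snapshots through metadata and manifest files rather than copying rows in memory.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

Row = dict[str, Any]


@dataclass(frozen=True)
class Snapshot:
    """One immutable table version: the schema plus every row committed so far."""
    snapshot_id: int
    schema: tuple[str, ...]
    rows: tuple[Row, ...]


@dataclass
class VersionedTable:
    """Toy snapshot-based table; a conceptual model, not the Apache Iceberg API."""
    snapshots: list[Snapshot] = field(default_factory=list)

    def _commit(self, schema: tuple[str, ...], rows: tuple[Row, ...]) -> int:
        # Every write produces a new snapshot; older snapshots are never modified,
        # so a reader pinned to one always sees a consistent view.
        snap = Snapshot(len(self.snapshots), schema, rows)
        self.snapshots.append(snap)
        return snap.snapshot_id

    def create(self, schema: list[str]) -> int:
        return self._commit(tuple(schema), ())

    def append(self, new_rows: list[Row]) -> int:
        cur = self.snapshots[-1]
        # Conform incoming rows to the current schema; absent columns become None.
        conformed = tuple({col: r.get(col) for col in cur.schema} for r in new_rows)
        return self._commit(cur.schema, cur.rows + conformed)

    def add_column(self, name: str, default: Any = None) -> int:
        # Schema evolution: the new snapshot gets the extra column; old ones keep theirs.
        cur = self.snapshots[-1]
        rows = tuple({**r, name: default} for r in cur.rows)
        return self._commit(cur.schema + (name,), rows)

    def read(self, snapshot_id: Optional[int] = None) -> list[Row]:
        # "Time travel": read the latest snapshot or any earlier one by id.
        snap = self.snapshots[-1 if snapshot_id is None else snapshot_id]
        return [dict(r) for r in snap.rows]

    def history(self) -> list[tuple[int, tuple[str, ...], int]]:
        # Audit trail: (snapshot id, schema, row count) for each committed version.
        return [(s.snapshot_id, s.schema, len(s.rows)) for s in self.snapshots]


orders = VersionedTable()
orders.create(["order_id", "amount"])
v1 = orders.append([{"order_id": 1, "amount": 20.0}])
orders.add_column("currency", default="USD")
orders.append([{"order_id": 2, "amount": 5.0, "currency": "EUR"}])
print(orders.read(v1))   # the pre-schema-change view is still available
print(orders.history())
```

Because each change is a new snapshot, adding the `currency` column does not alter what a query pinned to the earlier snapshot returns.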

2. Finding and fixing data quality issues

Big data applications produce bad results when the data feeding them has quality issues. These issues become more significant -- and harder to address -- as data management and analytics teams ingest more and more data. Monitoring data quality, identifying problems and fixing them is a continuous process, Bunddler CEO Paul Kovalenko said.

Bunddler, an online marketplace for finding shopping assistants who help people buy products and arrange international shipments, experienced that firsthand as it scaled to 500,000 customers. The New York-based company uses big data to provide a highly personalized UX, monitor trends and identify upselling opportunities for assistants, but effective data quality management is a pressing concern.

Duplicate entries and typos are common in the data Bunddler collects from various sources, Kovalenko said. To root out such problems, it created a tool that matches duplicates with minor data differences and flags potential typos. Higher-quality data from using the tool has increased the accuracy of analytics insights, he said.
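A tool like the one Kovalenko describes could start from fuzzy string matching. The sketch below uses Python's standard `difflib` module to pair records whose names nearly match and to suggest corrections for values that are close to, but not in, a reference list. It is an illustration under assumptions (the field names, sample records and the 0.9 and 0.8 cutoffs are hypothetical), not Bunddler's actual tool.

```python
from difflib import SequenceMatcher, get_close_matches
from itertools import combinations


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so formatting differences don't count.
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means the normalized strings are identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def likely_duplicates(records: list[dict], key: str, threshold: float = 0.9):
    """Return (id_a, id_b, score) for record pairs whose `key` values nearly match."""
    pairs = []
    for a, b in combinations(records, 2):  # O(n^2): fine for a sketch, not for millions
        score = similarity(a[key], b[key])
        if score >= threshold:
            pairs.append((a["id"], b["id"], round(score, 2)))
    return pairs


def flag_typos(values: list[str], vocabulary: list[str], cutoff: float = 0.8) -> dict:
    """Map each value missing from `vocabulary` to its closest known spelling, if any."""
    known = set(vocabulary)
    flags = {}
    for value in values:
        if value not in known:
            match = get_close_matches(value, vocabulary, n=1, cutoff=cutoff)
            if match:
                flags[value] = match[0]
    return flags


customers = [
    {"id": 1, "name": "Maria Gonzalez", "country": "Germany"},
    {"id": 2, "name": "maria  gonzales", "country": "Germnay"},
    {"id": 3, "name": "John Smith", "country": "France"},
]
print(likely_duplicates(customers, "name"))
print(flag_typos([c["country"] for c in customers], ["Germany", "France", "Spain"]))
```

Pairwise comparison does not scale to large data sets; production matchers typically narrow candidates first (for example, by grouping on a shared attribute) before scoring pairs.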

AI can also help organizations improve data quality: It's increasingly being used to validate data and detect anomalies, errors, inconsistencies and other quality issues.
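AI-based validators are beyond a short example, but their goal of flagging values that don't fit the rest of the data can be illustrated with a simple statistical check. This sketch is a deliberately non-AI stand-in: it uses the modified z-score, which relies on the median and the median absolute deviation (MAD), so a single extreme value can't hide itself by inflating the mean. The sample data is hypothetical.

```python
from statistics import median


def robust_outliers(values: list[float], threshold: float = 3.5) -> list[int]:
    """Indexes whose modified z-score (median/MAD based) exceeds `threshold`.

    The 0.6745 constant and 3.5 cutoff follow the commonly used
    Iglewicz-Hoaglin convention; tune both for real data.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return []  # no spread, so the score is undefined
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]


daily_orders = [102, 98, 105, 97, 101, 99, 1030, 100]  # hypothetical feed
print(robust_outliers(daily_orders))  # [6]
```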

3. Dealing with data integration complexities

While big data platforms enable organizations to collect and store large amounts of varied data, the data collection process is challenging, said Rosaria Silipo, a data scientist, author and co-host of the "My Data Guest" podcast. In particular, integrating sets of big data is more complex than conventional data integration due to the different types of data involved and the fast pace of updates.

Data leaders and teams need to think through their organization's data integration requirements upfront. Ad hoc integration for specific projects often results in redundant efforts and substantial rework of integration scripts or routines, Silipo said. Optimizing the ROI of big data investments requires a strategic approach to data integration, she added.

That typically involves extract, load and transform (ELT) processes rather than the traditional ETL ones used in data warehouses. ELT loads data into a data lake or lakehouse in its native format, then combines and transforms it as needed for specific use cases. Real-time integration is also common in big data environments, and the growing adoption of AI tools and agents is accelerating a shift from rigid data pipelines to flexible architectures that deliver data to applications more dynamically.
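The difference between ELT and ETL is mainly about when transformation happens. In this minimal Python sketch, raw JSON events are loaded untouched into a hypothetical raw zone, and each use case applies its own transformation at read time. Note that the same field (`amount`) arrives as a string in one event and a number in another; with ELT, reconciling that is deferred to the transform step. The in-memory list stands in for lake storage and is an assumption for illustration.

```python
import json

# Raw zone of a hypothetical lake: records are stored exactly as they arrive.
raw_zone: list[str] = []


def load(events: list[str]) -> None:
    # Load first: no parsing, validation or reshaping at ingestion time.
    raw_zone.extend(events)


def revenue_by_country() -> dict[str, float]:
    # Transform later, per use case: parse, filter and aggregate on demand.
    totals: dict[str, float] = {}
    for line in raw_zone:
        event = json.loads(line)
        if event.get("type") == "purchase":
            country = event.get("country", "unknown")
            totals[country] = totals.get(country, 0.0) + float(event["amount"])
    return totals


def active_users() -> set[str]:
    # A second use case reads the same raw data with a different transformation.
    return {json.loads(line)["user"] for line in raw_zone}


load([
    '{"type": "purchase", "user": "u1", "country": "DE", "amount": "19.99"}',
    '{"type": "page_view", "user": "u2", "url": "/pricing"}',
    '{"type": "purchase", "user": "u2", "country": "DE", "amount": 5}',
    '{"type": "purchase", "user": "u3", "amount": 12.5}',
])
print(revenue_by_country())
print(sorted(active_users()))
```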

4. Scaling big data systems efficiently and cost-effectively

Enterprises waste a lot of money collecting and storing big data if they don't have scalable systems capable of handling both current and future processing workloads. To avoid that, data teams should map out planned uses and required data types and schemas before designing and deploying big data systems.

But that's easier said than done, said Travis Rehl, CTO and head of product at data, AI and cloud services provider Innovative Solutions. "Oftentimes, you start from one data model and expand out, but quickly realize the model doesn't fit your new data points -- and you suddenly have technical debt you need to resolve," Rehl said.

Appropriate data structures make it easier to reuse data efficiently. For example, Parquet files often provide a better performance-to-cost ratio than CSV dumps within a data lake or lakehouse. Consistent retention policies cycle out old data from repositories as its analytics value erodes. When latency is an issue, teams also need to consider whether to run systems in the cloud, in on-premises data centers or on edge servers, while balancing performance with deployment and management costs.
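A retention policy can be expressed as a simple rule over partition age. This sketch assumes data is partitioned by day and uses hypothetical tiers (keep for 30 days, archive up to a year, then delete); actual windows depend on analytics needs and any legal retention requirements.

```python
from datetime import date


def retention_action(partition_day: date, today: date,
                     hot_days: int = 30, archive_days: int = 365) -> str:
    """Decide what to do with one daily partition based on its age."""
    age = (today - partition_day).days
    if age <= hot_days:
        return "keep"       # recent data stays in fast, more expensive storage
    if age <= archive_days:
        return "archive"    # older data moves to a cheaper tier
    return "delete"         # past the retention window: cycle it out


def plan(partitions: list[date], today: date) -> dict[str, list[date]]:
    actions: dict[str, list[date]] = {"keep": [], "archive": [], "delete": []}
    for day in sorted(partitions):
        actions[retention_action(day, today)].append(day)
    return actions


today = date(2026, 5, 23)
print(plan([date(2026, 5, 1), date(2026, 1, 1), date(2024, 12, 31)], today))
```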

5. Evaluating and selecting big data technologies

Data leaders and their teams can choose from a wide range of big data technologies that often overlap in capabilities. Both open source tools and commercial platforms are available, further complicating the evaluation and selection process. Making the right choices is critical to gaining the expected business benefits from big data initiatives.

To help inform technology decisions, teams should consider current and future data needs for both batch processing and real-time streaming from different sources. The data preparation capabilities required to support AI, machine learning and other advanced analytics applications should also be assessed, as well as where data will be processed and stored. The ability to easily update analytics and AI models in data platforms is another key consideration.
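One common way to turn these considerations into a decision is a weighted scoring matrix. The criteria below mirror the factors just listed, but the weights, the 1-to-5 scores and the platform names are hypothetical placeholders that a team would replace with its own assessments.

```python
def rank_platforms(candidates: dict[str, dict[str, int]],
                   weights: dict[str, float]) -> list[tuple[str, float]]:
    """Weighted-sum ranking; scores run from 1 (poor) to 5 (strong) per criterion."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    totals = {}
    for name, scores in candidates.items():
        if set(scores) != set(weights):
            raise ValueError(f"{name} must be scored on every criterion")
        totals[name] = round(sum(weights[c] * scores[c] for c in weights), 2)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical criteria drawn from the considerations above, with illustrative weights.
WEIGHTS = {
    "batch_processing": 0.20,
    "real_time_streaming": 0.25,
    "ai_data_preparation": 0.25,
    "processing_location_fit": 0.15,
    "model_update_ease": 0.15,
}
CANDIDATES = {
    "Platform A": {"batch_processing": 5, "real_time_streaming": 2, "ai_data_preparation": 3,
                   "processing_location_fit": 4, "model_update_ease": 3},
    "Platform B": {"batch_processing": 4, "real_time_streaming": 4, "ai_data_preparation": 4,
                   "processing_location_fit": 3, "model_update_ease": 4},
}
print(rank_platforms(CANDIDATES, WEIGHTS))
```

The weights force explicit trade-offs (here, streaming and AI data preparation count for more than batch performance), so they are worth agreeing on before scoring begins.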

6. Generating valuable business insights

The volume and complexity of big data complicate efforts to analyze and use it. Organizations often struggle to generate valuable insights and apply them in business operations in an impactful way, said Bill Szybillo, manager of BI engineering at firearms maker Sig Sauer Inc.

Doing so requires a clear understanding of the data's business context and potential use cases. But Silipo said she has found that many data leaders and teams focus on the technology and pay less attention to how big data systems can be used to achieve desired business outcomes.

Teams that don't work with the people closest to business problems when planning data platforms, pipelines and storage architectures might build technically sound systems that produce little business value. Pilot projects are useful not only for engaging business users from the start, but also for surfacing limitations early on in big data initiatives and delivering some quick wins to demonstrate business benefits.

7. Hiring and retaining workers with big data skills

Finding workers with the required skills is another common challenge -- and growing AI use adds new requirements for expertise in designing, training and supervising AI models. But data scientists and other analytics professionals with AI skills are in high demand, as are data engineers and workers skilled in deploying and managing data platforms.

In addition, technical skills alone aren't enough. Data teams also must be able to identify risks, manage internal expectations and resolve issues, said Pablo Listingart, founder and executive director of ComIT and Comunidad IT, charitable organizations that provide free IT training programs in Canada and Argentina. "Many big data initiatives fail because of incorrect expectations and faulty estimations that are carried forward from the beginning of the project to the end," he noted.

Vojtech Kurka, co-founder and head of R&D at customer data platform vendor Meiro, said creating the right culture helps attract and retain skilled workers. Kurka initially thought Meiro could solve its data problems with simple SQL and Python scripts. But he later realized that to meet its goals, the company needed to hire people with more advanced data skills and keep them satisfied and motivated.

Organizations can also partner with providers of AI, analytics, data management and software development services to fill big data skills gaps. In some cases, that's faster and less expensive than hiring new employees. But data leaders should carefully evaluate a provider's costs and capabilities and assess whether internal hiring is a better long-term option.

8. Keeping costs from getting out of control

Another common challenge is avoiding what David Mariani, founder and CTO of semantic layer platform vendor AtScale, called the "cloud bill heart attack."

Many enterprises use existing data consumption metrics to estimate the computing costs of new big data infrastructure, but expanded access to richer, more granular data sets often increases user demand for computing resources. Cloud systems that elastically scale to handle higher data processing and analysis workloads will drive up costs unexpectedly if companies underestimate their resource needs.
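A little arithmetic shows how quickly this can happen. The sketch below compares a monthly budget set from current usage plus 20% headroom against elastic costs that grow 10% a month as richer data draws more users; all figures are hypothetical. At that rate, monthly spend passes the budget in the third month (index 2), and compounding pushes annual spend well above the annual budget.

```python
def project_monthly_costs(baseline: float, monthly_growth: float, months: int) -> list[float]:
    """Elastic cost if demand, and therefore compute, grows at a fixed monthly rate."""
    return [baseline * (1 + monthly_growth) ** m for m in range(months)]


def first_month_over(costs: list[float], monthly_budget: float):
    """Index of the first month whose cost exceeds the budget, or None."""
    for month, cost in enumerate(costs):
        if cost > monthly_budget:
            return month
    return None


# Hypothetical: budget set from today's usage plus 20% headroom,
# while richer data access drives 10% month-over-month demand growth.
costs = project_monthly_costs(baseline=10_000, monthly_growth=0.10, months=12)
budget = 12_000
print(first_month_over(costs, budget))          # 2
print(round(sum(costs)), "vs", budget * 12)     # annual spend vs. annual budget
```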

On-demand pricing models can also increase costs if the use of big data systems isn't managed effectively. Fixed-resource pricing alleviates that problem, but doesn't completely solve it: Poorly written applications that consume excessive resources block other workloads from running if the specified usage limit is reached. "I've seen several customers where users have written $10,000 queries due to poorly designed SQL," Mariani said. Data teams need to implement fine-grained query controls to prevent that.
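One form of fine-grained control is a pre-execution guard that estimates a query's cost and refuses to run it above a cap. This sketch assumes the query engine can report an estimate of bytes scanned before execution (some engines offer a dry run or plan estimate for this); the $5-per-TB rate, the caps and the `QueryGuard` class are hypothetical. At that rate, a $10,000 query corresponds to scanning 2,000 TB.

```python
from dataclasses import dataclass, field

TB = 10**12  # decimal terabyte, in bytes


class QueryRejected(Exception):
    """Raised when a query would break a cost control."""


@dataclass
class QueryGuard:
    price_per_tb: float        # hypothetical on-demand rate, dollars per TB scanned
    max_query_cost: float      # hard cap for any single query
    daily_user_budget: float   # cap on one user's spend (daily reset omitted here)
    spent: dict[str, float] = field(default_factory=dict)

    def estimate(self, bytes_scanned: int) -> float:
        return bytes_scanned / TB * self.price_per_tb

    def authorize(self, user: str, bytes_scanned: int) -> float:
        """Check an estimated scan size before execution; record spend if allowed."""
        cost = self.estimate(bytes_scanned)
        if cost > self.max_query_cost:
            raise QueryRejected(f"estimated ${cost:,.2f} exceeds the per-query cap")
        if self.spent.get(user, 0.0) + cost > self.daily_user_budget:
            raise QueryRejected(f"{user} would exceed the daily budget")
        self.spent[user] = self.spent.get(user, 0.0) + cost
        return cost


guard = QueryGuard(price_per_tb=5.0, max_query_cost=50.0, daily_user_budget=100.0)
print(guard.authorize("ana", 2 * TB))       # 10.0
try:
    guard.authorize("ana", 2_000 * TB)      # runaway scan: estimated $10,000
except QueryRejected as err:
    print("blocked:", err)
```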

Rehl said data leaders should also raise the cost issue upfront with business and data engineering teams when planning big data deployments to ensure organizations budget appropriately for required computing resources and include effective cost controls.

9. Governing big data environments

Without effective data governance, "much of the benefit of broader, deeper data access can be lost," Mariani said. But data governance issues become harder to address as big data applications expand across systems. Cloud architectures that make it more feasible for enterprises to collect and store ever-increasing volumes of raw, unaggregated data compound governance challenges.

Lax data governance reduces the accuracy of analytics and AI outputs and allows protected information to creep into applications that shouldn't include it, creating compliance risks. In addition to the data protection and privacy laws that mandate strong governance, AI regulations are becoming a factor, Farmer said. For example, under the EU AI Act, qualifying organizations deploying AI systems classified as high-risk must meet a set of data governance and management requirements starting in August 2026.

Investing time upfront to identify and manage big data governance issues makes it easier to provide self-service data access without requiring direct oversight of each new use case. Treating data as a product with built-in governance rules also helps prevent usage and compliance issues.
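Treating data as a product can mean publishing it through an interface that enforces its governance rules on every read. In this sketch, a hypothetical `DataProduct` declares its columns and their classifications, drops any field it hasn't declared, and pseudonymizes PII unless the caller's stated purpose is approved. The class, the purposes and the keyed-hash masking are illustrative assumptions, not a specific product's API.

```python
import hashlib
import hmac
from dataclasses import dataclass

SECRET_KEY = b"hypothetical-secret"  # in practice, load from a secrets manager


def pseudonymize(value):
    # Keyed hash: the same input always maps to the same token, so joins still
    # work, but the raw value is hidden from consumers without PII clearance.
    if value is None:
        return None
    return hmac.new(SECRET_KEY, str(value).encode(), hashlib.sha256).hexdigest()[:12]


@dataclass(frozen=True)
class DataProduct:
    name: str
    columns: dict[str, str]            # declared column -> classification
    pii_purposes: frozenset[str]       # purposes approved to see raw PII

    def read(self, rows: list[dict], purpose: str) -> list[dict]:
        show_pii = purpose in self.pii_purposes
        result = []
        for row in rows:
            shaped = {}
            # Only declared columns are emitted, so undeclared fields that crept
            # into the source (here, "ssn") never reach consumers.
            for col, classification in self.columns.items():
                value = row.get(col)
                if classification == "pii" and not show_pii:
                    value = pseudonymize(value)
                shaped[col] = value
            result.append(shaped)
        return result


customer_product = DataProduct(
    name="customers",
    columns={"customer_id": "public", "email": "pii", "segment": "public"},
    pii_purposes=frozenset({"fraud_review"}),
)
raw = [{"customer_id": 7, "email": "ana@example.com", "segment": "smb", "ssn": "000-00-0000"}]
marketing_view = customer_product.read(raw, purpose="campaign_analytics")
fraud_view = customer_product.read(raw, purpose="fraud_review")
print(marketing_view)
print(fraud_view)
```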

10. Ensuring that AI tools produce trustworthy results

Generative AI (GenAI) and agentic AI tools amplify data management and governance issues in big data systems. For example, AI agents configured to autonomously monitor, analyze and act on data can create cascading errors and compliance problems without proper oversight.

Comprehensive training and ongoing supervision are required to ensure that AI's actions are accurate, unbiased and trustworthy, said Michael O'Malley, senior vice president of strategy and growth at Customer Analytics LLC, an AI, analytics and data engineering services provider. "Agents and generative AI are powerful tools," O'Malley said. "But just owning an expensive hammer doesn't make you a master carpenter."
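Ongoing supervision can be partly built into the system itself. This sketch shows one possible pattern, a gate between an agent and the actions it proposes: low-risk actions run automatically up to a per-run cap, while high-risk actions, or anything past the cap, wait for human review. The risk labels, the cap and the `Supervisor` class are hypothetical; classifying actions reliably is the hard part in practice.

```python
from dataclasses import dataclass, field
from enum import Enum


class Risk(Enum):
    LOW = "low"    # e.g., refreshing a report
    HIGH = "high"  # e.g., changing customer-facing data or spending money


@dataclass
class ProposedAction:
    description: str
    risk: Risk


@dataclass
class Supervisor:
    """Routes agent actions: low-risk ones run, everything else waits for a human."""
    max_auto_actions: int = 5  # cap on unattended actions per run, to limit cascades
    executed: list[str] = field(default_factory=list)
    pending_review: list[str] = field(default_factory=list)

    def submit(self, action: ProposedAction) -> str:
        if action.risk is Risk.HIGH or len(self.executed) >= self.max_auto_actions:
            self.pending_review.append(action.description)
            return "queued_for_review"
        self.executed.append(action.description)
        return "executed"


supervisor = Supervisor(max_auto_actions=2)
statuses = [
    supervisor.submit(ProposedAction("refresh churn dashboard", Risk.LOW)),
    supervisor.submit(ProposedAction("lower credit limits for flagged accounts", Risk.HIGH)),
    supervisor.submit(ProposedAction("re-run weekly sales report", Risk.LOW)),
    supervisor.submit(ProposedAction("email summary to regional managers", Risk.LOW)),
]
print(statuses)
```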

Data quality is also a key consideration: An AI agent is only as reliable as the data it analyzes, Silipo noted. In addition, the models that underpin GenAI and agentic AI tools must be updated when new business trends or scenarios inevitably emerge. Otherwise, the tools won't be able to adapt, leading to flawed analytics and actions.
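Detecting when a model needs updating often starts with checking whether incoming data still resembles the data the model was built on. This sketch computes the population stability index (PSI), a common drift measure, over binned values; the bin edges, sample data and the 0.1 and 0.25 thresholds (a widely used rule of thumb) are assumptions to adapt.

```python
import math
from bisect import bisect_right


def bin_shares(values: list[float], edges: list[float]) -> list[float]:
    """Share of values in each bin; `edges` are the interior bin boundaries."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    # A small floor avoids log(0) when a bin is empty in one sample.
    return [max(c / len(values), 1e-6) for c in counts]


def psi(baseline: list[float], recent: list[float], edges: list[float]) -> float:
    """Population stability index: sum of (a - e) * ln(a / e) over bins."""
    e = bin_shares(baseline, edges)
    a = bin_shares(recent, edges)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


def drift_status(score: float) -> str:
    # Widely cited rule of thumb; thresholds should be tuned per use case.
    if score < 0.1:
        return "stable"
    if score < 0.25:
        return "moderate shift"
    return "significant shift: review or retrain the model"


edges = [25, 50, 75]
training_data = [float(v) for v in range(100)]        # hypothetical baseline
recent_data = [float(v + 40) for v in range(100)]     # same feature, shifted
print(drift_status(psi(training_data, training_data, edges)))  # stable
print(drift_status(psi(training_data, recent_data, edges)))
```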

George Lawton is a journalist based in London. Over the last 30 years, he has written more than 3,000 stories about computers, communications, knowledge management, business, health and other areas that interest him.