Introduction to Python for Data Analysis: A Beginner’s Guide
joseph mwang · 2026-05-15 · via DEV Community

Introduction

For a long time, I viewed programming as something reserved for software engineers and computer scientists. As someone with a background in scientific research and a growing interest in data analytics, I assumed tools like Excel, SQL, and Power BI were enough to answer most questions hidden in data.

Then I started learning Python, and what first looked like a programming language full of strange syntax quickly revealed itself as one of the most powerful tools a data analyst or data scientist can have. Python is not just about writing code; it is about automating repetitive tasks, cleaning messy datasets, analysing millions of records, and creating reproducible workflows that can be shared with anyone.

In this article, I share my beginner-friendly understanding of Python and how it is used in the data analytics space. If you are just starting your journey in data analysis, this guide will give you a practical overview of what Python is, why it matters, and the core concepts you need to know.

What Is Python?

Python is a high-level, general-purpose programming language known for its readability and simplicity.

It was created by Guido van Rossum and first released in 1991. The guiding idea behind Python was that code should be easy to read and easy to write.

Unlike many programming languages that require complex syntax, Python uses clear and concise statements that often resemble plain English. For example, a single print statement is enough to display "Hello, Data World" when you run it.

[Image: a simple Python print statement]
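
The screenshot from the original post is not reproduced here. A minimal sketch of the kind of one-line program described above might look like this (the exact wording of the text is an assumption):

```python
# Display a short piece of text on the screen.
print("Hello, Data World!")
```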

That single line displays text on the screen and demonstrates how approachable Python can be.

Why Python Is Important in Data Analysis

Python has become one of the most widely used languages in data analytics, data science, machine learning, and artificial intelligence. Its strength lies in its versatility.

1. Automating Repetitive Tasks

Data analysts often perform the same operations repeatedly:

  • Renaming hundreds of files
  • Cleaning dozens of spreadsheets
  • Downloading reports from APIs
  • Merging datasets

Python can automate these tasks. Here is a real-world scenario: imagine receiving 200 CSV files from different branches every month. Opening and cleaning each file manually in Excel would take hours. With Python, a short script can process all of the files in seconds.

[Image: Python script for automating file processing]
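
The original script is only available as an image, so the following is a rough sketch of the idea rather than the author's code. It assumes every branch file is a well-formed CSV with the same columns. The function name `combine_csv_files`, the file names, and the sample values are all hypothetical. It uses only the standard library; pandas could do the same job.

```python
import csv
import tempfile
from pathlib import Path


def combine_csv_files(folder: Path, output_file: Path) -> int:
    """Merge every CSV in `folder` into one file and return the row count."""
    rows = []
    for csv_path in sorted(folder.glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Trim stray spaces and record which file the row came from.
                cleaned = {key.strip(): value.strip() for key, value in row.items()}
                cleaned["source_file"] = csv_path.name
                rows.append(cleaned)

    if not rows:
        return 0

    with output_file.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Demo with two tiny made-up branch files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    branch_dir = Path(tmp) / "branches"
    branch_dir.mkdir()
    (branch_dir / "nairobi.csv").write_text(
        "branch,sales\nNairobi , 100\n", encoding="utf-8"
    )
    (branch_dir / "mombasa.csv").write_text(
        "branch,sales\nMombasa,250\n", encoding="utf-8"
    )

    combined_path = Path(tmp) / "combined.csv"
    total_rows = combine_csv_files(branch_dir, combined_path)
    combined_text = combined_path.read_text(encoding="utf-8")

print(total_rows)
print(combined_text)
```

The same function works unchanged whether the folder holds two files or two hundred, which is the point of automating the task.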

2. Handling Large and Complex Data

Excel becomes slow when datasets grow to hundreds of thousands of rows, and a single worksheet cannot hold more than 1,048,576 rows.

Python, especially with the pandas library, can efficiently process large datasets and perform advanced transformations. Real-World Scenario: Analysing e-commerce transactions from Jumia or Amazon with millions of records is practical in Python but cumbersome in spreadsheets.

3. Advanced Data Cleaning

Real-world data is rarely perfect.

You may encounter:

  • Missing values
  • Duplicate records
  • Inconsistent text formats
  • Incorrect dates

Python provides tools to clean and standardize data systematically. Real-World Scenario: Converting NAIROBI, Nairobi, and nairobi into consistent values is a simple operation in Python.

[Image: harmonizing column names]
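
The original image is not available, so here is a small sketch of both ideas using plain string methods. The sample city values and column names are made up for illustration.

```python
# Made-up sample values showing the same city typed in different ways.
raw_cities = ["NAIROBI", "Nairobi ", " nairobi", "MOMBASA", "mombasa"]

# strip() removes surrounding spaces; title() capitalises the first letter
# of each word and lowercases the rest.
clean_cities = [city.strip().title() for city in raw_cities]
unique_cities = sorted(set(clean_cities))

# The same approach can harmonise column names into snake_case.
raw_columns = ["Customer Name", " CITY ", "Order-Date"]
clean_columns = [
    column.strip().lower().replace(" ", "_").replace("-", "_")
    for column in raw_columns
]

print(clean_cities)
print(unique_cities)
print(clean_columns)
```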

4. Reproducibility

Every step of your analysis is stored in code.

This means:

  • Your work can be repeated
  • Errors can be traced
  • Colleagues can reproduce your results

Python Basics Every Data Analyst Should Know

1. Variables

Variables store data values.

name = "Joseph"
age = 28

Here, name is a variable that stores the text "Joseph", and age is a variable that stores the number 28.
Think of variables as labeled containers.

2. Data Types

Python supports several built-in data types.

Data Type    Example
String       "Joseph"
Integer      28
Float        23.43
Boolean      True / False

[Image: demonstrating Python data types]
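
Since the screenshot is missing, here is a short sketch that checks the type of each example value from the table with the built-in `type()` function. The variable names are only for illustration.

```python
# One example value per data type from the table above.
values = ["Joseph", 28, 23.43, True]

# type(value).__name__ gives the short name of each value's type.
type_names = [type(value).__name__ for value in values]
for value, type_name in zip(values, type_names):
    print(value, "->", type_name)

# Data read from files often arrives as text, so conversion is common.
age_as_text = "28"
age_next_year = int(age_as_text) + 1
print(age_next_year)
```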

3. Operators

Operators allow you to perform calculations and comparisons.

Arithmetic Operators

[Image: arithmetic operators in Python]
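
The original table of arithmetic operators was an image. As a stand-in, this sketch runs each of Python's arithmetic operators on two arbitrary example numbers:

```python
x = 17
y = 5

results = {
    "x + y": x + y,    # addition
    "x - y": x - y,    # subtraction
    "x * y": x * y,    # multiplication
    "x / y": x / y,    # division (always returns a float)
    "x // y": x // y,  # floor division (drops the remainder)
    "x % y": x % y,    # modulus (the remainder)
    "x ** 2": x ** 2,  # exponentiation
}

for expression, value in results.items():
    print(expression, "=", value)
```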

Comparison Operators

Comparison operators are used to compare two values:

Operator    Name                        Example
==          Equal                       x == y
!=          Not equal                   x != y
>           Greater than                x > y
<           Less than                   x < y
>=          Greater than or equal to    x >= y
<=          Less than or equal to       x <= y

Logical Operators

Logical operators are used to combine conditional statements:

Operator    Description                                        Example
and         Returns True if both conditions are true           x = 5; print(x <= 5 and x < 10)        (output: True)
or          Returns True if at least one condition is true     x = 5; print(x < 4 or x < 10)          (output: True)
not         Reverses the result: returns False if it is True   x = 5; print(not (x > 3 and x < 10))   (output: False)
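
To see the comparison and logical operators in action, here is a short sketch. The values of `x` and `y` and the list of scores are arbitrary examples.

```python
x = 5
y = 8

# Comparison operators always produce True or False.
is_equal = x == y
is_different = x != y
is_smaller = x < y

# Logical operators combine those True/False results.
both_true = x > 3 and x < 10
either_true = x < 4 or x < 10
negated = not (x > 3 and x < 10)

print(is_equal, is_different, is_smaller)
print(both_true, either_true, negated)

# A typical analysis use: keep only the scores inside a range.
scores = [45, 72, 90, 68, 30]
passing = [score for score in scores if score >= 50 and score <= 100]
print(passing)
```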

4. Data Structures

Lists
Lists are used to store multiple items in a single variable.
List items are ordered, changeable, and allow duplicate values.
List items are indexed; the first item has index [0], the second item has index [1], etc.
Lists use square brackets []

fruits = ["apple", "banana", "mango"]

Tuples
Tuples are used to store multiple items in a single variable.
Tuple items are ordered, unchangeable, and allow duplicate values.
Tuples use parentheses ()

coordinates = (1.2, 3.4)

Dictionaries
Dictionaries are used to store data values in (key: value) pairs.
A dictionary is a collection that is ordered (it keeps insertion order as of Python 3.7), changeable, and does not allow duplicate keys.
Dictionaries use curly brackets {}

student = {"name": "Amina", "score": 90}

Sets
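A set is a collection that is unordered and unindexed. Its items cannot be changed in place, but you can add and remove items.
Sets are used to store multiple items in a single variable.
Sets cannot have two items with the same value.
Sets use curly brackets {} (an empty {} creates a dictionary, so use set() for an empty set)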

cities = {"Nairobi", "Mombasa", "Kisumu"}

These structures help organise and manipulate data efficiently.
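
The differences between the four structures are easier to see when you try to change them. This sketch reuses the example values from above; the extra items added to them are made up.

```python
# List: ordered and changeable.
fruits = ["apple", "banana", "mango"]
fruits.append("orange")
first_fruit = fruits[0]

# Tuple: ordered but unchangeable, so assignment raises a TypeError.
coordinates = (1.2, 3.4)
try:
    coordinates[0] = 9.9
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Dictionary: values are looked up and updated by key.
student = {"name": "Amina", "score": 90}
student["score"] = 95
student["city"] = "Nairobi"

# Set: duplicates are ignored automatically.
cities = {"Nairobi", "Mombasa", "Kisumu"}
cities.add("Nairobi")  # already present, so nothing changes
cities.add("Nakuru")

print(fruits, first_fruit)
print(tuple_changed)
print(student)
print(sorted(cities))
```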

Conditional Statements

marks = 75

if marks >= 70:
    print("Pass")
else:
    print("Fail")

For Loops

Loops repeat tasks automatically.

for number in range(1, 6):
    print(number)

Real-World Scenario: Processing each row in a dataset or iterating through multiple files.
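
As a small illustration of that scenario, the sketch below loops over a few rows and adds up sales per branch. The rows are made-up sample data, and in practice they might come from a CSV file instead.

```python
# Each dictionary stands for one row of a dataset.
sales_rows = [
    {"branch": "Nairobi", "amount": 1200},
    {"branch": "Mombasa", "amount": 800},
    {"branch": "Nairobi", "amount": 500},
]

totals = {}
for row in sales_rows:
    branch = row["branch"]
    # get() returns 0 the first time a branch is seen.
    totals[branch] = totals.get(branch, 0) + row["amount"]

print(totals)
```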

Functions

Functions package reusable logic.

def greet(name):
    return f"Hello, {name}!"

Functions make code cleaner and easier to maintain.
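
A function does nothing until it is called. The sketch below calls `greet` and adds a second, hypothetical function that returns the average of a list of scores, which is closer to everyday analysis work.

```python
def greet(name):
    return f"Hello, {name}!"


def calculate_average(values):
    """Return the mean of a list of numbers, or None for an empty list."""
    if not values:
        return None
    return sum(values) / len(values)


greeting = greet("Amina")
average_score = calculate_average([90, 75, 60])
empty_average = calculate_average([])

print(greeting)
print(average_score)
print(empty_average)
```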

Python Libraries for Data Analysis

One of Python's greatest strengths is its ecosystem of libraries.

Requests

The requests library is used to interact with web APIs.

import requests

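# Request the product list and stop with an error if the server reports a failure
response = requests.get("https://dummyjson.com/products", timeout=10)
response.raise_for_status()
data = response.json()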

This is useful for collecting real-time data from online sources.

Pandas

pandas is the most widely used library for data manipulation.

import pandas as pd
# Load a CSV file into a DataFrame
df = pd.read_csv("sales.csv")
df.head()

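import pandas as pd
import requests

# Fetch the JSON payload (the same request as in the Requests section)
response = requests.get("https://dummyjson.com/products", timeout=10)
data = response.json()
# Turn the first 100 records into a DataFrame
# (the records are assumed to sit under the "products" key)
df = pd.DataFrame(data["products"][:100])
df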

With pandas, you can:

  • Load data
  • Filter rows
  • Handle missing values
  • Group and summarize
  • Merge datasets
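
The operations in this list can be sketched on a tiny in-memory dataset. The branch names, amounts, and managers are made up. With real data, the first step would typically be `pd.read_csv` rather than building the DataFrames by hand.

```python
import pandas as pd

# Load data (built in memory here so the example runs on its own).
sales = pd.DataFrame({
    "branch": ["Nairobi", "Mombasa", "Nairobi", "Kisumu"],
    "amount": [1200.0, None, 500.0, 300.0],
})
managers = pd.DataFrame({
    "branch": ["Nairobi", "Mombasa", "Kisumu"],
    "manager": ["Amina", "Joseph", "Otieno"],
})

# Handle missing values: treat a missing amount as zero.
sales["amount"] = sales["amount"].fillna(0)

# Filter rows: keep sales of at least 500.
large_sales = sales[sales["amount"] >= 500]

# Group and summarize: total amount per branch.
totals = sales.groupby("branch")["amount"].sum()

# Merge datasets: attach each branch's manager to its total.
report = totals.reset_index().merge(managers, on="branch")

print(large_sales)
print(report)
```

Whether a missing amount should become zero, be dropped, or be filled some other way depends on the dataset; zero is used here only to keep the sketch short.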

For more information about pandas, refer to the video linked in the original post.

[Image: getting data from a URL and loading a dataset with requests and pandas]

Python Enhancement Proposals (PEP 8)

Like people, Python has its own likes and dislikes, its own "pet peeves". It likes clean indentation, meaningful variable names, and consistent formatting, and it dislikes messy spacing, unclear names, and poorly organized code. To help programmers understand what Python "prefers" and what it "dislikes", the Python community created Python Enhancement Proposals (PEPs), with PEP 8 providing the most widely used guidelines for writing readable and consistent code.

Indentation

Python uses indentation (typically 4 spaces) to define code blocks.

if True:
    print("Indented correctly")

Line Length

PEP 8 recommends a maximum line length of 79 characters (72 for comments and docstrings).
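
One common way to stay within the limit is to wrap a long expression in parentheses so it can continue across several lines. A small sketch, with made-up sales figures:

```python
branch_sales = {"Nairobi": 1700, "Mombasa": 800, "Kisumu": 300}

# Parentheses let one expression continue across several short lines.
total_sales = (
    branch_sales["Nairobi"]
    + branch_sales["Mombasa"]
    + branch_sales["Kisumu"]
)

# Adjacent string literals inside parentheses are joined automatically.
summary = (
    "Total sales across all branches: "
    f"{total_sales}"
)

print(summary)
```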

Naming Conventions

Variables and Functions: snake_case

total_sales = 500

def calculate_average():
    pass

Classes: PascalCase

class StudentRecord:
    pass

Constants: UPPER_CASE

PI = 3.14159

Docstrings

Docstrings describe what a function does.

def add_numbers(a, b):
    """Return the sum of two numbers."""
    return a + b

Docstrings are essential for writing maintainable code.

Final Thoughts

Python has shown me that data analysis is not just about creating charts or writing queries; it is about building repeatable processes that turn raw data into reliable insights.

Although I am still at the beginning of my learning journey, I can already see why Python has become such an essential tool for analysts and scientists. If you are starting out, focus on the fundamentals, practice consistently, and trust that each small script you write is another step toward becoming a more effective data professional.

A few weeks into this journey, I already understand why Python is considered the backbone of data science.

And this is only the beginning!