Whether you’re a student, data enthusiast, or aspiring data scientist, learning how to analyze data using Python is a powerful skill. In this article, you’ll learn the basics of data analysis using Python’s most popular data analysis library: Pandas.
🧠 What Is Pandas?
Pandas is a Python library used to manipulate and analyze structured data. It makes working with tables, CSV files, and large datasets simple and efficient.
Key features:
- DataFrames (like Excel tables in code)
- Easy data filtering, aggregation, and transformation
- Powerful tools for cleaning and exploring data
🛠️ Getting Started: Install Pandas
Before using Pandas, install it using pip:
pip install pandas
📥 Step 1: Load Your Data
CSV (comma-separated values) files are one of the most common formats Pandas reads, alongside Excel, JSON, and SQL sources. Here’s how to load one:
import pandas as pd
# Load a CSV file
df = pd.read_csv("data.csv")
# Show the first few rows
print(df.head())
🔍 df is your DataFrame, like a table. Use head() to preview the data.
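If you don’t have a CSV file on hand yet, here is a minimal sketch that reads CSV text from memory instead. The column names and values are made up for illustration; read_csv() accepts a file-like object as well as a file path, so the call works the same way as it would with "data.csv".

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file such as "data.csv"
csv_text = """Name,Age,Department,Salary
Alice,34,Engineering,85000
Bob,28,Marketing,52000
Carol,45,Engineering,97000
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))  # preview only the first two rows
print(df.shape)    # (number of rows, number of columns)
```

Passing a number to head() controls how many rows you see; without it, you get the first five.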
🔎 Step 2: Explore the Data
# Basic info (info() prints its report directly and returns None, so no print() is needed)
df.info()
# Quick statistics
print(df.describe())
# Check for missing values
print(df.isnull().sum())
You’ll see:
- Column names & data types
- Count, mean, std deviation, min/max
- How many missing values are in each column
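To see what these checks report, here is a small sketch on a hand-built DataFrame with a few gaps in it. The data is invented for illustration and is not from a real dataset.

```python
import numpy as np
import pandas as pd

# Made-up data with some missing values (np.nan)
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dan"],
    "Age": [34, np.nan, 45, 28],
    "Salary": [85000, 52000, np.nan, np.nan],
})

# Number of missing values in each column
missing = df.isnull().sum()
print(missing)

# describe() summarizes numeric columns and skips missing values
stats = df.describe()
print(stats.loc["count"])
```

Notice that the count row in describe() only counts the values that are present, so it is another quick way to spot incomplete columns.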
🧹 Step 3: Clean the Data
Remove rows with missing values:
df_clean = df.dropna()
Or fill missing values:
df['Age'] = df['Age'].fillna(df['Age'].mean())
Rename columns for clarity:
df.rename(columns={'old_name': 'new_name'}, inplace=True)
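Here is a rough side-by-side sketch of these cleaning options on a tiny invented DataFrame. It uses assign() and rename() without inplace=True so that the original df stays untouched and you can compare the results; that is a choice made for this example, not a requirement.

```python
import numpy as np
import pandas as pd

# Made-up data: one missing age and a lowercase column name
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "Age": [30.0, np.nan, 40.0],
})

# Option 1: drop every row that has a missing value
dropped = df.dropna()

# Option 2: fill the gap with the column mean (mean() ignores missing values)
filled = df.assign(Age=df["Age"].fillna(df["Age"].mean()))

# Rename a column; this returns a new DataFrame and leaves the input unchanged
renamed = filled.rename(columns={"name": "Name"})

print(dropped)
print(renamed)
```

Dropping rows is simplest but throws data away, while filling with the mean keeps every row at the cost of introducing an estimated value. Which one fits depends on your dataset.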
📊 Step 4: Analyze the Data
Filter Rows
# Filter people older than 30
filtered = df[df['Age'] > 30]
print(filtered)
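Filters can also be combined. The sketch below, on invented data, keeps rows that satisfy two conditions at once. Each condition needs its own parentheses, and Pandas uses & (and) and | (or) rather than Python’s and/or keywords.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dan"],
    "Age": [34, 28, 45, 31],
    "Department": ["Engineering", "Marketing", "Engineering", "Marketing"],
})

# People older than 30 who also work in Engineering
older_engineers = df[(df["Age"] > 30) & (df["Department"] == "Engineering")]
print(older_engineers)
```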
Group and Aggregate
# Average salary by department
avg_salary = df.groupby('Department')['Salary'].mean()
print(avg_salary)
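As a small worked example, here is the same grouping on invented numbers, plus a variation that computes several statistics in one pass with agg(). The department names and salaries are made up.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "Department": ["Engineering", "Marketing", "Engineering", "Marketing"],
    "Salary": [80000, 50000, 100000, 60000],
})

# Average salary by department
avg_salary = df.groupby("Department")["Salary"].mean()
print(avg_salary)

# Several statistics per department at once
summary = df.groupby("Department")["Salary"].agg(["mean", "max", "count"])
print(summary)
```

The result of groupby() is indexed by the grouping column, so you can look up a single group with avg_salary["Engineering"].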
📈 Step 5: Visualize (Optional)
Pandas draws its charts with Matplotlib (install it with pip install matplotlib), which is enough for simple plots:
import matplotlib.pyplot as plt
# Histogram of ages
df['Age'].hist()
plt.show()
🚀 Summary: Data Analysis Workflow with Pandas
- Import data with read_csv()
- Explore the structure with info() and describe()
- Clean missing values and fix column names
- Analyze with filtering and grouping
- Visualize to find trends
🔗 Final Tips
- Always check data types (df.dtypes)
- Use .value_counts() to count categories
- Document your steps to stay organized
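The first two tips can be sketched like this. The data is invented, and the Age column is deliberately stored as text to show why checking types matters. pd.to_numeric() is one way to convert it; it is not mentioned above and is added here as an illustration.

```python
import pandas as pd

# Made-up data: ages were loaded as text instead of numbers
df = pd.DataFrame({
    "Department": ["Engineering", "Marketing", "Engineering", "Sales", "Engineering"],
    "Age": ["34", "28", "45", "31", "39"],
})

# Inspect the type of each column; Age is not numeric yet
print(df.dtypes)

# Convert the text column to numbers so that math works on it
df["Age"] = pd.to_numeric(df["Age"])
print(df["Age"].mean())

# Count how many rows fall into each category
counts = df["Department"].value_counts()
print(counts)
```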
🧪 Sample Dataset
You can practice with any open dataset from:
- Kaggle.com
- data.gov
- Or create your own CSV file