Subin Thapa

Pandas Complete Notes (Zero to Advanced – Full Syllabus)

By subinthapa, January 27, 2026

These notes fully cover all Pandas topics and subtopics required for:

  • Python syllabus
  • Data Science
  • Machine Learning preprocessing
  • Exams and interviews

1. Pandas Introduction

What is Pandas

Pandas is an open-source Python library used for data manipulation and data analysis. It provides fast, flexible, and expressive data structures.

Why Pandas is Used

  • Handling structured data (tabular, time-series)
  • Cleaning real-world datasets
  • Exploratory Data Analysis (EDA)
  • Data preprocessing for ML models

Pandas vs NumPy

  • NumPy: numerical arrays (homogeneous)
  • Pandas: labeled data (heterogeneous)
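
The contrast above can be sketched in a few lines. The names, ages, and GPA values here are made up for illustration and are not from any real dataset.

```python
import numpy as np
import pandas as pd

# A NumPy array stores a single dtype for all of its elements.
arr = np.array([1, 2, 3])

# A DataFrame can hold a different dtype per column, and both rows
# and columns carry labels.
df = pd.DataFrame(
    {
        "name": ["Asha", "Bikash", "Chandra"],
        "age": [21, 19, 23],
        "gpa": [3.6, 3.1, 3.8],
    },
    index=["s1", "s2", "s3"],
)

print(arr.dtype)             # one integer dtype for the whole array
print(df.dtypes)             # one dtype per column
print(df.loc["s2", "age"])   # lookup by row label and column label
```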

2. Pandas Getting Started

Installation

pip install pandas

Importing Pandas

import pandas as pd
import numpy as np

Check Version

pd.__version__

3. Pandas Data Structures

3.1 Series

A Series is a one-dimensional labeled array capable of holding any data type.

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

Series Attributes

s.values
s.index
s.dtype
s.name

Series Methods

s.head()
s.tail()
s.sum()
s.mean()
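
As a small worked example, the attributes and methods above can be tried on the Series defined earlier. The name "score" is an assumption added here so that s.name has something to show.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"], name="score")

total = s.sum()         # sum of all values
average = s.mean()      # arithmetic mean
first_two = s.head(2)   # first two entries, labels kept
by_label = s["b"]       # access one value by its index label

print(s.index.tolist(), s.dtype, s.name)
print(total, average, by_label)
```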

3.2 DataFrame

A DataFrame is a two-dimensional labeled data structure with rows and columns.

df = pd.DataFrame(data)

DataFrame Inspection

df.head()
df.tail()
df.shape
df.columns
df.dtypes
df.info()
df.describe()
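
The snippet above leaves data undefined. One common way to build it is a dictionary of columns, as in this minimal sketch; the column names follow the ones used later in these notes, and the values are invented.

```python
import pandas as pd

# Each key becomes a column; each list holds that column's values.
data = {
    "student_id": [1, 2, 3, 4],
    "name": ["Asha", "Bikash", "Chandra", "Dipa"],
    "age": [17, 19, 18, 20],
    "city": ["KTM", "BHW", "KTM", "PKR"],
}
df = pd.DataFrame(data)

print(df.shape)           # (number of rows, number of columns)
print(list(df.columns))   # column labels
df.info()                 # dtypes and non-null counts per column
print(df.describe())      # summary statistics for numeric columns
```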

4. Reading and Writing Data

Read CSV

pd.read_csv("file.csv")

Read Excel

pd.read_excel("file.xlsx")

Read JSON

pd.read_json("file.json")

Write Files

df.to_csv("output.csv", index=False)
df.to_excel("output.xlsx", index=False)
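
To try a write/read round trip without touching the disk, an in-memory text buffer can stand in for "file.csv". This is only a sketch; the two-row DataFrame is made up.

```python
import io
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Bikash"], "age": [17, 19]})

# Write CSV text into an in-memory buffer instead of a file path.
buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False leaves out the row labels
csv_text = buffer.getvalue()

# Read the same CSV text back into a new DataFrame.
restored = pd.read_csv(io.StringIO(csv_text))
print(csv_text)
print(restored)
```

Without index=False, the row labels are written as an extra unnamed column, which then reappears when the file is read back.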

5. Selecting, Indexing, and Filtering

Column Selection

df["age"]
df[["name", "age"]]

Row Filtering

df[df["age"] > 18]

Boolean Conditions

df[(df["age"] > 18) & (df["age"] < 60)]  # combine masks with & or |, each condition in parentheses

isin()

df[df["city"].isin(["KTM", "BHW"])]
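
A short sketch of the three filters above on an invented four-row table. The point to notice is that the element-wise operator & is used, not the Python keyword and.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [15, 25, 70, 40], "city": ["KTM", "BHW", "PKR", "KTM"]}
)

# Each comparison yields a boolean Series; & combines them row by row.
working_age = df[(df["age"] > 18) & (df["age"] < 60)]

# isin() keeps rows whose value appears in the given list.
in_cities = df[df["city"].isin(["KTM", "BHW"])]

print(working_age)
print(in_cities)
```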

loc and iloc

df.loc[0:2, ["name", "age"]]
df.iloc[0:3, 0:2]
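
The practical difference between the two is easy to miss: loc slices by label and includes the end label, while iloc slices by position and excludes the end position. A minimal illustration with made-up rows:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Asha", "Bikash", "Chandra", "Dipa"],
        "age": [17, 19, 18, 20],
        "city": ["KTM", "BHW", "KTM", "PKR"],
    }
)

by_label = df.loc[0:2, ["name", "age"]]   # labels 0, 1 and 2 (end included)
by_position = df.iloc[0:2, 0:2]           # positions 0 and 1 (end excluded)

print(len(by_label), len(by_position))
```

With the default integer index the labels and positions coincide, which is why the same 0:2 slice returns three rows with loc and two rows with iloc.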

query()

df.query("grade == 10 and city == 'KTM'")
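
query() takes the filter as a string expression. The sketch below, on invented data, shows that it selects the same rows as the equivalent boolean mask.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Asha", "Bikash", "Chandra", "Dipa"],
        "grade": [10, 10, 9, 10],
        "city": ["KTM", "BHW", "KTM", "KTM"],
    }
)

# Boolean-mask version
mask_result = df[(df["grade"] == 10) & (df["city"] == "KTM")]

# query() version: column names are written directly inside the string
query_result = df.query("grade == 10 and city == 'KTM'")

print(query_result)
```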

6. Data Analysis (EDA)

value_counts

df["city"].value_counts()

unique and nunique

df["city"].unique()
df["city"].nunique()

GroupBy

df.groupby("city")["hours_studied"].mean()

Aggregation

df.groupby("grade").agg(
    avg_age=("age", "mean"),
    count_students=("student_id", "count")
)
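
Here is the named aggregation above run on a small invented table, so the shape of the result is visible: one row per grade, one column per named output.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "student_id": [1, 2, 3, 4, 5],
        "grade": [9, 9, 10, 10, 10],
        "age": [15, 16, 16, 17, 18],
    }
)

# Each keyword is an output column: (input column, aggregation function).
summary = df.groupby("grade").agg(
    avg_age=("age", "mean"),
    count_students=("student_id", "count"),
)
print(summary)
```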

7. Cleaning Data

7.1 Detect Missing Values

df.isna()
df.isna().sum()

7.2 Cleaning Empty Cells

Drop missing values

df.dropna()

Fill missing values

df["age"] = df["age"].fillna(df["age"].median())   # fillna returns a new Series, so assign it back
df["city"] = df["city"].fillna("Unknown")
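
A minimal sketch of detecting and then filling missing values, assuming a numeric age column and a text city column with one gap each. The values are invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"age": [17, np.nan, 19, 21], "city": ["KTM", None, "BHW", "KTM"]}
)

missing_before = df.isna().sum()   # count of missing values per column

# fillna() returns a new Series, so the result is assigned back.
df["age"] = df["age"].fillna(df["age"].median())
df["city"] = df["city"].fillna("Unknown")

missing_after = df.isna().sum()
print(missing_before)
print(missing_after)
```

The median is computed from the non-missing values only, so the gap in age is filled with the median of 17, 19 and 21.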

7.3 Cleaning Wrong Format

Convert to datetime

df["exam_date"] = pd.to_datetime(df["exam_date"])

Convert data types

df["grade"] = df["grade"].astype(int)
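
Real data often contains values that cannot be converted, and astype(int) raises an error on them. One possible approach, sketched here on invented strings, is to coerce bad values to missing first and convert afterwards. errors="coerce" is an existing option of pd.to_datetime and pd.to_numeric; the rest is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "exam_date": ["2026-01-15", "2026-02-20", "not a date"],
        "grade": ["10", "9", "ten"],
    }
)

# Unparseable entries become NaT / NaN instead of raising an error.
df["exam_date"] = pd.to_datetime(df["exam_date"], errors="coerce")
df["grade"] = pd.to_numeric(df["grade"], errors="coerce")

# Drop the rows that failed conversion, then cast grade to int.
clean = df.dropna(subset=["exam_date", "grade"]).astype({"grade": int})
print(clean)
```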

7.4 Cleaning Wrong Data

df = df[df["age"] > 0].copy()  # copy so the next assignment works on an independent DataFrame
df["passed"] = df["passed"].replace({"Yes": "yes", "No": "no"})

7.5 Removing Duplicates

df.duplicated()
df.drop_duplicates()
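
For illustration, with one repeated row in a made-up table: duplicated() flags the repeats, and drop_duplicates() returns a new DataFrame without them.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "student_id": [1, 2, 2, 3],
        "name": ["Asha", "Bikash", "Bikash", "Chandra"],
    }
)

# True for every row that repeats an earlier row exactly.
dup_mask = df.duplicated()

# Keep the first occurrence of each row and renumber the index.
deduped = df.drop_duplicates().reset_index(drop=True)

print(dup_mask.tolist())
print(deduped)
```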

8. Sorting and Sampling

df.sort_values("hours_studied", ascending=False)
df.sample(n=3, random_state=42)
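
A brief sketch of both calls on invented data. Fixing random_state makes the sample reproducible: the same seed gives the same rows on every run.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Asha", "Bikash", "Chandra", "Dipa", "Eshan"],
        "hours_studied": [2.0, 5.5, 3.0, 1.0, 4.5],
    }
)

# Highest values first; the original row labels are kept.
ranked = df.sort_values("hours_studied", ascending=False)

# Two samples drawn with the same seed contain the same rows.
sample_a = df.sample(n=3, random_state=42)
sample_b = df.sample(n=3, random_state=42)

print(ranked)
print(sample_a)
```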

9. Data Type Handling

astype

df["grade"] = df["grade"].astype(int)

select_dtypes

df.select_dtypes(include=["number"])

10. Categorical Data Handling

map

df["passed"] = df["passed"].map({"yes": 1, "no": 0})

replace

df["city"] = df["city"].replace({"KTM": "Kathmandu"})

One-Hot Encoding

pd.get_dummies(df, columns=["city"], drop_first=True)
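
The three techniques in this section behave differently on values that are not listed in the mapping. The sketch below uses an invented "maybe" value to show it: map() turns unlisted values into NaN, while replace() leaves them unchanged.

```python
import pandas as pd

df = pd.DataFrame(
    {"passed": ["yes", "no", "maybe"], "city": ["KTM", "BHW", "PKR"]}
)

# map(): values missing from the dictionary become NaN.
mapped = df["passed"].map({"yes": 1, "no": 0})

# replace(): values missing from the dictionary are kept as they are.
replaced = df["city"].replace({"KTM": "Kathmandu"})

# One-hot encoding: one indicator column per city, with the first
# category (in sorted order) dropped by drop_first=True.
dummies = pd.get_dummies(df, columns=["city"], drop_first=True)

print(mapped.tolist())
print(replaced.tolist())
print(list(dummies.columns))
```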

11. String Operations

df["name"].str.upper()
df["city"].str.contains("K")
df.columns = df.columns.str.upper()
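
The .str accessor applies a string method to every element at once. A small example with made-up values:

```python
import pandas as pd

df = pd.DataFrame({"name": ["asha", "bikash"], "city": ["KTM", "BHW"]})

upper_names = df["name"].str.upper()    # upper-case every name
has_k = df["city"].str.contains("K")    # boolean Series, usable as a filter

# The same accessor works on column labels.
df.columns = df.columns.str.upper()

print(upper_names.tolist())
print(has_k.tolist())
print(list(df.columns))
```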

12. Datetime Operations

df["year"] = df["exam_date"].dt.year
df["month"] = df["exam_date"].dt.month
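
The .dt accessor only works on a datetime column, so the column must be converted first (see section 7.3). A minimal sketch with two invented dates:

```python
import pandas as pd

df = pd.DataFrame({"exam_date": ["2026-01-15", "2026-03-02"]})

# Convert the strings to datetime before using .dt
df["exam_date"] = pd.to_datetime(df["exam_date"])

df["year"] = df["exam_date"].dt.year
df["month"] = df["exam_date"].dt.month
print(df)
```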

13. Merge and Concatenate

concat

pd.concat([df1, df2], ignore_index=True)

merge

pd.merge(df, cities, on="city", how="left")
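
In short, concat stacks tables on top of each other, while merge joins them on a key column. The sketch below defines small stand-ins for df1, df2, and cities; the region column and its values are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["Asha", "Bikash"], "city": ["KTM", "BHW"]})
df2 = pd.DataFrame({"name": ["Chandra"], "city": ["PKR"]})

# Stack rows; ignore_index=True renumbers the result 0..n-1.
df = pd.concat([df1, df2], ignore_index=True)

# Lookup table with a hypothetical "region" column (no entry for PKR).
cities = pd.DataFrame({"city": ["KTM", "BHW"], "region": ["R1", "R2"]})

# Left join: keep every row of df; unmatched cities get NaN.
merged = pd.merge(df, cities, on="city", how="left")
print(merged)
```

With how="left" the row for PKR is kept even though cities has no match for it; with how="inner" it would be dropped.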

14. Correlation

df.corr(numeric_only=True)
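
As an illustration, on invented data where score rises in a straight line with hours_studied, the correlation comes out at 1. numeric_only=True leaves the text column out of the result.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "hours_studied": [1, 2, 3, 4],
        "score": [50, 60, 70, 80],
        "city": ["KTM", "BHW", "KTM", "PKR"],
    }
)

# Pairwise Pearson correlation between the numeric columns only.
corr = df.corr(numeric_only=True)
print(corr)
```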

15. Pandas Plotting

df["age"].plot(kind="hist")
df.plot(x="age", y="hours_studied", kind="scatter")

16. Performance Optimization

Avoid loops (row-by-row iteration is slow)

for _, row in df.iterrows():  # slow: a new Series is built for every row
    pass

Use vectorization

df["age"] = df["age"] * 2
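
To make the comparison concrete, here is the same doubling computed both ways on a tiny invented column. The results match, but the vectorized form does the work in a single column-wide operation instead of one Python step per row.

```python
import pandas as pd

df = pd.DataFrame({"age": [17, 19, 18]})

# Loop version: one Python-level step per row.
looped = []
for _, row in df.iterrows():
    looped.append(row["age"] * 2)

# Vectorized version: one operation on the whole column.
vectorized = df["age"] * 2

print(looped)
print(vectorized.tolist())
```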

17. Pandas for Machine Learning

Feature and Target Split

X = df[["age", "grade", "hours_studied"]]
y = df["passed"]

ML Checklist

  • No missing values
  • Numeric features
  • Encoded categorical data
  • Correct data types
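
Part of the checklist above can be checked in code. The helper check_ml_ready below is a hypothetical function written for these notes, not part of Pandas; it uses is_numeric_dtype from pandas.api.types. The data is invented.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_ml_ready(X: pd.DataFrame) -> dict:
    """Hypothetical helper: report two checklist items for a feature table."""
    return {
        "no_missing_values": not X.isna().any().any(),
        "all_numeric": all(is_numeric_dtype(dtype) for dtype in X.dtypes),
    }


df = pd.DataFrame(
    {
        "age": [17, 19, 18],
        "grade": [10, 9, 10],
        "hours_studied": [2.5, 1.0, 3.0],
        "passed": ["yes", "no", "yes"],
    }
)

# Encode the target, then split features and target.
df["passed"] = df["passed"].map({"yes": 1, "no": 0})
X = df[["age", "grade", "hours_studied"]]
y = df["passed"]

report = check_ml_ready(X)
print(report)
```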

18. Common Interview and Exam Questions

  • Difference between Series and DataFrame
  • dropna vs fillna
  • loc vs iloc
  • map vs replace
  • groupby use cases

19. Pandas Study Plan

Day 1: Basics, Series, DataFrame
Day 2: Indexing and Filtering
Day 3: Cleaning Data
Day 4: GroupBy and Aggregation
Day 5: Encoding and Correlation
Day 6: Plotting and Performance
Day 7: ML Data Preparation


20. Pandas Practice Questions (50 Questions)

These 50 questions are carefully selected to cover the entire Pandas syllabus needed for Data Science, exams, and interviews.


A. Basics (1–10)

  1. What is Pandas and why is it used in Data Science?
  2. Difference between Pandas and NumPy?
  3. What is a Series?
  4. What is a DataFrame?
  5. How do you check Pandas version?
  6. How to create a DataFrame from a dictionary?
  7. Difference between head() and tail()?
  8. What does df.shape return?
  9. Difference between df.info() and df.describe()?
  10. What data types does Pandas support?

B. Indexing & Selection (11–20)

  11. Difference between loc and iloc?
  12. How do you select multiple columns?
  13. How do you filter rows using conditions?
  14. Difference between "&" and "and" in Pandas?
  15. What is Boolean indexing?
  16. What does isin() do?
  17. How does query() work?
  18. How to select first 5 rows of a DataFrame?
  19. How to select last 3 columns?
  20. How to reset index?

C. Cleaning Data (21–30)

  21. What is NaN?
  22. How to detect missing values?
  23. Difference between isna() and isnull()?
  24. When should you use dropna()?
  25. When should you use fillna()?
  26. How to fill missing values with mean?
  27. How to clean wrong data types?
  28. How to remove duplicate rows?
  29. How to replace wrong values in a column?
  30. How to convert string date to datetime?

D. Data Analysis & GroupBy (31–40)

  31. What is value_counts() used for?
  32. Difference between unique() and nunique()?
  33. What is GroupBy?
  34. How to calculate mean for each group?
  35. What is aggregation?
  36. How to apply multiple aggregations?
  37. What does sort_values() do?
  38. How to sample random rows?
  39. How to find correlation between columns?
  40. Why is correlation important in ML?

E. Advanced & ML-Oriented (41–50)

  41. Why is categorical encoding required for ML?
  42. Difference between map() and replace()?
  43. What is one-hot encoding?
  44. What does get_dummies() do?
  45. Difference between merge() and concat()?
  46. What is vectorization in Pandas?
  47. Why is iterrows() slow?
  48. How to prepare Pandas data for ML models?
  49. What is select_dtypes()?
  50. What are common Pandas mistakes beginners make?

21. Is Pandas Alone Enough for Data Science?

Short Answer

No. Pandas is necessary but not sufficient for Data Science.

Why Pandas is Critical (Must-Have)

  • Data cleaning
  • Data analysis
  • Feature preparation
  • Real-world dataset handling

What Pandas Cannot Do Alone

  • Machine Learning models
  • Statistics & probability reasoning
  • Model evaluation
  • Deep learning
  • Deployment

Complete Data Science Stack

  1. Python basics
  2. Pandas (this document)
  3. NumPy
  4. Statistics & Probability
  5. Data Visualization (Matplotlib, Seaborn)
  6. SQL
  7. Machine Learning (scikit-learn)
  8. Projects with real datasets

Reality Check

  • Data cleaning and preparation, where Pandas is the main tool, is often estimated at 70–80% of a Data Scientist’s daily work
  • But job readiness requires full stack knowledge

22. Final Summary

If you master everything in this document + solve all 50 questions, then:

  • You are strong in Pandas
  • You are ready for ML preprocessing
  • You can handle real datasets confidently

Next required step after Pandas:
Statistics → NumPy → Visualization → Machine Learning

This document covers a complete Pandas syllabus for Data Science.

Copyright © 2026 Subin Thapa