Asabeneh / data-analysis-with-python-spring-2025

Week 1: Introduction to Data Analysis & Practicalities

  • Objective: Set up the environment and understand the data lifecycle.
  • Tools: Python, Visual Studio Code, Jupyter Notebook (via Anaconda), Google Colab.
  • Topics:
    • What is data analysis? (descriptive vs. diagnostic vs. predictive)
    • Installing Python libraries (pip, conda).
    • Types of data (structured, semi-structured, and unstructured).
    • Introduction to datasets (CSV, Excel, SQL).
    • Common file formats of datasets (.txt, .csv, .tsv, .json, .xml, .xlsx).
    • Ethics in data handling (GDPR, privacy).
  • Hands-on: Loading a dataset and basic exploration.
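
A minimal sketch of the hands-on step, assuming pandas is installed. The tiny table is made up for illustration and is read from in-memory text so the snippet runs anywhere; with a real dataset you would pass a file path or URL instead.

```python
import io

import pandas as pd

# Made-up sample table; in practice pass a file path or URL instead of a buffer.
CSV_TEXT = """name,age,city
Alice,30,Helsinki
Bob,,Oslo
Carol,41,Stockholm
"""

JSON_TEXT = """[
  {"name": "Alice", "age": 30, "city": "Helsinki"},
  {"name": "Bob", "age": null, "city": "Oslo"},
  {"name": "Carol", "age": 41, "city": "Stockholm"}
]"""

# pd.read_csv / pd.read_json accept file-like objects as well as paths.
csv_df = pd.read_csv(io.StringIO(CSV_TEXT))
json_df = pd.read_json(io.StringIO(JSON_TEXT))

# Other formats use sibling readers, e.g.:
#   pd.read_csv("data.tsv", sep="\t")   # tab-separated values
#   pd.read_excel("data.xlsx")          # needs the openpyxl package

# Basic exploration: size, column types, first rows, missing values, summary.
print(csv_df.shape)          # (rows, columns)
print(csv_df.dtypes)
print(csv_df.head())
print(csv_df.isna().sum())   # missing values per column
print(csv_df.describe())     # statistics for numeric columns
```

Both readers produce the same table here, with Bob's missing age becoming `NaN`.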

Week 2: NumPy Fundamentals

  • Objective: Master array operations for numerical computing.
  • Tools: NumPy.
  • Topics:
    • Creating arrays (1D, 2D, 3D).
    • Array operations (reshaping, slicing, broadcasting).
    • Mathematical functions (aggregations, linear algebra).
    • Random sampling (normal, uniform distributions).
  • Hands-on: Solving numerical problems (e.g., random data generation, matrix multiplication).
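
The following NumPy sketch touches each Week 2 topic once: array creation, slicing, broadcasting, aggregation, linear algebra, and random sampling. The arrays, the small linear system, and the seed are arbitrary illustrative choices.

```python
import numpy as np

# Creating arrays of different dimensions.
a1 = np.array([1, 2, 3])                # 1D, shape (3,)
a2 = np.arange(6).reshape(2, 3)         # 2D, shape (2, 3): [[0 1 2] [3 4 5]]
a3 = np.arange(24).reshape(2, 3, 4)     # 3D, shape (2, 3, 4)

# Slicing: the second column of every row.
col = a2[:, 1]                          # [1 4]

# Broadcasting: a1 (shape (3,)) is applied to each row of a2 (shape (2, 3)).
shifted = a2 + a1                       # [[1 3 5] [4 6 8]]

# Aggregations along an axis.
row_sums = a2.sum(axis=1)               # [ 3 12]
col_means = a2.mean(axis=0)             # [1.5 2.5 3.5]

# Linear algebra: matrix multiplication (2, 3) @ (3, 2) -> (2, 2),
# and solving the system 2x + y = 3, x + 3y = 5.
product = a2 @ a2.T                     # [[ 5 14] [14 50]]
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)               # x = 0.8, y = 1.4

# Random sampling with the Generator API; the seed makes runs reproducible.
rng = np.random.default_rng(seed=42)
normal = rng.normal(loc=0.0, scale=1.0, size=1000)     # mean 0, std 1
uniform = rng.uniform(low=0.0, high=10.0, size=1000)   # values in [0, 10)
print(normal.mean(), normal.std())      # close to 0 and 1 respectively
```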

Week 3: Data Visualization with Matplotlib

  • Objective: Create static, interactive, and publication-quality plots.
  • Tools: Matplotlib, Seaborn (optional).
  • Topics:
    • Line plots, bar charts, histograms, scatter plots.
    • Customizing plots (labels, legends, colors).
    • Subplots and multi-panel figures.
    • Introduction to Seaborn for statistical visuals.
  • Hands-on: Visualizing different datasets.
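
One way to combine the four basic plot types in a single multi-panel figure, sketched with arbitrary generated data. The `Agg` backend line is an assumption for running as a plain script without a display; in Jupyter or Colab you would omit it and call `plt.show()` instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in Jupyter/Colab
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)

# One figure with a 2 x 2 grid of subplots, one plot type per panel.
fig, axes = plt.subplots(2, 2, figsize=(10, 8))

ax = axes[0, 0]
ax.plot(x, np.sin(x), label="sin(x)", color="tab:blue")
ax.plot(x, np.cos(x), label="cos(x)", color="tab:orange", linestyle="--")
ax.set(title="Line plot", xlabel="x", ylabel="value")
ax.legend()

ax = axes[0, 1]
ax.bar(["A", "B", "C", "D"], [5, 3, 7, 2], color="tab:green")
ax.set(title="Bar chart", xlabel="category", ylabel="count")

ax = axes[1, 0]
ax.hist(rng.normal(size=500), bins=20, color="tab:purple", edgecolor="black")
ax.set(title="Histogram", xlabel="value", ylabel="frequency")

ax = axes[1, 1]
ax.scatter(x, x + rng.normal(scale=1.5, size=x.size), alpha=0.7, color="tab:red")
ax.set(title="Scatter plot", xlabel="x", ylabel="x + noise")

fig.tight_layout()  # keep titles and labels from overlapping

# Save to an in-memory buffer here; fig.savefig("plots.png", dpi=300)
# would write a high-resolution file for publication instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
```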

Week 4: Pandas for Data Manipulation

  • Objective: Clean, transform, and analyze tabular data.
  • Tools: Pandas.
  • Topics:
    • DataFrames vs. Series.
    • Indexing (loc, iloc), filtering, and grouping.
    • Handling missing data (dropna, fillna).
    • Merging/joining datasets.
  • Hands-on: Cleaning, exploring, transforming, visualizing, and analyzing different datasets.
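
A small pandas sketch of the Week 4 operations. The two tables, `sales` and `customers`, are hypothetical and exist only to show each method on concrete values.

```python
import numpy as np
import pandas as pd

# Hypothetical tables; one sales amount is missing.
sales = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "customer_id": [10, 11, 10, 12, 11],
    "amount": [250.0, np.nan, 120.0, 80.0, 300.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "region": ["North", "South", "North"],
})

# A Series is a single labeled column; a DataFrame is a table of Series.
amounts = sales["amount"]                 # Series
subset = sales[["order_id", "amount"]]    # DataFrame

# loc selects by label, iloc by integer position.
first_amount = sales.loc[0, "amount"]     # 250.0
first_row = sales.iloc[0]                 # first row as a Series

# Filtering with a boolean mask (NaN compares as False).
large = sales[sales["amount"] > 100]      # orders 1, 3, 5

# Missing data: drop incomplete rows, or fill with a statistic.
dropped = sales.dropna(subset=["amount"])
filled = sales.fillna({"amount": sales["amount"].median()})   # median = 185.0

# Merge on the shared key, then group and aggregate.
merged = filled.merge(customers, on="customer_id", how="left")
by_region = merged.groupby("region")["amount"].sum()          # North 450, South 485
print(by_region)
```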

Week 5: Descriptive Statistics

  • Objective: Summarize and interpret data distributions.
  • Tools: Pandas, SciPy.
  • Topics:
    • Measures of central tendency (mean, median, mode).
    • Measures of spread (variance, standard deviation, IQR).
    • Skewness, kurtosis, and distributions (normal, binomial).
    • Correlation and covariance.
  • Hands-on: Exploring and analyzing a dataset.
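
The Week 5 measures can be computed directly with pandas; the `hours_studied` and `exam_score` values below are hypothetical. Note that pandas uses the sample variance (`ddof=1`) and bias-corrected skewness and excess kurtosis by default, while `scipy.stats.skew` and `scipy.stats.kurtosis` default to the uncorrected estimators, so their numbers can differ slightly on small samples.

```python
import pandas as pd

# Hypothetical data: hours studied and exam scores for six students.
df = pd.DataFrame({
    "hours_studied": [2, 3, 5, 7, 8, 10],
    "exam_score": [55, 60, 65, 65, 80, 90],
})
scores = df["exam_score"]

# Central tendency.
mean = scores.mean()            # 69.1666...
median = scores.median()        # 65.0
mode = scores.mode()            # a Series, because data can have several modes

# Spread (var/std use the sample formula, ddof=1, by default).
variance = scores.var()
std = scores.std()
iqr = scores.quantile(0.75) - scores.quantile(0.25)   # 76.25 - 61.25 = 15.0

# Shape of the distribution; kurt() reports excess kurtosis (normal = 0).
skewness = scores.skew()
kurtosis = scores.kurt()

# Relationship between two variables.
correlation = df["hours_studied"].corr(df["exam_score"])   # Pearson by default
covariance = df["hours_studied"].cov(df["exam_score"])

print(df.describe())   # count, mean, std, min, quartiles, max per column
print(df.corr())       # correlation matrix
print(df.cov())        # covariance matrix
```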

Exercises

Task 1: Employee Dataset Analysis

Objective: Use Python and pandas to analyze the Employee Dataset and derive actionable insights. The Employee Dataset can be downloaded from Kaggle; downloading datasets requires a Kaggle account, so create one first.

Requirements:

  • Data Preparation:
    • Clean the dataset (handle missing values, duplicates, data types).
    • Validate columns like salary, age, and management for consistency.
  • Exploratory Analysis:
    • Generate summary statistics (mean, median, distributions).
    • Explore relationships between variables (e.g., salary vs. education, management vs. environment satisfaction).
  • Visualization:
    • Create visualizations (e.g., boxplots for salary distribution by education level, heatmaps for correlation analysis).
    • Highlight trends (e.g., attrition patterns linked to management scores).
  • Key Questions:
    • Does higher education correlate with salary or job retention?
    • Are there gender disparities in salary or promotion?
    • What workplace factors (e.g., environment, colleagues) most impact employee satisfaction?
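
A starting-point sketch for Task 1. The column names (`salary`, `education`, `gender`, `environment_satisfaction`, `attrition`, and so on) are assumptions, and the seven rows are invented so every step can run end to end; replace them with `pd.read_csv(...)` on the downloaded Kaggle file and rename the columns to match. Numbers computed from this toy data carry no meaning.

```python
import matplotlib
matplotlib.use("Agg")  # script-friendly backend; omit in Jupyter/Colab
import matplotlib.pyplot as plt
import pandas as pd

# Invented rows with hypothetical column names (includes a duplicate,
# a missing salary, a missing gender, and an impossible age).
raw = pd.DataFrame({
    "employee_id": [1, 2, 2, 3, 4, 5, 6],
    "gender": ["Female", "Male", "Male", "Female", "Male", None, "Female"],
    "age": ["34", "45", "45", "29", "-1", "51", "38"],
    "education": ["Bachelor", "Master", "Master", "Bachelor", "PhD", "Master", "PhD"],
    "salary": [52000, 68000, 68000, 47000, 90000, None, 81000],
    "environment_satisfaction": [3, 4, 4, 2, 4, 1, 3],
    "attrition": ["No", "No", "No", "Yes", "No", "Yes", "No"],
})

# 1. Data preparation: duplicates, types, invalid ranges, missing values.
df = raw.drop_duplicates().copy()                           # exact duplicate rows
df["age"] = pd.to_numeric(df["age"], errors="coerce")       # text -> number, bad -> NaN
df["age"] = df["age"].where(df["age"].between(18, 70))      # implausible ages -> NaN
df["salary"] = df["salary"].fillna(df["salary"].median())   # impute missing salary
df["gender"] = df["gender"].fillna("Unknown")
df["left_company"] = df["attrition"].eq("Yes").astype(int)  # Yes/No -> 1/0

# 2. Exploratory analysis aimed at the key questions.
print(df.describe())
salary_by_education = df.groupby("education")["salary"].median()
salary_by_gender = df.groupby("gender")["salary"].agg(["mean", "median", "count"])
attrition_by_satisfaction = df.groupby("environment_satisfaction")["left_company"].mean()
numeric_cols = ["age", "salary", "environment_satisfaction", "left_company"]
correlations = df[numeric_cols].corr()

# 3. Visualization: salary by education and a correlation heatmap.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
df.boxplot(column="salary", by="education", ax=ax1)
ax1.set_ylabel("salary")
im = ax2.imshow(correlations.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax2.set_xticks(range(len(numeric_cols)))
ax2.set_xticklabels(numeric_cols, rotation=45, ha="right")
ax2.set_yticks(range(len(numeric_cols)))
ax2.set_yticklabels(numeric_cols)
fig.colorbar(im, ax=ax2)
fig.tight_layout()
# fig.savefig("employee_analysis.png")  # uncomment to keep the figure
```

Seaborn's `boxplot` and `heatmap` would give the same two views with less code if that optional library is installed.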

Task 2: Cat Breed API to CSV Transformation

Objective: Fetch data from The Cat API and transform it into a structured cats.csv file.

Requirements:

  • API Data Extraction:
    • Fetch breed data programmatically.
  • Data Transformation:
    • Map API fields to CSV headers:
      ID, Name, Origin, Description, Temperament, Life Span (years), Weight (kg), Image URL
  • Special Cases:
    • Combine temperament traits into a comma-separated string (e.g., "Active, Curious").
    • Convert weight from imperial to metric if necessary.
    • Extract the first image URL from the breed’s image object.
  • Validation:
    • Handle missing fields (e.g., default Description to "N/A" if empty).
    • Ensure numeric columns (Life Span, Weight) are properly formatted.
Sample CSV Row:

    ID,Name,Origin,Description,Temperament,Life Span (years),Weight (kg),Image URL
    abys,Abyssinian,Egypt,"The Abyssinian is easy to care for...","Active, Energetic, Independent",14.5,4,https://cdn2.thecatapi.com/images/0XYvRd7oD.jpg
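
A sketch of the Task 2 pipeline using only the standard library. It assumes the breed list is served at `https://api.thecatapi.com/v1/breeds` and that each breed has `life_span` as a range string such as `"14 - 15"`, `weight` as an object with `metric`/`imperial` range strings, and an `image` object with a `url`. It also assumes the sample row's numbers are range midpoints (14 to 15 years gives 14.5, 3 to 5 kg gives 4). Check these assumptions against a real API response before relying on the output.

```python
import csv
import io
import json
import urllib.request

# Assumed endpoint (check The Cat API docs); no API key is sent here.
BREEDS_URL = "https://api.thecatapi.com/v1/breeds"
LB_TO_KG = 0.45359237  # exact pound-to-kilogram factor
HEADERS = ["ID", "Name", "Origin", "Description", "Temperament",
           "Life Span (years)", "Weight (kg)", "Image URL"]


def fetch_breeds(url=BREEDS_URL):
    """Download the breed list and parse it as a list of dicts."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def range_midpoint(text):
    """Average the numbers in a range string ("14 - 15" -> 14.5); None if unparsable."""
    try:
        parts = [float(p) for p in str(text).split("-")]
    except ValueError:
        return None
    return sum(parts) / len(parts)


def format_number(value):
    """Format compactly (4.0 -> "4", 14.5 -> "14.5"); "N/A" when missing."""
    return "N/A" if value is None else f"{round(value, 2):g}"


def weight_kg(weight):
    """Prefer the metric range; otherwise convert the imperial (lb) range to kg."""
    weight = weight or {}
    metric = range_midpoint(weight.get("metric"))
    if metric is not None:
        return metric
    pounds = range_midpoint(weight.get("imperial"))
    return None if pounds is None else pounds * LB_TO_KG


def breed_to_row(breed):
    """Map one breed dict to a CSV row in HEADERS order, defaulting gaps to "N/A"."""
    temperament = breed.get("temperament") or "N/A"
    if isinstance(temperament, list):  # join if traits arrive as a list
        temperament = ", ".join(temperament)
    image = breed.get("image") or {}
    if isinstance(image, list):  # keep only the first image if several are given
        image = image[0] if image else {}
    return [
        breed.get("id", "N/A"),
        breed.get("name", "N/A"),
        breed.get("origin") or "N/A",
        breed.get("description") or "N/A",
        temperament,
        format_number(range_midpoint(breed.get("life_span"))),
        format_number(weight_kg(breed.get("weight"))),
        image.get("url", "N/A"),
    ]


def write_csv(breeds, out):
    """Write the header plus one row per breed to a text file object."""
    writer = csv.writer(out)  # quotes fields containing commas automatically
    writer.writerow(HEADERS)
    writer.writerows(breed_to_row(b) for b in breeds)


def to_csv_text(breeds):
    """Return the CSV as a string, handy for checks without touching disk."""
    buffer = io.StringIO()
    write_csv(breeds, buffer)
    return buffer.getvalue()


def main(path="cats.csv"):
    """Fetch all breeds and write them to cats.csv (requires network access)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        write_csv(fetch_breeds(), f)
```

Calling `main()` downloads the live list and writes `cats.csv`. `breed_to_row` needs no network, so it can be checked offline against a hand-built dict shaped like the sample row above.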