keyurkhant / StreamGuard

Latest commit: "project added" by Keyur Khant (32849aa), 1 year ago

Real-Time Sentiment Tracker for Streaming Social Media Data

This project analyzes the sentiment of streaming social media data in real time using Apache Kafka, Apache Spark on AWS EMR, and a Flask-based dashboard. The processed results are plotted on the dashboard as they arrive to show sentiment trends.


Features

  • Data Streaming: Kafka streams social media data (simulated using a dataset).
  • Real-Time Processing: Apache Spark processes the data, running sentiment analysis with TextBlob.
  • Data Storage: Processed data is stored in Amazon S3.
  • Visualization: A Flask-based dashboard with Plotly graphs displays sentiment trends.
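
As a rough illustration of the sentiment step, the sketch below maps a TextBlob polarity score to one of the three labels shown on the dashboard. The function names and the zero threshold are assumptions made for this example; the actual logic in spark_stream.py may differ.

```python
def label_from_polarity(polarity: float, threshold: float = 0.0) -> str:
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    The threshold is an assumed tuning knob: scores within
    [-threshold, threshold] are treated as neutral.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def classify(text: str) -> str:
    """Label one message using TextBlob's default polarity score."""
    # Imported lazily so the threshold logic above can be reused
    # (and tested) on machines where textblob is not installed.
    from textblob import TextBlob

    return label_from_polarity(TextBlob(text).sentiment.polarity)
```

With textblob installed, `classify("some tweet text")` returns one of "positive", "negative", or "neutral". Raising the threshold (for example to 0.1) widens the neutral band, which may help with short, noisy messages.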

Technologies Used

  • Apache Kafka: Data streaming.
  • Apache Spark: Real-time processing (via PySpark).
  • AWS EMR: Cloud-based Spark cluster.
  • Flask: Web application for dashboard.
  • Plotly: Data visualization.
  • Amazon S3: Data storage.
  • Python Libraries: textblob, boto3, kafka-python.

Setup Instructions

1. Prerequisites

  • Python 3.8 or above
  • AWS account with access to S3 and EMR
  • Apache Kafka, installed locally or available as a managed service
  • A Python virtual environment
  • Dataset: https://www.kaggle.com/datasets/kazanova/sentiment140

2. Install Dependencies

  • Create a virtual environment:
python3 -m venv venv
  • Activate the virtual environment:
      • macOS/Linux:
source venv/bin/activate
      • Windows:
venv\Scripts\activate
  • Install required packages:
pip install -r requirements.txt

3. Set Up Kafka

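  • Start ZooKeeper (run the commands in this section from the Kafka installation directory, with each server in its own terminal):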
bin/zookeeper-server-start.sh config/zookeeper.properties
  • Start Kafka:
bin/kafka-server-start.sh config/server.properties
  • Create a Kafka topic:
bin/kafka-topics.sh --create --topic tweets --bootstrap-server localhost:9092

4. Set Up AWS EMR

  • Go to the AWS Management Console and navigate to the EMR service.
  • Create a cluster with the following configurations:
      • Release version: emr-6.x.x
      • Enable Spark and Hadoop.
      • Choose m5.xlarge instance types.
      • Ensure SSH access is enabled.
  • Launch the cluster.
  • Upload the spark_stream.py script to the cluster and run it.
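
For orientation, here is a minimal sketch of what a Spark job of this kind could look like. It assumes Spark Structured Streaming, plain UTF-8 message values, and a JSON sink on S3. The helper names, the broker address, and the bucket paths are hypothetical placeholders, not values taken from this repository. The Kafka source also requires the spark-sql-kafka-0-10 connector on the classpath, and textblob must be installed on the cluster nodes.

```python
def kafka_source_options(bootstrap_servers: str, topic: str) -> dict:
    """Options for Spark's Kafka source (all values are strings)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def start_sentiment_stream(spark, bootstrap_servers, topic, output_path, checkpoint_path):
    """Read messages from Kafka, label each one, and append the results as JSON."""
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    def classify(text):
        # Runs on the executors, so textblob is imported there.
        from textblob import TextBlob

        if text is None:
            return "neutral"
        polarity = TextBlob(text).sentiment.polarity
        if polarity > 0:
            return "positive"
        if polarity < 0:
            return "negative"
        return "neutral"

    classify_udf = udf(classify, StringType())

    messages = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
        # Kafka delivers the value as bytes; cast it to a string column.
        .select(col("value").cast("string").alias("text"))
    )
    scored = messages.withColumn("sentiment", classify_udf(col("text")))

    return (
        scored.writeStream.format("json")
        .outputMode("append")
        .option("path", output_path)
        .option("checkpointLocation", checkpoint_path)
        .start()
    )


def main():
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sentiment-stream-sketch").getOrCreate()
    query = start_sentiment_stream(
        spark,
        bootstrap_servers="broker-host:9092",            # placeholder: must be reachable from EMR
        topic="tweets",
        output_path="s3://your-bucket/sentiment/",        # placeholder bucket
        checkpoint_path="s3://your-bucket/checkpoints/",  # placeholder bucket
    )
    query.awaitTermination()
```

A script like this would typically be launched with spark-submit on the cluster's primary node and end with a call to `main()`. Note that a Kafka broker running on a local machine is not reachable from EMR by default, so the broker address has to point at a host the cluster can connect to.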

5. Run the Kafka Producer

Stream data to Kafka:
python kafka_producer.py
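
The sketch below shows one way a producer like this might simulate a stream from the Sentiment140 CSV using kafka-python. It is an illustration under stated assumptions rather than the contents of kafka_producer.py: the function names, the delay between messages, and the choice to send only the tweet text as a UTF-8 string are all hypothetical. The Sentiment140 file has no header row, and its six columns are target, ids, date, flag, user, and text.

```python
import csv
import time
from typing import Iterable, Iterator, TextIO

TEXT_COLUMN = 5  # Sentiment140 column order: target, ids, date, flag, user, text


def iter_tweets(handle: TextIO) -> Iterator[str]:
    """Yield the non-empty tweet text from each CSV row."""
    for row in csv.reader(handle):
        if len(row) > TEXT_COLUMN and row[TEXT_COLUMN].strip():
            yield row[TEXT_COLUMN].strip()


def stream_to_kafka(
    tweets: Iterable[str],
    topic: str = "tweets",
    bootstrap_servers: str = "localhost:9092",
    delay_seconds: float = 0.1,
) -> None:
    """Send tweets one at a time, pausing between sends to mimic a live feed."""
    # Imported lazily so the CSV parsing can be used without kafka-python.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        value_serializer=lambda text: text.encode("utf-8"),
    )
    try:
        for text in tweets:
            producer.send(topic, value=text)
            time.sleep(delay_seconds)
    finally:
        # Make sure buffered messages are delivered before exiting.
        producer.flush()
        producer.close()


def main(csv_path: str) -> None:
    # The Kaggle file is Latin-1 encoded rather than UTF-8.
    with open(csv_path, encoding="latin-1", newline="") as handle:
        stream_to_kafka(iter_tweets(handle))
```

Calling `main()` with the path to the downloaded dataset file starts the simulated stream. The delay controls the message rate; setting it to 0 sends the file as fast as the broker accepts it.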

6. Run the Flask Application

Launch the dashboard:
python app.py

Access the dashboard at http://127.0.0.1:5000.
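
To make the dashboard step more concrete, the following sketch counts sentiment labels and returns them as a Plotly figure in its plain JSON form, served from a Flask route. The `/api/sentiment` route, the function names, and the `load_labels` callable (which would fetch the processed labels, for example from S3 with boto3) are assumptions for illustration and are not taken from app.py.

```python
from collections import Counter
from typing import Callable, Iterable

LABELS = ("positive", "negative", "neutral")


def sentiment_figure(labels: Iterable[str]) -> dict:
    """Build a Plotly bar chart (as a JSON-serializable dict) of label counts."""
    counts = Counter(labels)
    return {
        "data": [
            {
                "type": "bar",
                "x": list(LABELS),
                "y": [counts.get(label, 0) for label in LABELS],
            }
        ],
        "layout": {
            "title": {"text": "Sentiment counts"},
            "yaxis": {"title": {"text": "Messages"}},
        },
    }


def create_app(load_labels: Callable[[], Iterable[str]]):
    """Create a Flask app that serves the current figure as JSON.

    load_labels is a hypothetical callable returning the sentiment labels
    processed so far.
    """
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/api/sentiment")
    def sentiment():
        return jsonify(sentiment_figure(load_labels()))

    return app
```

A page such as templates/index.html could then fetch this endpoint periodically and pass the returned `data` and `layout` to `Plotly.newPlot` to redraw the chart. Keeping the figure construction in a plain function separates it from Flask and makes it easy to check on its own.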


File Structure

project/
├── kafka_producer.py       # Streams data to Kafka
├── spark_stream.py         # Spark job for real-time processing
├── app.py                  # Flask application for visualization
├── templates/
│   └── index.html          # HTML for the dashboard
├── requirements.txt        # Python dependencies
└── README.md               # Project documentation


Example Output

  • Dashboard: Displays a bar chart with the counts of positive, negative, and neutral sentiments.

Future Improvements

  • Add live integration with social media APIs (e.g., Twitter API).
  • Use advanced sentiment analysis models for improved accuracy.
  • Implement real-time notifications based on sentiment trends.