twitter-airflow-project

This project demonstrates how to design and implement a scalable data pipeline, optimize it for performance and reliability, and combine a range of tools and technologies (Tweepy, Python, Apache Airflow, Amazon EC2, and Amazon S3) to achieve that goal. Quantifying the results makes it easy to show the impact and scale of the work.

  • Extracting data from Twitter

Tweepy is an open-source Python package that provides a convenient way to access the Twitter API from Python. It includes a set of classes and methods that represent Twitter's models and API endpoints, and it transparently handles implementation details such as data encoding and decoding. Through the Twitter API, developers can reach almost all of Twitter's functionality: tweets, likes, retweets, and more.

  • Use Python to extract data from API
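
A minimal sketch of this extraction step is shown below, using Tweepy's v1.1 API wrapper to pull a user's recent tweets and flatten them into a pandas DataFrame. The credentials, the screen name, and the `refined_tweets.csv` filename are placeholders rather than values taken from this repository.

```python
import tweepy
import pandas as pd

def run_twitter_etl():
    # Placeholder credentials: supply your own Twitter developer keys.
    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
    api = tweepy.API(auth)

    # Fetch recent tweets for a user (screen name is a placeholder).
    tweets = api.user_timeline(
        screen_name="some_account",
        count=200,
        include_rts=False,
        tweet_mode="extended",  # return full, untruncated tweet text
    )

    # Keep only the fields the pipeline needs downstream.
    refined = [
        {
            "user": t.user.screen_name,
            "text": t.full_text,
            "favorite_count": t.favorite_count,
            "retweet_count": t.retweet_count,
            "created_at": t.created_at,
        }
        for t in tweets
    ]

    # Write a local CSV; a later step ships the file to S3.
    pd.DataFrame(refined).to_csv("refined_tweets.csv", index=False)
```

On newer Twitter API tiers the same extraction can be done with `tweepy.Client` and the v2 endpoints; the shape of the pipeline stays the same.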

  • Deploying code on EC2

Amazon Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the Amazon Web Services (AWS) Cloud. Using Amazon EC2 eliminates the need to invest in hardware up front, so you can develop and deploy applications faster. It offers a wide selection of instance types optimized for different use cases; instance types comprise varying combinations of CPU, memory, storage, and networking capacity, giving you the flexibility to choose the right mix of resources for your application.

  • Use Airflow for workflow management

Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. Airflow's extensible Python framework lets you build workflows that connect with virtually any technology, and a web interface helps you manage the state of those workflows. It lets you take data from different sources, transform it into meaningful information, and load it into destinations such as data lakes or data warehouses.
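
As a rough sketch of how the workflow can be wired up, the DAG below schedules a single task that calls a `run_twitter_etl()` function. The `twitter_etl` module name, the DAG id, and the schedule are illustrative assumptions, not values taken from this repository.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical import: the extraction function from the previous step.
from twitter_etl import run_twitter_etl

default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

# A single-task DAG: Airflow runs run_twitter_etl() once per day.
with DAG(
    dag_id="twitter_dag",
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="run_twitter_etl",
        python_callable=run_twitter_etl,
    )
```

Once this file sits in the `dags/` folder of the Airflow installation on the EC2 instance, the scheduler picks it up and the web UI shows the state of every run.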

  • Store data into S3 bucket

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. You can use Amazon S3 to store and retrieve any amount of data at any time, from anywhere.
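
A small sketch of this loading step is shown below: either write the DataFrame straight to an `s3://` path (pandas hands this off to the optional `s3fs` package) or upload an already-written local file with `boto3`. The bucket name is a placeholder.

```python
import boto3
import pandas as pd

BUCKET = "my-twitter-pipeline-bucket"  # placeholder bucket name

def write_dataframe_to_s3(df: pd.DataFrame) -> None:
    # Requires the s3fs package; credentials come from the EC2 instance
    # role or the local AWS configuration.
    df.to_csv(f"s3://{BUCKET}/refined_tweets.csv", index=False)

def upload_file_to_s3(local_path: str, key: str) -> None:
    # Upload a file that was already written to local disk.
    s3 = boto3.client("s3")
    s3.upload_file(local_path, BUCKET, key)
```

Either way, the EC2 instance needs permission to write to the bucket, typically granted through an attached IAM role.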

  • Image gallery

In Airflow, a Directed Acyclic Graph (DAG) represents the workflow itself: a collection of tasks with explicit dependencies and no cycles. The Airflow web interface renders this graph, which makes it easy to visualize the flow of the pipeline and monitor the state of each run.

Made with 💖 & 🔥 by MKM.
