A Python-based tech blog aggregator that scrapes content from various engineering blogs and delivers updates via Telegram.
- Scrapes tech blogs from:
  - Netflix Tech Blog
  - Uber Engineering
  - Airbnb Engineering
  - ByteByteGo
  - AWS Architecture
  - Lyft Engineering
- Sends updates via a Telegram channel
- Customizable time range for fetching posts
- Supports dry-run mode for testing
To set up the project, run:

```bash
make setup
```

This will:
- Check if Poetry is installed
- Install project dependencies
- Create a `.env` file from the template if one doesn't exist (illustrated below)
- Create necessary directories
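This README doesn't show the template's contents. As a rough illustration, assuming the project loads configuration with `python-dotenv` (an assumption, as is every variable name below), the `.env` file would hold the Telegram credentials the aggregator needs:

```python
# Illustrative only: assumes python-dotenv and hypothetical variable names;
# check the bundled template for the real keys.
import os

from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the environment

BOT_TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]    # hypothetical name
CHANNEL_ID = os.environ["TELEGRAM_CHANNEL_ID"]  # hypothetical name
```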
The application can run in two modes: CLI and HTTP server.
CLI mode:

```bash
# Show all available commands
make help

# Run blog checker (last 24 hours)
make run

# Check posts from last 3 days
make run DAYS=3

# Test mode - just print posts without sending
make run DRY_RUN=1

# Check last 7 days in test mode
make run DAYS=7 DRY_RUN=1
```

HTTP server mode:

```bash
# Start HTTP server (default: http://0.0.0.0:8000)
make run-http

# Start server on custom host and port
make run-http HTTP_HOST=127.0.0.1 HTTP_PORT=3000
```

Available endpoints:
- `POST /send-posts` - Send new tech blog posts to Telegram
- `GET /health` - Health check endpoint
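For a quick smoke test against a running server, something like the following should work (a sketch assuming the default host and port, and that `/send-posts` takes no request body):

```python
# Smoke-test a locally running server; assumes the default port and that
# POST /send-posts needs no request body.
import requests

BASE_URL = "http://127.0.0.1:8000"

health = requests.get(f"{BASE_URL}/health", timeout=10)
print("health:", health.status_code, health.text)

# Trigger a check-and-send cycle over the configured blogs.
resp = requests.post(f"{BASE_URL}/send-posts", timeout=60)
print("send-posts:", resp.status_code, resp.text)
```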
Available make commands:

```
help      Show this help message
install   Install project dependencies
lint      Run code quality checks
format    Format code with black and isort
setup     Initial project setup
run       Run the blog checker (use DAYS=n for custom days, DRY_RUN=1 for dry run)
run-http  Run the HTTP server (use HTTP_HOST and HTTP_PORT for custom host/port)
clean     Remove temporary files and build artifacts
```

The project uses:
- Poetry for dependency management
- Black for code formatting
- Flake8 for linting
- isort for import sorting
The project includes GitHub Actions workflows for:
- Running code quality checks on pull requests
- Scheduled blog checks (daily at midnight UTC by default, configurable)
To add a new blog source:
- Create a new file in the `scraper/` directory
- Implement a scraper class that extends `BaseScraper` (see the sketch below)
- Add the scraper to the list in `main.py`
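`BaseScraper`'s exact interface isn't documented in this README, so the following is only a minimal sketch: the module path, the `get_posts` method name, and the returned post shape are assumptions, not the project's real API.

```python
# scraper/example_blog.py -- a hypothetical scraper; BaseScraper's real
# interface may differ from what is assumed here.
from datetime import datetime, timezone
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

from scraper.base import BaseScraper  # assumed module path


class ExampleBlogScraper(BaseScraper):
    """Scraper for a hypothetical blog at https://blog.example.com."""

    name = "Example Blog"
    url = "https://blog.example.com"

    def get_posts(self) -> list[dict]:  # method name and return shape assumed
        html = requests.get(self.url, timeout=30).text
        soup = BeautifulSoup(html, "html.parser")
        posts = []
        for item in soup.select("div.post"):
            link = item.select_one("h2.title a")
            time_tag = item.select_one("time[datetime]")
            if link is None or time_tag is None:
                continue  # tolerate posts missing expected elements
            # fromisoformat (pre-3.11) rejects a trailing "Z", so normalize it
            published = datetime.fromisoformat(
                time_tag["datetime"].replace("Z", "+00:00")
            )
            posts.append({
                "title": link.get_text(strip=True),
                "url": urljoin(self.url, link["href"]),
                "published": published.astimezone(timezone.utc),
            })
        return posts
```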
You can use GitHub Copilot to help implement a new web source scraper:
- Open the project in VS Code with GitHub Copilot enabled
- Create a new Python file in the `scraper/` directory
- Provide the following information in your comment:

```python
"""
Implement a scraper for [Blog Name] that extends BaseScraper
Source URL: [Full URL of the blog, e.g. https://blog.example.com]
Sample HTML structure for a blog post:
<div class="post">
  <h2 class="title"><a href="/post-url">Post Title</a></h2>
  <div class="meta">
    <time datetime="2025-05-10T10:00:00Z">May 10, 2025</time>
  </div>
</div>
"""
```
- GitHub Copilot will suggest the implementation, including:
- Required imports
- Class structure extending BaseScraper
- HTML parsing logic based on the provided structure
- Date parsing and timezone handling
- Error handling and logging
For example, here's a full comment for implementing an AWS Architecture blog scraper:
"""
Implement a scraper for AWS Architecture blog that extends BaseScraper
Source URL: https://aws.amazon.com/blogs/architecture/
Sample HTML structure:
<div class="lb-row lb-snap">
<div class="lb-col lb-mid-6 lb-tiny-24">
<a href="[post-url]"><img src="[image-url]"></a>
</div>
<div class="lb-col lb-mid-18 lb-tiny-24">
<h2 class="lb-bold blog-post-title">
<a href="[post-url]">
<span property="name headline">Post Title</span>
</a>
</h2>
<footer class="blog-post-meta">
<span>by <span property="author"><span property="name">Author Name</span></span></span>
<span>on <time property="datePublished" datetime="2025-04-21T10:08:07-07:00">21 APR 2025</time></span>
</footer>
</div>
</div>
"""The scraping implementation will be tailored to your specific blog source, considering:
- HTML element structure and class names
- Date formats and timezones
- Pagination if present
- Error handling for missing elements
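For instance, for the AWS sample above, the date and error handling might look roughly like this (a sketch using BeautifulSoup; the selectors come from the sample HTML, everything else is an assumption):

```python
# Sketch: parse one post from the AWS Architecture sample HTML above.
from datetime import datetime

from bs4 import BeautifulSoup


def parse_post(fragment: str) -> dict | None:
    """Return title/url/published for one post, or None if elements are missing."""
    soup = BeautifulSoup(fragment, "html.parser")
    title_link = soup.select_one("h2.blog-post-title a")
    time_tag = soup.select_one("footer.blog-post-meta time[datetime]")
    if title_link is None or time_tag is None:
        return None  # tolerate missing elements instead of raising
    # The datetime attribute carries a UTC offset (e.g. -07:00), which
    # fromisoformat preserves, so no extra timezone handling is needed.
    published = datetime.fromisoformat(time_tag["datetime"])
    return {
        "title": title_link.get_text(strip=True),
        "url": title_link["href"],
        "published": published,
    }
```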
License: MIT