james-har3/bluesky-profile-posts-scraper

Bluesky Profile Posts Scraper

The Bluesky Profile Posts Scraper makes it easy to extract complete post data from any Bluesky profile, including text, media, and engagement metrics. It solves the challenge of manually collecting public Bluesky content for analytics, research, and automation workflows. This tool delivers clean, structured JSON that’s ready for dashboards, machine learning, or social trend analysis.

Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for a Bluesky Profile Posts Scraper, you've just found your team. Let's chat.

Introduction

This project retrieves posts from Bluesky user profiles and outputs detailed, structured data. It helps analysts, creators, and developers gather insights from Bluesky activity without manual effort.

Why Use a Bluesky Scraper?

  • Automatically collects posts, engagement metrics, and author details.
  • Supports large-scale data gathering with reliable, consistent output.
  • Ideal for social listening, content research, reporting, or archiving.
  • Captures images, videos, and embedded media alongside text.
  • Provides standardized JSON for easy integration into analytic workflows.

Features

| Feature | Description |
| --- | --- |
| Comprehensive Post Capture | Extracts text, images, videos, engagement counts, timestamps, and URIs. |
| Fast Data Extraction | Efficiently retrieves large volumes of posts from multiple profiles. |
| Media Retrieval | Captures media such as images, thumbnails, and embedded video playlists. |
| Clean Structured Output | Delivers unified JSON for direct use in analysis or automation. |
| Author Metadata | Retrieves profile information including handle, display name, avatar, and DID. |

What Data This Scraper Extracts

| Field Name | Field Description |
| --- | --- |
| likeCount | Number of likes received by the post. |
| replyCount | Total replies associated with the post. |
| repostCount | Number of reposts. |
| quoteCount | Number of times the post has been quoted. |
| indexedAt | Timestamp when the post was indexed. |
| uri | Unique Bluesky URI for the post. |
| author | Object containing author DID, handle, display name, avatar, and metadata. |
| text | Full text content of the post. |
| embed | Object containing playlist and thumbnail media URLs. |
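
To make the field list concrete, here is a minimal sketch of how a raw post object could be mapped onto these fields. It assumes the input follows the shape of Bluesky's `app.bsky.feed.defs#postView` (counters at the top level, text under `record.text`); the function name `toRecord` is hypothetical and is not taken from this repository's source.

```javascript
// Hypothetical helper: maps one post view (assumed to follow the shape of
// app.bsky.feed.defs#postView) onto the fields documented in the table.
function toRecord(postView) {
  const embed = postView.embed || {};
  const record = {
    likeCount: postView.likeCount ?? 0,
    replyCount: postView.replyCount ?? 0,
    repostCount: postView.repostCount ?? 0,
    quoteCount: postView.quoteCount ?? 0,
    indexedAt: postView.indexedAt,
    uri: postView.uri,
    author: postView.author,
    // In a post view the text lives on the nested record object.
    text: postView.record?.text ?? "",
  };
  // Video embeds expose playlist/thumbnail URLs; omit the key when neither is present.
  if (embed.playlist || embed.thumbnail) {
    record.embed = { playlist: embed.playlist, thumbnail: embed.thumbnail };
  }
  return record;
}
```

Missing counters default to zero here so that downstream sums never see `undefined`; whether the actual scraper does the same is not stated in this README.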

Example Output

[
  {
    "likeCount": 215,
    "quoteCount": 1,
    "replyCount": 19,
    "repostCount": 12,
    "indexedAt": "2025-02-15T07:18:17.051Z",
    "uri": "at://did:plc:cy4af3hlkdaht7wltvdmc35k/app.bsky.feed.post/3li76g6lq722r",
    "author": {
      "did": "did:plc:cy4af3hlkdaht7wltvdmc35k",
      "handle": "t3.gg",
      "displayName": "Theo",
      "avatar": "https://cdn.bsky.app/img/avatar/plain/did:plc:cy4af3hlkdaht7wltvdmc35k/bafkreiczc675kmmavcc4pyzhaixqkkucdahhxqw25xrlp2rt4cajdmojfm@jpeg",
      "associated": {
        "chat": {
          "allowIncoming": "following"
        }
      },
      "labels": [],
      "createdAt": "2023-04-12T03:02:52.540Z"
    },
    "text": "I made a new search engine. Kind of.\n\nIntroducing unduck.link, my DuckDuckGo replacement :)",
    "embed": {
      "playlist": "https://video.bsky.app/watch/did%3Aplc%3Acy4af3hlkdaht7wltvdmc35k/bafkreido5hly55f4t5fbjqtmkhcycc44lgy5bozvntvuzkqmltoedleygm/playlist.m3u8",
      "thumbnail": "https://video.bsky.app/watch/did%3Aplc%3Acy4af3hlkdaht7wltvdmc35k/bafkreido5hly55f4t5fbjqtmkhcycc44lgy5bozvntvuzkqmltoedleygm/thumbnail.jpg"
    }
  }
]
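
As a hedged illustration of consuming this output, the sketch below splits the `uri` field into its parts, builds the corresponding bsky.app web link, and sums the engagement counters. The helper names (`parseAtUri`, `postWebUrl`, `totalEngagement`) are illustrative and not part of this project.

```javascript
// Splits a record AT URI of the form at://<repo>/<collection>/<rkey>.
function parseAtUri(uri) {
  const match = /^at:\/\/([^/]+)\/([^/]+)\/([^/]+)$/.exec(uri);
  if (!match) throw new Error(`Not a record AT URI: ${uri}`);
  const [, repo, collection, rkey] = match;
  return { repo, collection, rkey };
}

// Builds the bsky.app web link for a scraped post record.
function postWebUrl(post) {
  const { rkey } = parseAtUri(post.uri);
  return `https://bsky.app/profile/${post.author.handle}/post/${rkey}`;
}

// Sums the four engagement counters, treating missing ones as zero.
function totalEngagement(post) {
  return ["likeCount", "replyCount", "repostCount", "quoteCount"]
    .reduce((sum, key) => sum + (post[key] ?? 0), 0);
}

// Trimmed copy of the example record shown above.
const sample = {
  uri: "at://did:plc:cy4af3hlkdaht7wltvdmc35k/app.bsky.feed.post/3li76g6lq722r",
  author: { handle: "t3.gg" },
  likeCount: 215,
  quoteCount: 1,
  replyCount: 19,
  repostCount: 12,
};

console.log(postWebUrl(sample));      // https://bsky.app/profile/t3.gg/post/3li76g6lq722r
console.log(totalEngagement(sample)); // 247
```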

Directory Structure Tree

Bluesky Profile Posts Scraper/
├── src/
│   ├── runner.js
│   ├── extractors/
│   │   ├── bluesky_parser.js
│   │   └── utils_media.js
│   ├── outputs/
│   │   └── exporters.js
│   └── config/
│       └── settings.example.json
├── data/
│   ├── inputs.sample.txt
│   └── sample.json
├── package.json
├── requirements.txt
└── README.md

Use Cases

  • Researchers extract Bluesky discussions to analyze social trends and public sentiment.
  • Marketing teams collect engagement metrics to track influencer activity and campaign performance.
  • Developers integrate post data into dashboards or automation tools to streamline reporting.
  • Content creators monitor competitors’ posts to discover new ideas and benchmark engagement.
  • Data analysts build datasets for training models or conducting behavioral analysis.

FAQs

Q: Does the scraper retrieve media like images and videos?
A: Yes. It captures embedded media such as playlist URLs, thumbnails, and image attachments.

Q: How many posts can it extract per profile?
A: It supports large-scale extraction and can process extensive post histories within rate limits.

Q: What output format does it use?
A: All results are exported in structured JSON for easy integration.

Q: Does it require API access?
A: No API keys are required; it works directly with publicly available profile data.
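
For readers wondering what keyless access to public profile data can look like, here is a minimal sketch. It assumes Bluesky's public AppView endpoint `app.bsky.feed.getAuthorFeed` on `public.api.bsky.app` with cursor-based pagination; this README does not say which endpoint the scraper actually uses. The function name `fetchAuthorPosts` and its options are hypothetical, and it relies on a global `fetch` (Node 18 or later) unless another implementation is passed in.

```javascript
// Assumption: Bluesky's public AppView host, which serves public reads without auth.
const PUBLIC_API = "https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed";

// Hypothetical helper: pages through an author's feed until maxPosts is
// reached or the server stops returning a cursor.
async function fetchAuthorPosts(actor, { maxPosts = 200, fetchImpl = fetch } = {}) {
  const posts = [];
  let cursor;
  while (posts.length < maxPosts) {
    const params = new URLSearchParams({ actor, limit: "100" });
    if (cursor) params.set("cursor", cursor);
    const res = await fetchImpl(`${PUBLIC_API}?${params}`);
    if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
    const page = await res.json();
    const items = page.feed ?? [];
    // Each feed item wraps the post view under the `post` key.
    for (const item of items) posts.push(item.post);
    cursor = page.cursor;
    if (!cursor || items.length === 0) break;
  }
  return posts.slice(0, maxPosts);
}
```

A call such as `fetchAuthorPosts("t3.gg", { maxPosts: 50 })` would return up to 50 raw post views. The `fetchImpl` option exists so the function can be exercised offline with a stub.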


Performance Benchmarks and Results

  • Primary Metric: Capable of processing hundreds of posts per minute with efficient batching.
  • Reliability Metric: Maintains a high success rate across diverse user profiles with stable output formatting.
  • Efficiency Metric: Optimized to minimize redundant network calls and reduce resource use during large crawls.
  • Quality Metric: Provides high data completeness with consistent extraction of metadata, media, and engagement fields.
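
The README does not describe how batching or call reduction is implemented. As one possible reading, the sketch below processes profiles in fixed-size batches and drops duplicate posts by `uri`. Both helper names are hypothetical.

```javascript
// Hypothetical helper: runs `worker` over `items` in fixed-size batches,
// so at most `batchSize` calls are in flight at any time.
async function processInBatches(items, batchSize, worker) {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // Promise.allSettled keeps one failing profile from aborting the batch.
    const settled = await Promise.allSettled(batch.map((item) => worker(item)));
    results.push(...settled);
  }
  return results;
}

// Drops posts whose uri has already been seen (e.g. from overlapping pages).
function dedupeByUri(posts) {
  const seen = new Set();
  return posts.filter((post) => {
    if (seen.has(post.uri)) return false;
    seen.add(post.uri);
    return true;
  });
}
```

Each entry in the returned array is a settled-promise result with `status` set to `"fulfilled"` or `"rejected"`, so failed profiles can be retried or logged separately.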

“Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time.”

Nathan Pennington
Marketer
★★★★★

“Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on.”

Eliza
SEO Affiliate Expert
★★★★★

“Exceptional results, clear communication, and flawless delivery. Bitbash nailed it.”

Syed
Digital Strategist
★★★★★
