Markdown Extractor 🔣

Your one-stop shop for parsing Markdown content. Give your Markdown superpowers by adding metadata.

📢 Features

Parse Markdown content. We use marked.
Extract YAML metadata about your Markdown content. We use js-yaml.
Convert Markdown to HTML and easily extract data from DOM nodes by passing selectors. We use cheerio.
Supports both Node.js and the browser.
A standalone browser package is also available at dist/bundle.min.js.

💡 How does it work?

For example,

---
title: Awesome Markdown Extractor
id: 101
---

# Abstract

Lorem ipsum dolor...

would be parsed as

{
  metadata: {
    title: "Awesome Markdown Extractor",
    id: 101
  },
  content: {
    abstract: 'Abstract'
  },
  html: "<h1>Abstract</h1>...."
}

📝 Prerequisites

Node.js v12 or above. It may work on lower versions, but this has not been tested.

💻 Installation

$ npm install @sohailalam2/markdown-extractor

✅ Usage

Check the example for more information.

Given the sample Markdown content below, let's see how we can parse it and extract the metadata:

---
title: Backend Engineer
id: 101
locations: [India, Remote]
department: Engineering
publishDate: 2020-06-27T13:53:26.714Z
tags: [NodeJs, AWS, Serverless, TypeScript]
isDraft: true
---

# Backend Engineer

## Abstract

This is the awesome _abstract_ for the **backend engineering** role. Visit https://github.com to checkout our brand and some amazing content.

## Preferred Qualifications

- AWS experience
- Serverless experience

## Perks

- Industry standard salary
- Awesome team
- Freedom and responsibilities

## Other Details

This is an **amazing** opportunity for _budding engineers_. Apply now!!

const fs = require('fs');
const path = require('path');

const { parseMarkdown } = require('@sohailalam2/markdown-extractor');

const markdown = fs.readFileSync(path.join(__dirname, 'job-backend-engineer.md'), 'utf8');

const options = {
  selectors: [
    { selector: '#abstract', parseHtml: true },
    { selector: '#preferred-qualifications' },
    { selector: '#perks', parseHtml: true },
  ],
};

const { metadata, content, html } = parseMarkdown(markdown, options);

// metadata:
//
// {
//   title: 'Backend Engineer',
//   id: 101,
//   locations: [ 'India', 'Remote' ],
//   department: 'Engineering',
//   publishDate: 2020-06-27T13:53:26.714Z,
//   tags: [ 'NodeJs', 'AWS', 'Serverless', 'TypeScript' ],
//   isDraft: true
// }

const abstract = content['#abstract'];
// <p>This is the awesome <em>abstract</em> for the <strong>backend engineering</strong> role. Visit <a href="https://github.com">https://github.com</a> to checkout our brand and some amazing content.</p>

const preferredQualifications = content['#preferred-qualifications'].split('\n');
// [ 'AWS experience', 'Serverless experience' ]

const perks = content['#perks'];

// <ul>
// <li>Industry standard salary</li>
// <li>Awesome team</li>
// <li>Freedom and responsibilities</li>
// </ul>
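
Once parsed, the metadata is a plain object that can be used directly. The sketch below shows one way to consume it. It uses a hard-coded stand-in for the `metadata` value shown above so that it runs on its own; the `summarize` helper and the `jobMetadata` name are hypothetical, and it assumes `publishDate` arrives as a JavaScript `Date` (as the unquoted value in the output above suggests), falling back to a string otherwise.

```javascript
// Stand-in for the `metadata` object shown above, so the snippet runs on its own.
// Assumption: the YAML timestamp arrives as a JavaScript Date, not a string.
const jobMetadata = {
  title: 'Backend Engineer',
  id: 101,
  locations: ['India', 'Remote'],
  publishDate: new Date('2020-06-27T13:53:26.714Z'),
  tags: ['NodeJs', 'AWS', 'Serverless', 'TypeScript'],
  isDraft: true,
};

// Hypothetical helper: turn the metadata into a small listing summary.
function summarize(meta) {
  const published = meta.publishDate instanceof Date
    ? meta.publishDate.toISOString().slice(0, 10) // keep the YYYY-MM-DD part
    : String(meta.publishDate);                   // fallback if it is not a Date

  return {
    heading: `${meta.title} (${meta.locations.join(' / ')})`,
    published,
    visible: !meta.isDraft, // drafts stay hidden
  };
}

const summary = summarize(jobMetadata);
console.log(summary.heading);   // Backend Engineer (India / Remote)
console.log(summary.published); // 2020-06-27
console.log(summary.visible);   // false
```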

Standalone library in browser

<script src="../dist/bundle.min.js"></script>

<script>
  // ...

  const { metadata, content, html } = MarkdownExtractor.parseMarkdown(markdown, options);

  // ...
</script>

🔨 Configuration Options

The parseMarkdown function takes one required parameter (markdown as string) and an optional parameter for configuring the parser:

function parseMarkdown(data: string, options?: MarkdownExtractorOptions): MarkdownExtractorResult {}

The options and their effects are described below:

options.metadataDelimiter

The delimiter boundary that holds the metadata content. It defaults to ---.

Example:

---
title: Awesome Markdown Extractor
id: 101
---
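
To make the role of the delimiter concrete, here is a minimal, dependency-free sketch of how a metadata block could be separated from the body for a given delimiter. It is a simplified stand-in and not the library's actual implementation; `splitFrontMatter` and its return shape are hypothetical names used only for illustration.

```javascript
// Illustrative only: a simplified stand-in for the metadata-splitting step,
// not the library's actual implementation.
function splitFrontMatter(markdown, delimiter = '---') {
  const lines = markdown.split(/\r?\n/);

  // In this sketch the metadata block must start on the very first line.
  if (lines[0].trim() !== delimiter) {
    return { rawMetadata: '', body: markdown };
  }

  // Find the line that closes the metadata block.
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === delimiter);
  if (end === -1) {
    return { rawMetadata: '', body: markdown };
  }

  return {
    rawMetadata: lines.slice(1, end).join('\n'),   // YAML text between the delimiters
    body: lines.slice(end + 1).join('\n').trim(),  // remaining Markdown content
  };
}

// Same sample as above, but delimited with +++ instead of ---.
const sample = [
  '+++',
  'title: Awesome Markdown Extractor',
  'id: 101',
  '+++',
  '',
  '# Abstract',
].join('\n');

const { rawMetadata, body } = splitFrontMatter(sample, '+++');
console.log(rawMetadata); // the two YAML lines between the +++ markers
console.log(body);        // # Abstract
```

With the library itself, the equivalent would be to pass the option through, for example `parseMarkdown(markdown, { metadataDelimiter: '+++' })`.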

options.selectors

This is an array of MarkdownDomSelector objects, each with two properties. If provided, the parser converts the Markdown to HTML and also extracts data from the DOM elements matched by the given selectors.

selector (string) is a jQuery-style DOM selector.
parseHtml (boolean, optional) indicates whether to extract the content of the selected DOM element as HTML or as text. By default, the content is extracted as text.
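
In the usage sample, selectors such as `#preferred-qualifications` correspond to the heading "Preferred Qualifications". The sketch below builds a `selectors` array from heading text under that assumption. `headingToSelector` and `buildSelectors` are hypothetical helpers, and the slug rule is only an approximation: the real heading ids are generated by the Markdown renderer and may differ for headings with punctuation or duplicates.

```javascript
// Hypothetical helper: approximates how a heading becomes an id selector.
// The real id rules are decided by the Markdown renderer and may differ.
function headingToSelector(heading) {
  const slug = heading
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, '') // drop punctuation
    .replace(/\s+/g, '-');        // whitespace becomes hyphens
  return `#${slug}`;
}

// Build the `selectors` option. `parseHtml` is only set where HTML is wanted,
// so every other entry falls back to the default (text).
function buildSelectors(headings, htmlHeadings = []) {
  return headings.map((heading) => {
    const entry = { selector: headingToSelector(heading) };
    if (htmlHeadings.includes(heading)) {
      entry.parseHtml = true;
    }
    return entry;
  });
}

const selectorOptions = {
  selectors: buildSelectors(
    ['Abstract', 'Preferred Qualifications', 'Perks'],
    ['Abstract', 'Perks'] // these two are extracted as HTML
  ),
};

console.log(JSON.stringify(selectorOptions.selectors, null, 2));
```

The resulting array has the same shape as the `options.selectors` value in the usage sample and could be passed to `parseMarkdown` in the same way.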