For example, the following markdown

```markdown
---
title: Awesome Markdown Extractor
id: 101
---
# Abstract
Lorem ipsum dolor...
```

would be parsed as

```js
{
  metadata: {
    title: 'Awesome Markdown Extractor',
    id: 101
  },
  content: {
    '#abstract': 'Abstract'
  },
  html: '<h1>Abstract</h1>....'
}
```
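In the example above, `id: 101` comes back as the number `101` rather than the string `"101"`, so metadata values are typed. Front matter like this is usually YAML, which a real implementation would hand to a full YAML parser; the package's own parsing code is not shown here. The sketch below is a simplified, hypothetical stand-in (`parseFlatMetadata` is not part of the package) that handles only flat `key: value` lines, inline `[a, b]` lists, integers, booleans and ISO dates:

```javascript
// Hypothetical helper, NOT the package API: a simplified stand-in for a YAML
// parser that only understands flat "key: value" lines, inline [a, b] lists,
// integers, booleans and ISO-8601 timestamps.
function parseFlatMetadata(block) {
  const metadata = {};
  for (const line of block.split('\n')) {
    const idx = line.indexOf(':');
    if (idx === -1) continue; // skip blank or malformed lines
    const key = line.slice(0, idx).trim();
    // Everything after the first colon is the value, so timestamps keep their colons.
    const raw = line.slice(idx + 1).trim();
    metadata[key] = parseScalarOrList(raw);
  }
  return metadata;
}

// Parse an inline list like "[India, Remote]" or fall back to a single scalar.
function parseScalarOrList(raw) {
  if (raw.startsWith('[') && raw.endsWith(']')) {
    const inner = raw.slice(1, -1).trim();
    return inner === '' ? [] : inner.split(',').map((item) => parseScalar(item.trim()));
  }
  return parseScalar(raw);
}

// Convert a single scalar string into a boolean, integer, Date or string.
function parseScalar(raw) {
  if (raw === 'true') return true;
  if (raw === 'false') return false;
  if (/^-?\d+$/.test(raw)) return Number(raw);
  if (/^\d{4}-\d{2}-\d{2}T/.test(raw)) return new Date(raw);
  return raw.replace(/^["']|["']$/g, ''); // strip optional surrounding quotes
}

const metadata = parseFlatMetadata('title: Awesome Markdown Extractor\nid: 101');
console.log(metadata); // { title: 'Awesome Markdown Extractor', id: 101 }
```

Nested maps, multi-line strings and the many other YAML forms are deliberately out of scope for this sketch.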
Node.js v12 or above is required. It may work on lower versions, but these have not been tested.
```shell
$ npm install @sohailalam2/markdown-extractor
```
Check the example for more information.
Given the sample markdown content below, saved as `job-backend-engineer.md`, let's see how we can parse it and extract its metadata and selected content.
```markdown
---
title: Backend Engineer
id: 101
locations: [India, Remote]
department: Engineering
publishDate: 2020-06-27T13:53:26.714Z
tags: [NodeJs, AWS, Serverless, TypeScript]
isDraft: true
---
# Backend Engineer
## Abstract
This is the awesome _abstract_ for the **backend engineering** role. Visit https://github.com to checkout our brand and some amazing content.
## Preferred Qualifications
- AWS experience
- Serverless experience
## Perks
- Industry standard salary
- Awesome team
- Freedom and responsibilities
## Other Details
This is an **amazing** opportunity for _budding engineers_. Apply now!!
```
```js
const fs = require('fs');
const path = require('path');
const { parseMarkdown } = require('@sohailalam2/markdown-extractor');

const markdown = fs.readFileSync(path.join(__dirname, 'job-backend-engineer.md'), 'utf8');

const options = {
  selectors: [
    { selector: '#abstract', parseHtml: true },
    { selector: '#preferred-qualifications' },
    { selector: '#perks', parseHtml: true },
  ],
};

const { metadata, content, html } = parseMarkdown(markdown, options);
// metadata:
//
// {
//   title: 'Backend Engineer',
//   id: 101,
//   locations: [ 'India', 'Remote' ],
//   department: 'Engineering',
//   publishDate: 2020-06-27T13:53:26.714Z,
//   tags: [ 'NodeJs', 'AWS', 'Serverless', 'TypeScript' ],
//   isDraft: true
// }

const abstract = content['#abstract'];
// <p>This is the awesome <em>abstract</em> for the <strong>backend engineering</strong> role. Visit <a href="https://github.com">https://github.com</a> to checkout our brand and some amazing content.</p>

const preferredQualifications = content['#preferred-qualifications'].split('\n');
// [ 'AWS experience', 'Serverless experience' ]

const perks = content['#perks'];
// <ul>
//   <li>Industry standard salary</li>
//   <li>Awesome team</li>
//   <li>Freedom and responsibilities</li>
// </ul>
```
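The example above splits the text of `content['#preferred-qualifications']` on `'\n'`. If the extracted text of a list ever carries leading or trailing newlines or indentation (this depends on how the HTML is rendered and is an assumption here, not documented behavior), the split would produce empty or padded items. A defensive variant, using a hypothetical `toLines` helper that is not part of the package:

```javascript
// Hypothetical helper, NOT the package API: split a text block extracted by a
// selector into clean lines. Each line is trimmed and empty lines are dropped,
// so stray leading/trailing newlines do not produce empty items.
function toLines(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

const extracted = '\nAWS experience\nServerless experience\n';
console.log(toLines(extracted)); // [ 'AWS experience', 'Serverless experience' ]
```

With the parsed result from the example, it would be called as `toLines(content['#preferred-qualifications'])`.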
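To use the package in a browser, load the bundled script and call `parseMarkdown` on the global `MarkdownExtractor` object:

```html
<script src="../dist/bundle.min.js"></script>
<script>
  // ...
  const { metadata, content, html } = MarkdownExtractor.parseMarkdown(markdown, options);
  // ...
</script>
```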
The `parseMarkdown` function takes one required parameter (the markdown content as a string) and an optional parameter for configuring the parser:

```ts
function parseMarkdown(data: string, options?: MarkdownExtractorOptions): MarkdownExtractorResult {}
```
The available options and their effects are described below.
The delimiter line that encloses the metadata block. It defaults to `---`.

Example:

```markdown
---
title: Awesome Markdown Extractor
id: 101
---
```
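A delimiter-bounded block can be located by checking that the document opens with a line consisting only of the delimiter and then finding the next such line. The sketch below illustrates that idea; `splitFrontMatter` is hypothetical (not the package's API), and it assumes the opening delimiter must be on the very first line, which may differ from the package's actual rules:

```javascript
// Hypothetical helper, NOT the package API: separate a delimiter-bounded
// metadata block from the markdown body. Returns null for `metadata` when the
// document does not open with the delimiter or the closing line is missing.
function splitFrontMatter(text, delimiter = '---') {
  const lines = text.split(/\r?\n/);
  if (lines[0].trim() !== delimiter) {
    return { metadata: null, body: text };
  }
  // Find the first closing delimiter line after the opening one.
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === delimiter);
  if (end === -1) {
    return { metadata: null, body: text };
  }
  return {
    metadata: lines.slice(1, end).join('\n'),
    body: lines.slice(end + 1).join('\n'),
  };
}

const doc = ['---', 'title: Awesome Markdown Extractor', 'id: 101', '---', '# Abstract'].join('\n');
const { metadata, body } = splitFrontMatter(doc);
// metadata: 'title: Awesome Markdown Extractor\nid: 101'
// body: '# Abstract'
```

Because only the first closing line ends the block, a later `---` in the body (for example a horizontal rule) is left untouched.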
An array of `MarkdownDomSelector` objects, each with two properties. If provided, the parser converts the markdown to HTML and extracts data from the DOM elements matched by these selectors.

- `selector` (string): a jQuery-style DOM selector.
- `parseHtml` (boolean, optional): whether to extract the content of the selected DOM element as HTML or as text. Defaults to text.

Example:

```js
const selectors = [
  { selector: '#abstract', parseHtml: true },
  { selector: '#preferred-qualifications' },
  { selector: '#perks', parseHtml: true },
];
```
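In the sample document, `#abstract`, `#preferred-qualifications` and `#perks` line up with the `## Abstract`, `## Preferred Qualifications` and `## Perks` headings, which suggests the HTML renderer gives each heading an id derived from its text. The exact id algorithm is an assumption here: the hypothetical `headingToSelector` below lowercases the text, drops punctuation and joins words with hyphens, which matches these three cases but may not match the renderer for every heading.

```javascript
// Hypothetical helper, NOT the package API: derive an id selector from heading
// text, ASSUMING ids are built by lowercasing, dropping punctuation and joining
// words with hyphens (e.g. "Preferred Qualifications" -> "#preferred-qualifications").
function headingToSelector(heading) {
  const slug = heading
    .toLowerCase()
    .trim()
    .replace(/[^\w\s-]/g, '') // drop punctuation such as "," or "!"
    .replace(/\s+/g, '-'); // collapse whitespace runs into single hyphens
  return `#${slug}`;
}

// Build a selectors array like the one in the example from heading names.
const selectorsFromHeadings = [
  { selector: headingToSelector('Abstract'), parseHtml: true },
  { selector: headingToSelector('Preferred Qualifications') },
  { selector: headingToSelector('Perks'), parseHtml: true },
];
```

If a heading contains unusual characters, inspect the generated `html` to confirm the actual id before relying on a derived selector.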
Internally, cheerio is used to parse the HTML content and extract data with DOM selectors. You can optionally configure its behavior with this parameter; see the cheerio documentation for the available options.