The parsing of MOBI and KF8 files is based on this MOBI parsing implementation.
Introduction
The MOBI file format is based on the PDB file format, while the KF8 (Kindle Format 8) format is built on top of MOBI. Therefore, the parsing of both MOBI and KF8 is handled together in the mobi-parser
.
Install
pnpm install @lingo-reader/mobi-parser
MobiFile
Usage in browser
import { initMobiFile } from '@lingo-reader/mobi-parser'
import type { MobiSpine } from '@lingo-reader/mobi-parser'
function initMobi(file: File) {
const mobi: Mobi = await initMobiFile(file)
// spine
const spine: MobiSpine = mobi.getSpine()
// loadChapter
const firstChapter = mobi.loadChapter(spine[0].id)
}
// see Mobi class
// .....
Usage in node
import { initMobiFile } from '@lingo-reader/mobi-parser'
import type { MobiSpine } from '@lingo-reader/mobi-parser'
const mobi: Mobi = await initMobiFile('./example/taoyong.mobi')
// spine
const spine: MobiSpine = mobi.getSpine()
// loadChapter
const firstChapter = mobi.loadChapter(spine[0].id)
// see Mobi class
// .....
initMobiFile(file: string | File |Uint8Array, resourceSaveDir?: string): Promise<Mobi>
API for initializing a MOBI file. You can pass in a file path, a File
object, or a Uint8Array
to receive a Mobi
instance, which provides APIs for reading metadata, spine information, and more.
Parameters:
file: string | File | Uint8Array
– The MOBI file input, which can be a file path (Node.js), aFile
object (browser), or aUint8Array
in either environment.resourceSaveDir?: string
– (Optional) Applicable mainly in Node.js. Specifies the directory to save images and other resources. Defaults to./images
.
Returns:
Promise
– A Promise that resolves to the initializedMobi
object.
Note:
For the file
parameter:
- In the browser, it must be a
File
orUint8Array
. Passing astring
will throw an error. - In Node.js, it must be a
string
orUint8Array
. Passing aFile
will throw an error.
Mobi class
class Mobi {
getFileInfo(): MobiFileInfo
getSpine(): MobiSpine
loadChapter(id: string): MobiProcessedChapter | undefined
getToc(): MobiToc
getCoverImage(): string
getMetadata(): MobiMetadata
resolveHref(href: string): MobiResolvedHref | undefined
destroy(): void
}
getFileInfo(): MobiFileInfo
interface MobiFileInfo {
// MOBI file name, including the file extension
fileName: string
}
This returns file information, currently with only the fileName
field.
getSpine(): MobiSpine
interface MobiChapter {
// Chapter ID
id: string
// Raw HTML text with unmodified resource paths
text: string
// The byte position where the chapter starts in the file
start: number
// The byte position where the chapter ends in the file
end: number
// The byte size of the chapter
size: number
}
type MobiSpine = MobiChapter[]
The spine lists all the chapters to be displayed in sequence.
In the original HTML text of the chapters, image resource paths point to internal file addresses. These need to be replaced with blob URLs in order to display them directly in the browser.
loadChapter(id: string): MobiProcessedChapter | undefined
interface MobiCssPart {
id: string
href: string
}
interface MobiProcessedChapter {
html: string
css: MobiCssPart[]
}
The loadChapter
function takes a chapter ID as its argument and returns the processed chapter object. If the return value is undefined
, it means the chapter does not exist.
The raw HTML is split into CSS and text (MobiProcessedChapter
), where the CSS is provided as a blob URL (MobiCssPart
), and the HTML is the modified text after resource replacement, mainly in the <body>
tag content.
getToc(): MobiToc
interface MobiTocItem {
// TOC item label
label: string
// Internal MOBI href
href: string
// Sub TOC item
children?: MobiTocItem[]
}
type MobiToc = MobiTocItem[]
This function retrieves the table of contents (TOC) of the book.
getCoverImage(): string
This retrieves the cover image of the book as a blob URL. When an empty string is returned, it means that the CoverImage
does not exist.
getMetadata(): MobiMetadata
interface MobiMetadata {
// Unique identifier
identifier: string
// Book title
title: string
// Author(s)
author: string[]
// Publisher
publisher: string
// Language of the book
language: string
// Publication date
published: string
// Description of the book
description: string
// Categories or subjects
subject: string[]
// Copyright information
rights: string
// Contributors
contributor: string[]
}
This retrieves the metadata for the book.
resolveHref(href: string): MobiResolvedHref | undefined
interface MobiResolvedHref {
// Chapter ID
id: string
// HTML selector
selector: string
}
This function parses links that point to other chapters and returns the result in the format shown above.
destroy(): void
This function clears blob URLs and resources created during the file parsing process to prevent memory leaks.
Kf8File
Usage in browser
import { initKf8File } from '@lingo-reader/mobi-parser'
import type { Kf8Spine } from '@lingo-reader/mobi-parser'
function initMobi(file: File) {
const mobi: Kf8 = await initKf8File(file)
// spine
const spine: Kf8Spine = mobi.getSpine()
// loadChapter
const firstChapter: Kf8ProcessedChapter = mobi.loadChapter(spine[0].id)
}
// see Kf8 class
// .....
Usage in node
import { initKf8File } from '@lingo-reader/mobi-parser'
import type { Kf8Spine } from '@lingo-reader/mobi-parser'
const mobi: Kf8 = await initKf8File('./example/taoyong.azw3')
// spine
const spine: Kf8Spine = mobi.getSpine()
// loadChapter
const firstChapter: Kf8ProcessedChapter = mobi.loadChapter(spine[0].id)
// see Kf8 class
// .....
initKf8File(file: InputFile, resourceSaveDir?: string): Promise<Kf8>
API for initializing an AZW3 (KF8) file. You can pass in a file path, a File
object, or a Uint8Array
to receive a fully initialized Kf8
object, which provides APIs for reading metadata, spine information, and more.
Parameters:
file: string | File | Uint8Array
– The input KF8 file. Can be a file path (Node.js), aFile
object (browser), or aUint8Array
in either environment.resourceSaveDir?: string
– (Optional) Mainly used in Node.js. Specifies the directory to save images and other resources. Defaults to./images
.
Returns:
Promise
– A Promise that resolves to the initializedKf8
object.
Note:
For the file
parameter:
- In the browser, the type must be
File
orUint8Array
. Passing astring
will throw an error. - In Node.js, the type must be
string
orUint8Array
. Passing aFile
will throw an error.
Kf8 class
class Kf8 {
getFileInfo(): Kf8FileInfo
getMetadata(): Kf8Metadata
getCoverImage(): string
getSpine(): Kf8Spine
getToc(): Kf8Toc
getGuide(): Kf8Guide | undefined
loadChapter(id: string): Kf8ProcessedChapter | undefined
resolveHref(href: string): Kf8ResolvedHref | undefined
destroy(): void
}
getFileInfo(): Kf8FileInfo
type Kf8FileInfo = MobiFileInfo
This is the same as MobiFileInfo
, see here
getMetadata(): Kf8Metadata
type Kf8Metadata = MobiMetadata
This is the same as MobiMetadata
, see here
getCoverImage(): string
This retrieves the book's cover image as a blob URL. When an empty string is returned, it means that the CoverImage
does not exist.
getSpine(): Kf8Spine
interface Kf8Chapter {
id: string
skel: SkelTableItem
frags: FragTable
fragEnd: number
length: number
totalLength: number
}
type Kf8Spine = Kf8Chapter[]
No explanation available yet.
getToc(): Kf8Toc
interface Kf8TocItem {
label: string
// Path to the chapter resource
href: string
children?: Kf8TocItem[]
}
type Kf8Toc = Kf8TocItem[]
This retrieves the table of contents (TOC).
getGuide(): Kf8Guide | undefined
interface Kf8GuideItem {
label: string
type: string[]
href: string
}
type Kf8Guide = Kf8GuideItem[]
This retrieves parts of the book that are available for browsing.
loadChapter(id: string): Kf8ProcessedChapter | undefined
interface Kf8CssPart {
id: string
href: string
}
interface Kf8ProcessedChapter {
html: string
css: Kf8CssPart[]
}
This works similarly to MobiProcessedChapter
, see here
resolveHref(href: string): Kf8ResolvedHref | undefined
type Kf8ResolvedHref = MobiResolvedHref
This works similarly to MobiResolvedHref
, see here。
destroy(): void
Clears the blob URLs and resources created during the parsing process to prevent memory leaks.