Chardet is a character encoding detection module for Node.js written in pure JavaScript. The module is based on the ICU project (http://site.icu-project.org/), which uses character occurrence analysis to determine the most probable encoding.
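As a rough illustration of what occurrence-based detection can look like, the sketch below counts well-formed and malformed UTF-8 multi-byte sequences in a buffer. A high ratio of well-formed to malformed sequences suggests UTF-8. This is a simplified toy, not chardet's or ICU's actual code: `countUtf8Sequences` is a hypothetical name, and the check does not reject overlong or out-of-range sequences.

```javascript
// Hypothetical helper (not part of chardet): count well-formed and
// malformed UTF-8 multi-byte sequences in a Buffer.
function countUtf8Sequences(buf) {
  let valid = 0;
  let invalid = 0;
  let i = 0;
  while (i < buf.length) {
    const b = buf[i];
    i++;
    if ((b & 0x80) === 0) continue; // plain ASCII byte, carries no signal

    // Work out how many continuation bytes the lead byte announces.
    let trail;
    if ((b & 0xe0) === 0xc0) trail = 1; // 110xxxxx
    else if ((b & 0xf0) === 0xe0) trail = 2; // 1110xxxx
    else if ((b & 0xf8) === 0xf0) trail = 3; // 11110xxx
    else {
      invalid++; // stray continuation byte or illegal lead byte
      continue;
    }

    // Every continuation byte must look like 10xxxxxx.
    let ok = true;
    for (let k = 0; k < trail; k++) {
      if (i >= buf.length || (buf[i] & 0xc0) !== 0x80) {
        ok = false;
        break;
      }
      i++;
    }
    if (ok) valid++;
    else invalid++;
  }
  return { valid, invalid };
}

console.log(countUtf8Sequences(Buffer.from('héllo', 'utf8')));
console.log(countUtf8Sequences(Buffer.from('héllo', 'latin1')));
```

The same text encoded as Latin-1 produces a malformed sequence, which is the kind of signal a detector can turn into a confidence score.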
npm i chardet
To return the encoding with the highest confidence:
var chardet = require('chardet');
chardet.detect(Buffer.from('hello there!'));
// or
chardet.detectFile('/path/to/file', function(err, encoding) {});
// or
chardet.detectFileSync('/path/to/file');
To return the full list of possible encodings:
var chardet = require('chardet');
chardet.detectAll(Buffer.from('hello there!'));
// or
chardet.detectFileAll('/path/to/file', function(err, encoding) {});
// or
chardet.detectFileAllSync('/path/to/file');
// The returned value is an array of objects sorted by confidence in descending order,
// e.g. [{ confidence: 90, name: 'UTF-8' }, { confidence: 20, name: 'windows-1252', lang: 'fr' }]
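Because the result is a plain array of objects, it can be post-processed with ordinary array methods. The sketch below keeps only the candidates at or above a chosen confidence. `filterByConfidence` and the threshold of 50 are illustrative assumptions, not part of chardet, and the sample data copies the shape shown in the comment above.

```javascript
// Hypothetical helper (not part of chardet): return the names of all
// candidates whose confidence is at least `minConfidence`.
function filterByConfidence(matches, minConfidence) {
  return matches
    .filter((m) => m.confidence >= minConfidence)
    .map((m) => m.name);
}

// Sample data in the shape returned by the detectAll family of calls.
const matches = [
  { confidence: 90, name: 'UTF-8' },
  { confidence: 20, name: 'windows-1252', lang: 'fr' },
];

console.log(filterByConfidence(matches, 50)); // [ 'UTF-8' ]
```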
When the data set is large and you want to improve performance at the cost of some accuracy, you can sample only the first N bytes of the buffer:
chardet.detectFile('/path/to/file', { sampleSize: 32 }, function(err, encoding) {});
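To make the idea of sampling concrete, the sketch below reads at most N bytes from the start of a file using Node's `fs` module. It illustrates the general technique only and is not necessarily how chardet implements `sampleSize` internally; `readSample` is a hypothetical helper name.

```javascript
const fs = require('fs');

// Hypothetical helper (not part of chardet): read at most `sampleSize`
// bytes from the beginning of the file at `path`.
function readSample(path, sampleSize) {
  const fd = fs.openSync(path, 'r');
  try {
    const sample = Buffer.alloc(sampleSize);
    // Read from file position 0 into the start of `sample`.
    const bytesRead = fs.readSync(fd, sample, 0, sampleSize, 0);
    // The file may be shorter than the requested sample size.
    return sample.subarray(0, bytesRead);
  } finally {
    fs.closeSync(fd);
  }
}
```

The resulting buffer could then be passed to `chardet.detect`. A smaller sample means less data to analyze but also fewer bytes to base the guess on.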
Currently only these encodings are supported; more will be added soon.