NAME
JSON::SL - Fast, Streaming, and Searchable JSON decoder.
SYNOPSIS
    use JSON::SL;
    use Data::Dumper;

    # Deliberately incomplete JSON: the outer object and "more" are never closed.
    my $txt = <<'EOT';
    {
        "some" : {
            "partial" : 42.42
        },
        "other" : {
            "partial" : "a string"
        },
        "complex" : {
            "partial": {
                "a key" : "a value"
            }
        },
        "more" : {
            "more" : "stuff"
    EOT

    my $json = JSON::SL->new();
    my $jpath = "/^/partial";
    $json->set_jsonpointer( [$jpath] );
    my @results = $json->feed($txt);

    foreach my $result (@results) {
        printf("== Got result (path %s) ==\n", $result->{Path});
        printf("Query was %s\n", $result->{JSONPointer});
        my $value = $result->{Value};
        if (!ref $value) {
            printf("Got scalar value %s\n", $value);
        }
        else {
            printf("Got reference:\n");
            print Dumper($value);
        }
        print "\n";
    }
Produces:
    == Got result (path /some/partial) ==
    Query was /^/partial
    Got scalar value 42.42

    == Got result (path /other/partial) ==
    Query was /^/partial
    Got scalar value a string

    == Got result (path /complex/partial) ==
    Query was /^/partial
    Got reference:
    $VAR1 = {
              'a key' => 'a value'
            };
DESCRIPTION
JSON::SL was designed from the ground up to make partially received streaming content easily accessible and searchable.
It uses an embedded C library (jsonsl) to do the streaming and most of the dirty work.
JSON::SL allows you to use the JSONPointer URI/path syntax to tell it about certain objects and elements which are of interest to you. JSON::SL will then incrementally parse the input stream, returning those selected objects to you as soon as they arrive.
In addition, the objects are returned with extra context information, which is itself another JSONPointer path specifying the path from the root of the JSON stream until the current object.
Since I hate SAX's callback interface, and since almost all the boilerplate for a SAX interface needs to be written again for just about every use case, I have decided to move the core work of state stacking and such into the C library itself. This means minimal boilerplate on your part and very fast performance.
GENERIC METHODS
new()
new($max_levels)
Creates a new JSON::SL object.
If $max_levels is provided, it is taken as the maximum recursion depth to which the parser will be able to descend. This can only be set at construction time, as it affects the amount of memory allocated for the internal structures.
The amount of memory allocated for each level is around 64 bytes on 64-bit systems (i.e. sizeof (char*) == 8) and around 48 bytes on 32-bit systems (i.e. sizeof (char*) == 4).
The default is 512 levels, or a total of about 32KB allocated on a 64-bit system.
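As a rough illustration of the arithmetic above, the following sketch estimates the memory reserved for a given depth limit. The helper name estimated_stack_bytes is hypothetical (JSON::SL exposes no such function), and the per-level sizes are simply the approximate figures quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of JSON::SL: estimate the memory reserved
# for the parser's internal structures, using the approximate per-level
# sizes quoted in the text (64 bytes on 64-bit, 48 bytes on 32-bit).
sub estimated_stack_bytes {
    my ($max_levels, $pointer_size) = @_;
    my $per_level = $pointer_size == 8 ? 64 : 48;
    return $max_levels * $per_level;
}

printf "%d levels, 64-bit: %d KB\n", 512, estimated_stack_bytes(512, 8) / 1024;
printf "%d levels, 32-bit: %d KB\n", 512, estimated_stack_bytes(512, 4) / 1024;
```

Raising $max_levels therefore costs memory linearly, which is worth keeping in mind when many parser objects are alive at once.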
set_jsonpointer(["/arrayref/of", "/json/paths/^"])
Set the JSONPointer query paths for this object. Note that this can only be done once per object lifetime, and only before you have started calling the "feed" method.
The JSONPointer notation is quite simple, and follows URI scheme conventions. Each / represents a level of descent into an object, and each path component represents a hash key or array index (whether something is indeed a key or an index is derived from the context of the JSON stream itself, in case you were wondering).
http://tools.ietf.org/html/draft-pbryan-zyp-json-pointer-02 contains the draft for the JSONPointer specification.
As an extension to the specification, JSON::SL allows you to use the ^ (caret) character as a wildcard. Placing a lone ^ in any path component means to match any value at that level, effectively providing glob-style semantics.
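To make the wildcard rule concrete, here is a minimal pure-Perl sketch of how a query such as /^/partial relates to the paths reported in the SYNOPSIS. The function pointer_matches is hypothetical and is not how JSON::SL matches internally (that happens in C while the stream is parsed); it also assumes that a query only matches paths of exactly the same depth.

```perl
use strict;
use warnings;

# Illustrative only: approximate the matching rule described above.
# Assumption: query and path must have the same number of components.
sub pointer_matches {
    my ($query, $path) = @_;
    my @q = split m{/}, $query, -1;
    my @p = split m{/}, $path,  -1;
    return 0 unless @q == @p;              # same depth required
    for my $i (0 .. $#q) {
        next if $q[$i] eq '^';             # lone caret matches any key or index
        return 0 unless $q[$i] eq $p[$i];  # otherwise components must be equal
    }
    return 1;
}

for my $path ('/some/partial', '/other/partial', '/some/other') {
    printf "%-16s %s\n", $path,
        pointer_matches('/^/partial', $path) ? 'matches' : 'does not match';
}
```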
feed($input_text)
incr_parse($input_text)
This is the meat and potatoes of JSON::SL. Call it with $input_text being a chunk of a JSON input stream, which will likely contain only partial data.
The module will do its magic and decode elements for you according to the queries set in "set_jsonpointer".
If called in scalar context, returns one matching item from the partial stream. If called in list context, returns all remaining matching items. If called in void context, the JSON is still decoded, but nothing is returned.
The return value is one hash reference or a list of hash references (depending on the context), each with the following keys:
- Value
  This is the actual value selected by the query. This can be a string, number, hash reference, array reference, undef, or a JSON::SL::Boolean object.
- Path
  This is a JSONPointer path, which can be used to get context information (and perhaps to locate 'neighbors' in the object graph using "root").
- JSONPointer
  The original matching query path used to select this object. Can be used to associate this object with some extra user-defined context.
N.B. incr_parse is an alias to this method, for familiarity.
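The following sketch shows the intended incremental usage: the same kind of document as in the SYNOPSIS is fed in pieces, and matches are collected as soon as each chunk completes them. The chunk boundaries are arbitrary and chosen only for illustration; in practice the chunks would come from a socket or file read.

```perl
use strict;
use warnings;
use JSON::SL;

my $sl = JSON::SL->new();
$sl->set_jsonpointer(["/^/partial"]);

# Simulate a network stream by splitting the document into chunks.
# The boundaries deliberately fall in the middle of tokens.
my @chunks = (
    '{ "some" : { "par',
    'tial" : 42.42 }, "other" : { "partial" : "a st',
    'ring" }, "more" : { "more" : "stuff"',
);

my @paths;
for my $chunk (@chunks) {
    # List context: collect every match completed by this chunk.
    for my $result ($sl->feed($chunk)) {
        push @paths, $result->{Path};
        print "matched $result->{Path}\n";
    }
}
```

Calling feed in list context inside the loop drains the result queue after each chunk, so no separate call to fetch is needed here.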
fetch()
Returns remaining decoded JSON objects. Returns the same kinds of things that "feed" does (with the same semantics dependent on scalar and list context), except that it does not accept any arguments. This is helpful for a usage pattern such as:

    $sl->feed($large_json);
    while (my $res = $sl->fetch) {
        # do something with the result object..
    }
reset()
Resets the state. Any cached objects, result queues, and such are deleted and freed. Note that the JSONPointer query will still remain (and is static for the duration of the JSON::SL instance).
OBJECT GRAPH INSPECTION AND MANIPULATION
One of JSON::SL's features is the ability to get a Perl representation of incomplete JSON data. As soon as a JSON element can be converted to some kind of shell which resembles a Perl object, it is inserted into the object graph, or object tree.
root()
This returns the partial object graph formed from the JSON stream. In other words, this is the object tree.
Items which have been selected to be filtered via "set_jsonpointer" are not present in this object graph, and neither are incomplete strings.
It is an error to modify anything in the object returned by root, and Perl will croak with an 'attempted modification of read-only value' error if you try to do so (but see "make_referrent_writeable" for a way to override this).
Nevertheless, it is useful to get a glimpse of the 'rest' of the JSON document not returned via the feed method.
NOTE This method is deprecated. Use the "root_callback" method instead.
root_callback($cb)
Sets a callback which is invoked when the root object is first created. The callback is passed a reference to the root object. Use this method instead of root, as the root object will no longer be available via root() once the parsing of the current tree is completed. A callback-oriented mechanism provides a better guarantee of being able to keep a reference to the root.
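A minimal sketch of the callback approach, assuming (as the description above states) that the callback receives the root reference as its only argument. The JSON text and the /items/^ query are made up for illustration.

```perl
use strict;
use warnings;
use JSON::SL;

my $sl = JSON::SL->new();
$sl->set_jsonpointer(["/items/^"]);

# Keep our own reference to the root as soon as the parser creates it.
my $root;
$sl->root_callback(sub { ($root) = @_ });

# Incomplete document: the outer object is never closed.
my @items = $sl->feed('{ "meta" : { "count" : 2 }, "items" : [ "a", "b" ]');

print "root is a ", ref($root), " reference\n" if ref $root;
```

Because the reference is held in $root, it remains usable for inspection regardless of what root() returns later.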
referrent_is_writeable($ref)
Returns true if the object pointed to by $ref has the SvREADONLY flag off. In other words, if the flag is off then it is safe to modify its contents.
make_referrent_writeable($ref)
make_referrent_readonly($ref)
Convenience methods to make the Perl variable referred to by $ref read-only or writeable.
make_referrent_writeable will make the object pointed to by $ref writeable, and make_referrent_readonly will make the object pointed to by $ref read-only.
You may 'poll' to see when an object has become writeable by doing the following:
1) Locate your initial object in the object graph using my $v = $sl->root()
2) Check its initial status by using $sl->referrent_is_writeable($v)
3) Stash the reference somewhere, and repeat step 2 as necessary.
Using make_referrent_writeable, you may modify the object graph as needed. Modification of the object graph is not always safe, and performing disallowed modifications can make your application crash (which is why incomplete objects are marked as read-only in the first place).
In the event where you need to make modifications to the object graph, following these guidelines will prevent an application crash:
- Strings, Integers, Booleans
  These are always safe to modify (and will never be read-only) because they are only inserted into the object graph once they have completed.
- Hash Keys
  Deleting hash keys which point to placeholders (represented as undef) will change the hash key for the real value, once that value is completed.
- Hashes, Arrays
  Removing an array element or hash value which is 1) a container (hash or array), and 2) was read-only will crash your application. Perl will destroy the container when it goes out of scope from your function. However, JSON::SL will continue to reference it inside its internal structures, so do not do this.
  Adding a hash value/key to the hash is permitted, but the value may become clobbered when and if an actual key-value pair is detected from the JSON input stream.
  Prepending (i.e. unshifting) to an array is permitted. Appending (i.e. pushing) to an array is only safe if you are sure that none of the elements of the array are potential JSONPointer query matches. JSONPointer matches for array indices will internally pop the current (i.e. last) element of the array and return it from "feed".
OPTION GETTERS AND SETTERS
utf8()
utf8(boolean)
Get or set the current status of the SvUTF8 flag as it is applied to the strings returned by JSON::SL. If set to true, then input and output will be assumed to be encoded in UTF-8.
noqstr()
noqstr(boolean)
Get/Set whether the JSONPointer field is populated in the hash returned by "feed". Turning this on (i.e. leaving out the JSONPointer field) may gain some performance.
nopath()
nopath(boolean)
Get/Set whether path information (the Path field) is populated in the hash returned by "feed". Turning this on (i.e. leaving out path information) may boost performance, but will also leave you in the dark in regards to where/what your object is.
max_size()
max_size(limit)
This functions exactly like JSON::XS's method of the same name. To quote:

    Set the maximum length a JSON text may have (in bytes) where decoding is
    being attempted. The default is C<0>, meaning no limit. When C<decode>
    is called on a string that is longer then this many bytes, it will not
    attempt to decode the string but throw an exception.
    ...
    If no argument is given, the limit check will be deactivated (same as when
    C<0> is specified).

    See SECURITY CONSIDERATIONS in L<JSON::XS>, for more info on why this is useful.
object_drip(boolean)
As an alternative to using JSONPointer, you can use an 'object drip'. With this setting enabled, all hashes and arrays will be returned via feed or fetch in reverse order (i.e. the deepest objects are returned first, followed by the objects which encapsulate them).
This allows you to inspect complete descendant objects as they arrive.
The objects returned by fetch and feed will still follow the same semantics, with context/path information stored inside the Path key. The JSONPointer field is not passed, since no query is being used.
Example:
    use JSON::SL;
    use Test::More;

    my $sl = JSON::SL->new();
    $sl->object_drip(1);

    # create an incomplete JSON object:
    my $json = <<'EOJ';
    [ [ { "key1":"foo", "key2":"bar", "key3":"baz" }
    EOJ

    my @res = $sl->feed($json);
    my $expected = [
        {
            Value => "foo",
            Path  => '/0/0/key1',
        },
        {
            Value => "bar",
            Path  => '/0/0/key2',
        },
        {
            Value => "baz",
            Path  => '/0/0/key3'
        },
        {
            Value => {},
            Path  => '/0/0'
        },
    ];
    is_deeply(\@res, $expected, "Got expected results for object drip...");
    done_testing();
Outer encapsulating objects will have their children removed (as they have already been returned in previous results).
Only complete objects (i.e. objects which can no longer contain any more data) will be returned.
UTILITY FUNCTIONS
These functions are not object methods but rather exported functions. You may export them on demand or use their fully-qualified names.
decode_json($json)
Decodes a JSON string and returns a Perl object. This really doesn't serve much use, and JSON::XS is faster than this. Nevertheless it eliminates the need to use two modules if all you want to do is decode JSON.
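A short sketch of importing the function on demand, as described above. The JSON text is made up for illustration.

```perl
use strict;
use warnings;
use JSON::SL qw(decode_json);

# Decode a complete JSON document in one call.
my $data = decode_json('{"name":"example","tags":["a","b"]}');

printf "name=%s, number of tags=%d\n",
    $data->{name}, scalar @{ $data->{tags} };
```

Without the import list, the same call can be written with the fully-qualified name JSON::SL::decode_json.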
unescape_json_string($string)
Unescapes a JSON string, translating \uXXXX and other compliant escapes to their actual character/byte representation. Returns the converted string, or undef if the input was empty. Dies on invalid input.

    my $str = "\\u0041";
    my $unescaped = unescape_json_string($str);
    # => "A"

Both "decode_json" and "feed" output already-unescaped strings, so there is no need to call this function on strings returned by those methods.
BUGS & CAVEATS
Threads
This will most likely not work with threads, although one would wonder why you would want to use this module across threads.
Object Trees
When inspecting the object tree, you may see some undef values, and it is impossible to determine whether those values are JSON nulls or placeholder values. It would be possible to implement a class, e.g. JSON::SL::Placeholder, but doing so would either be unsafe or incur additional overhead.
JSONPointer
The ^ caret is somewhat obscure as a wildcard character.
Currently wildcard matching is all-or-nothing, meaning that constructs such as foo^ will not work.
Encodings
All input to JSON::SL should be either UTF-8 or ASCII (a subset of UTF-8).
More specifically, the input stream must be any superset of ASCII which uses octet streams (so this includes Latin1).
Perl itself only natively deals with 8-bit ASCII, Latin1, or UTF-8, so if your input stream is something else (for example, UTF-16) it will need to be converted to UTF-8 at some point before it is passed to JSON::SL.
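One way to perform such a conversion is with the core Encode module. The sketch below builds a small UTF-16LE sample in memory (standing in for data read from a file or socket) and re-encodes it as UTF-8 octets; the sample text and variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Illustrative input: a small JSON document encoded as UTF-16LE octets.
my $utf16_octets = encode('UTF-16LE', qq({"greeting":"h\x{e9}llo"}));

# Decode to Perl characters, then re-encode as UTF-8 octets.
my $utf8_octets = encode('UTF-8', decode('UTF-16LE', $utf16_octets));

# $utf8_octets is now suitable input for JSON::SL's feed method.
printf "%d UTF-16LE bytes became %d UTF-8 bytes\n",
    length($utf16_octets), length($utf8_octets);
```

For a true stream, the conversion has to respect character boundaries, so an encoding layer on the input filehandle is usually a better fit than converting each chunk by hand.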
Speed
JSON::SL aims to be the fastest JSON decoder for Perl. Currently it is only in second place - being 25% slower than JSON::XS for decode_json and about 8% slower for incremental parsing.
Additionally, if your input has lots of escapes (not very common in real-world JSON), JSON::SL will be even slower.
Nevertheless, I believe that the benefits provided by JSON::SL save not only human time, but also machine time - what good is quickly decoding a large JSON stream if there are no proper facilities to inspect it?
TODO
Work is in progress on a SAX-style interface. See JSON::SL::Tuba.
SEE ALSO
JSON::XS - Still faster than this module, and is also the source of many of JSON::SL's ideas and tests.
If you wish to aid in the development of the JSON parser, do not modify the source files in the Perl distribution; they are merely copied over from here:
jsonsl - C core for JSON::SL
JSON - JSON's main page
JSON::SL::Tuba - Same core with an event-oriented interface, like SAX
AUTHOR & COPYRIGHT
Copyright (C) 2012 M. Nunberg
This module contains extracts from JSON::XS. Nevertheless, both modules are licensed under the same terms as Perl itself.