Help:Extensions/Scribunto

From Dota 2 Wiki
Official documentation: Extension:Scribunto
Scribunto
Author(s): Victor Vasiliev, Tim Starling and Brad Jorsch
Extension link: MediaWiki.org
Description: Framework for embedding scripting languages into MediaWiki pages
Looking for in-depth Lua docs? Check out the Lua reference manual on mediawiki.org!

Scribunto is an extension that allows scripting languages to be used within MediaWiki. Without Scribunto, editors would need to try to use wikitext as a scripting language, which results in poor performance and unreadable code.

Scribunto uses Lua, a fast and powerful language that is commonly seen as an embedded language in games, such as Garry's Mod and World of Warcraft.

Before starting with Lua in MediaWiki, it is best to have a basic understanding of Lua itself. The official Programming in Lua book is a good resource if you already have some programming experience (knowing a language like JavaScript is quite helpful, as there are many similarities). Otherwise, the tutorials at lua-users.org and Wikipedia's Lua for beginners help page may be useful.

This guide will cover the general usage of Scribunto, rather than using Lua itself. All examples will be saved at Module:Example and Template:Scribunto example (and their sub-pages), and examples after the first one won't repeat boilerplate code unless relevant. Use the reference manual for detailed usage of all functions.

Before getting started

If you want to use Scribunto for the first time on your wiki, you have to request the Scribunto extension to be enabled. To check whether Scribunto is already enabled, you can look for it on Special:Version on your wiki. If it's not listed, follow the instructions on Requesting extensions.

If you will be editing modules in the browser, you may want to request Extension:CodeEditor as well. If you plan to write a lot of code, it's probably better to edit locally in a text editor such as Notepad++ or Sublime Text and copy-paste into your browser, or to use the Mediawiker plugin for Sublime Text.

Starting out

Scribunto stores scripts in the Module namespace. These modules can then be used on wiki pages by using the {{#invoke:}} parser function (referred to as "invoking" or "calling" the module). Scripts cannot be directly written in wikitext by design, and it is recommended to call all modules from a template rather than using the {{#invoke:}} parser function on a page, to reduce wikitext clutter.

Scribunto scripts that are to be invoked must return a table. That table must contain some functions that can be referred to in {{#invoke:}} statements. These functions must return a wikitext string which is what is actually output to the page. Refer to this most basic working script:

local p = {}

p.helloWorld = function()
	return 'Hello, world!'
end

return p

Which would be called like this: {{#invoke: Example | helloWorld }} and result in this: Hello, world!

This script creates (and later returns) a table called p and adds a function called helloWorld to the table, which returns the text Hello, world! that is displayed on the page.

Getting arguments

Now, this wouldn't be very useful without being able to send arguments to the script like you can with templates. Scribunto stores the arguments in a "parser frame". This frame is basically a table which contains some useful functions related to the wikitext parser, as well as the table that contains the arguments. It is available as the first parameter of the function in the script, and can also be retrieved with the mw.getCurrentFrame() function.

Direct arguments

Direct arguments or "normal" arguments are those which are set on the {{#invoke:}} parser function. Here's an example that uses arguments:

p.helloName = function( f )
	local args = f.args
	return 'Hello, ' .. args.name .. '!'
end

Which would be called like this: {{#invoke: Example | helloName | name = John Doe }}: Hello, John Doe!
Or in a template like this: {{#invoke: Example | helloName | name = {{{name|}}} }}, {{Scribunto example|name=John Doe}}: Hello, John Doe!

This script receives the frame as the f parameter, retrieves the args table from the frame and assigns it to the args variable, and finally returns the value of the name arg with Hello, and ! wrapped around it. Numbered or anonymous args (e.g. {{{1}}}) are available too, as args[1].

The arguments are always strings. Like with templates, named and numbered args have whitespace trimmed, whereas anonymous args do not; and arguments that are specified but with no value will be empty strings, rather than nil.
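
Because every argument arrives as a string, and a blank one arrives as an empty string rather than nil, modules often tidy values up before using them. The following is only a minimal sketch of that idea: cleanArg and numberArg are hypothetical helper names (not part of Scribunto), and the plain args table stands in for f.args.

```lua
-- Hypothetical helpers; "args" further down is a plain table standing in for f.args.

-- Returns the argument with surrounding whitespace removed,
-- or nil if the argument is missing or blank.
local function cleanArg( args, key )
	local value = args[key]
	if value == nil then
		return nil
	end
	value = value:match( '^%s*(.-)%s*$' )
	if value == '' then
		return nil
	end
	return value
end

-- Returns the argument converted to a number, or the fallback if the
-- argument is missing, blank, or not numeric.
local function numberArg( args, key, fallback )
	local value = cleanArg( args, key )
	return value and tonumber( value ) or fallback
end

local args = { name = '', count = ' 3 ', [1] = '  John Doe ' }

-- The blank "name" is skipped, so the first anonymous arg is used instead.
local name = cleanArg( args, 'name' ) or cleanArg( args, 1 ) or 'stranger'
local count = numberArg( args, 'count', 1 )
```

With helpers like these, the usual Lua idiom of `value or default` works again, since blank arguments have been turned into nil.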

Parent arguments

With templates that use a sub-template, it is common to need to pass along args received from the parent template to the sub-template.

{{Template|{{{1}}}|arg1={{{arg1|}}}|arg2={{{arg2|}}}}}

Code like this can get rather messy, and can have performance issues as all the arguments will be parsed, regardless of whether the template uses them.

Scribunto provides a way to access these "parent args" directly, without needing to manually pass them all through. This is very useful, as you will almost always be calling a script from a template, and it allows you to have an infinite number of possible arguments, something that wasn't possible with traditional templates.

To access the args of the parent template, you need the parent frame. Using the f:getParent() function on the current frame returns the parent template's frame, which can then be used in the same way as the current frame.

The concept of a "parent frame" may be difficult to grasp at first, so here's the previous example using it:

p.parentHello = function( f )
	local args = f:getParent().args
	return 'Hello, ' .. args.name .. '!'
end

Now, we can't just call this directly as we're no longer reading the current frame's args. Instead we will insert the {{#invoke:}} parser function in a template and use it from there. Note the lack of any template arguments being passed along.
{{#invoke: Example | parentHello }}, {{Scribunto example/Parent hello|name=John Doe}}: Hello, John Doe!
It works just like the previous example, despite the name arg not being manually passed to the {{#invoke:}} parser function.

For a single argument like this, it isn't much of an improvement. But think about templates where many arguments are passed, like a navbox. Typically, these templates have a limit on the number of rows they support, purely because that is as many args as have been manually set up. Using the parent frame directly would allow a navbox with no limit on rows, and it would not need to check every row up to its limit to see if any of them have a value. Not only is this faster, but it also produces much better, less repetitive code.

Supporting both

Using both direct args and parent args for separate purposes is fairly simple:

p.makeConfigGreeting = function( f )
	local args = f.args
	local parentArgs = f:getParent().args
	return args.greeting .. ', ' .. parentArgs.name .. '!'
end

The direct args are used as a "configuration" for the type of greeting to use, and the parent args are used to set who is being greeted.
{{#invoke: Example | makeConfigGreeting | greeting = Hello }}, {{Scribunto example/Config hello|name=John Doe}}: Hello, John Doe!
{{#invoke: Example | makeConfigGreeting | greeting = G'day }}, {{Scribunto example/Config g'day|name=John Doe}}: G'day, John Doe!
Here, two templates call the same module and use its direct args to configure it with different greetings. The templates are then transcluded as usual, and the parent args from the templates supply the name of the person being greeted.

To have a module be able to use direct args or parent args, you just need to check if either table has any values:

p.makeFlexableGreeting = function( f )
	local args = f.args
	local parentArgs = f:getParent().args
	for _ in pairs( parentArgs ) do
		args = parentArgs
		break
	end
	
	return args.greeting .. ', ' .. args.name .. '!'
end

This module gets both the direct args and the parent args then starts a loop over the parent args. If the parent args table is empty, the code inside the loop won't run, and thus the direct args will remain assigned to the args variable. Otherwise, the args variable will be reassigned to the parent args table, and then the loop will be stopped as knowing the existence of a value is all that is necessary.

To have a module be able to use direct args and parent args, you could just do something like this:

p.makeFlexableConfigGreeting = function( f )
	local args = f.args
	local parentArgs = f:getParent().args
	
	local greeting = parentArgs.greeting or args.greeting
	local name = parentArgs.name or args.name
	return greeting .. ', ' .. name .. '!'
end

This works okay for a simple example like this, but it gets messy for modules with lots of args. The proper way to do it is to iterate over both tables and merge them into one:

p.makeMergedGreeting = function( f )
	local directArgs = f.args
	local parentArgs = f:getParent().args
	
	local args = {}
	for _, argType in ipairs{ directArgs, parentArgs } do
		for key, val in pairs( argType ) do
			args[key] = val
		end
	end
		
	return args.greeting .. ', ' .. args.name .. '!'
end

This module iterates over both arg tables, merging them together and overwriting the direct args with the parent args. This is useful for more complex configuration where a template sets default configuration settings, and then those settings can be overwritten as appropriate when transcluding the template.

Both of these examples could run into issues where "empty" parent arguments overwrite the direct arguments, as Lua considers empty strings to be "truthy" values. This could be fixed by trimming the whitespace from the values (mw.text.trim()), and then checking if they are equal to an empty string before setting the value.
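
The fix described above could be sketched roughly as follows. This is an illustration rather than a drop-in solution: mergeArgs is a hypothetical helper name, the plain tables stand in for f.args and f:getParent().args, and a Lua pattern is used in place of mw.text.trim() so that the snippet also runs outside MediaWiki (which means only ASCII whitespace is treated as blank).

```lua
-- Hypothetical helper: merges argument tables from left to right, so later
-- tables take priority, but skips string values that are empty or
-- whitespace-only. Values are checked, not trimmed.
local function mergeArgs( ... )
	local merged = {}
	for _, source in ipairs{ ... } do
		for key, val in pairs( source ) do
			-- '%S' matches any non-whitespace character.
			if type( val ) ~= 'string' or val:find( '%S' ) then
				merged[key] = val
			end
		end
	end
	return merged
end

-- Plain tables standing in for f.args and f:getParent().args.
local directArgs = { greeting = 'Hello', name = 'John Doe' }
local parentArgs = { greeting = '  ', name = 'Jane Doe' }

local args = mergeArgs( directArgs, parentArgs )
local text = args.greeting .. ', ' .. args.name .. '!'
-- text is 'Hello, Jane Doe!': the blank parent greeting did not overwrite the default
```

In a module, the call would be mergeArgs( f.args, f:getParent().args ), with the template's defaults in the direct args.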

Using wikitext

Scripts can output wikitext, just like ordinary templates; however, only the final expansion stage of the parser will be run on this wikitext. This means that templates, parser functions, extension tags, and anything else that can output wikitext will not be processed when used in the output. In order to use these features properly, Scribunto provides some functions to expand them to their final wikitext. Other things that seem like they shouldn't work, such as variables ({{PAGENAME}}) and behaviour switches (__NOTOC__), do work because they do not output wikitext, and thus don't require extra processing.

f:preprocess()

This is the most basic preprocessing function. It manually runs the preprocess stage of the parser on whatever text you provide it. You could theoretically run this on any wikitext before you return it to (almost) guarantee that all wikitext will work. (Note it does not convert basic wikitext into HTML like '''text formatting''' or [[links]].) Use of this function isn't recommended, not just because of the performance implications of unnecessarily running the parser, and the fact that it is full wikitext and thus is prone to the same limitations as full wikitext, but because it is a rather brute force approach. You'll probably always want to use one of the more specialised functions below.

f:expandTemplate()

This function is faster and less error-prone than manually constructing a wikitext transclusion to pass to f:preprocess(). You're probably familiar with templates such as {{!}}, which are needed in wikitext so that special characters inside an argument are ignored by the preprocessor. The f:expandTemplate() function is not subject to this limitation. Something like this would work fine:

f:expandTemplate{ title = 'Example', args = { 'unnamed value 1', 'kittens are cute', named_arg_1 = 'Pipe characters? | No problem! =)' }}

This is equivalent to the following; you can write out unnamed args either way:

f:expandTemplate{ title = 'Example', args = { [1] = 'unnamed value 1', [2] = 'kittens are cute', named_arg_1 = 'Pipe characters? | No problem! =)' }}

Whereas with a normal template transclusion you would have to do this:

{{Example|arg1=Pipe characters? {{!}} Need escaping! {{=}}(}}

Like its wikitext equivalent, this can transclude pages in any namespace (or the main namespace by prefixing the title with :).

f:callParserFunction()

Same deal as the previous function, but this one is for parser functions. Don't use this to call parser functions where there is a Lua equivalent, like {{urlencode:}} (mw.uri.encode()). The Lua equivalent will always be faster and more reliable.

f:extensionTag()

This one is for extension tags, such as <nowiki/> (but don't use it for that, use mw.text.nowiki()). This is pretty much an alias for f:callParserFunction() with the {{#tag}} parser function set, and the tag content prepended to the args.
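
Neither of these two functions has an example above, so here is a minimal sketch of both. The function names codeSample and remember are hypothetical, and the <pre> tag and #vardefine (from Extension:Variables, which is assumed to be installed) are just convenient choices for illustration. The final return p is left out, as in the other examples.

```lua
local p = {}

-- Wraps the first argument in a <pre> tag.
-- Roughly what {{#tag:pre|...}} would do in wikitext.
p.codeSample = function( f )
	return f:extensionTag( 'pre', f.args[1] or '' )
end

-- Stores the first argument under the variable name "greeting".
-- Assumes Extension:Variables is installed; the wikitext equivalent
-- would be {{#vardefine: greeting | ... }}.
p.remember = function( f )
	return f:callParserFunction( '#vardefine', { 'greeting', f.args[1] or '' } )
end
```

Both functions also accept a single table with named fields, for example f:extensionTag{ name = 'pre', content = 'text' }, which can be easier to read when there are many arguments.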

Modular modules

Modules can be used in other modules using the require() function. Any global variables in the required module will be available globally, and the required module's return value will be returned by require.

Here's a simple example module that will be required; note the use of global variables.

name = 'John Doe'
constructHello = function( person )
	return 'Hello, ' .. person .. '!'
end

Now to require it:

p.acquireGlobals = function( f )
	require( 'Module:Example/AcquireGlobals' )
	return constructHello( name )
end

{{#invoke: Example | acquireGlobals }}: Hello, John Doe!

While this works, globals in general aren't recommended, and because the required module does not return a table of functions it cannot be invoked directly, which makes it less useful and also difficult to debug. It is recommended to use local variables and return whatever variable you want requiring scripts to access, as this is more flexible and makes debugging easier. Formatting required scripts in a similar manner to invoked modules (returning a table of functions and maybe other values) is even better, as it is then fairly simple to adjust the script so that it can be both required and invoked.

This script is closer to the typical invoking style:

local p = {}
p.name = 'John Doe'
p.constructHello = function( person )
	return 'Hello, ' .. person .. '!'
end
return p

Now to require it:

p.requiredHello = function( f )
	local helloModule = require( 'Module:Example/Hello' )
	local name = helloModule.name
	return helloModule.constructHello( name )
end

{{#invoke: Example | requiredHello }}: Hello, John Doe!

Here's a simple way to set up a module that can be required or called from templates:

local p = {}
p.constructHello = function( f )
	local args = f
	if f == mw.getCurrentFrame() then
		args = f:getParent().args
	else
		f = mw.getCurrentFrame()
	end
	
	return 'Hello, ' .. args.name .. '!'
end
return p

This starts out by saving whatever f contains under args. Then it checks if f is a frame; if it is, the module is being invoked, and so it gets the frame's parent args and saves them to args. Otherwise, the module is being required from another module, and thus f contains the args, which is why it was assigned to args at the start. It is then reassigned to the current frame, so that it can be used for its useful functions, as if it had been invoked. If the module isn't using any of the frame's functions, then you could skip reassigning it.

This could then be called from a template in the usual way, or from a module like this:

p.requiredInvoke = function( f )
	local helloModule = require( 'Module:Example/FlexableHello' )
	return helloModule.constructHello{ name = 'John Doe' }
end

{{#invoke: Example | requiredInvoke }}: Hello, John Doe!
Note how the module is passing a table of values directly to the function of the module it is requiring.

Loading large tables of data

It's likely at some point you're going to want to have a big table of data that is referenced by various scripts. Repeatedly parsing this big table, which is always going to be the same, hundreds of times for individual scripts on a single page is a waste of time and memory. Scribunto provides the mw.loadData() function exactly for this purpose. This function is similar to require(), except the module it requires must return a table containing only static data. No functions, no metatables. Any subsequent calls to mw.loadData() for the same module in any scripts on the page will return the already parsed table.

The table is read-only. Trying to mw.clone() the table is meant to clone the read-only flag as well, though currently it actually results in a Lua error saying that the data table is read-only. If you need to modify the table, you should either go back to require() or iterate over the table, building a new one from its values.

return {
	name = 'John Doe'
	-- ... and lots of other data
}

And a module that loads it:

p.bigData = function( f )
	local data = mw.loadData( 'Module:Example/Data' )
	return 'Hello, ' .. data.name .. '!'
end

{{#invoke: Example | bigData }}: Hello, John Doe!
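
If you do need a modifiable copy, the "build a new table from its values" approach mentioned above might look something like this. It is only a sketch: copyData is a hypothetical helper name, and a plain table stands in for the result of mw.loadData() so that the snippet runs on its own.

```lua
-- Hypothetical helper: recursively copies plain data into a new, writable table.
-- It relies only on pairs(), which tables returned by mw.loadData() support.
local function copyData( source )
	if type( source ) ~= 'table' then
		return source
	end
	local copy = {}
	for key, val in pairs( source ) do
		copy[key] = copyData( val )
	end
	return copy
end

-- A plain table standing in for mw.loadData( 'Module:Example/Data' ).
local data = { name = 'John Doe', aliases = { 'JD' } }

local editable = copyData( data )
editable.name = 'Jane Doe'
editable.aliases[1] = 'Jane'
-- data.name is still 'John Doe' and data.aliases[1] is still 'JD'
```

Keep in mind that copying a big table on every invocation gives up much of the benefit of mw.loadData(), so it is best to copy only the part you actually need to change.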

Debugging

Scribunto will actually prevent saving a module with syntax errors, unless the "Allow saving code with errors" option is checked (for work-in-progress modules). However, other errors will be allowed to save and thus need to be debugged.

Script error

Script error: The function "this won't work" does not exist.

When a module breaks in a page, it will output a script error like the one above. Clicking it will show the error message, and if it was able to at least partially execute, a backtrace to where the error occurred. The page will also be added to the pages with script errors tracking category.

Debug console

It's probably best to find errors before saving your changes, and for this purpose a debug console is provided underneath the edit area. Using the console isn't very obvious to begin with, as it acts more like the module has been required, rather than invoked, so frame functions won't work unless you manually pass a frame to it.

For a module that doesn't use any frame functions, using the debug console is reasonably simple. Pass a table containing the args table under the name "args" to the function:

(Screenshot: Scribunto console example frame args.png)

And for a module that is actually already set up for being required it is even easier:

However, a module that is not set up for being required and uses frame functions is a bit more difficult to debug. You first have to get the frame, then set the args on the frame, then pass the frame to the function:

(Screenshot: Scribunto console example frame.png)

Notice how the variables are retained between multiple console commands. The state is only cleared when the module is changed (a message will be displayed indicating this), or when the clear button is pressed.

A module that only accepts parent args will have to be edited first to properly support requiring; you could also just temporarily comment out the original logic and assign the args to the frame:

(Screenshot: Scribunto console example parent frame.png)

Any calls to the mw.log() and mw.logObject() functions in the module will also be displayed in the console, before the return value, as long as the module doesn't encounter any errors.

Known issues and solutions

Unexpected behavior

mw.clone() fails on tables made by mw.loadData()

Data tables returned by mw.loadData() cannot be modified, and this will result in an error. Strangely, this same error is caused by using mw.clone(), probably due to metatable "magic" used by loadData() results to make them immutable. The solution may be to require() the data module instead, but it is not recommended; mw.loadData() is somewhat more efficient as the module is loaded only once in the process of generating a page, rather than once in every #invoke. If you need a different version of a data module (such as with a different layout), you may create and load with loadData() a separate data module that calls loadData() on the first module, manually creates the derived table, and later returns it. (Data modules may contain any code, the requirements are only on what they return.)

The order of iteration in "pairs()" is not specified

If your template is supposed to take a variable number of non-numeric arguments like "image1", "image2", "image3", "image4", "image5", etc., you will find that using pairs() to iterate over the args table will not give these arguments in the order of their numeric suffixes. In addition, there is no way to access the order in which the arguments were specified in wikitext.

The problem with number-suffixed string parameters can be bypassed by writing custom code. For example, these can be used as library functions:

--[=[
Extracts the sequence of values from "someTable" that have a string prefix "prefix"
and an integral suffix. If "allowNumberless" is specified as true, the value without
a suffix is treated like the one with suffix 1.

Examples:
```
local t = { arg1 = "one", arg2 = "two", arg3 = "three" }
local t2 = p.extractPrefixedSequence(t, "arg")
-- t2 = { "one", "two", "three" }
```

If "allowNumberless" is true, and "someTable" has both a value with no suffix and
one with suffix 1, the function may return either value.
```
local t = { arg = "one", arg1 = "also one", arg2 = "two", arg3 = "three" }
local t2 = p.extractPrefixedSequence(t, "arg", true)
-- depending on the implementation, t2[1] may be "one" or "also one"
```

The produced sequence may have holes, which can cause problems with the # operator
and ipairs.
```
local t = { arg1 = "one", arg2 = "two", arg4 = "suddenly four" }
local t2 = p.extractPrefixedSequence(t, "arg")
-- t2 = { "one", "two", nil, "suddenly four" }
```
]=]
function p.extractPrefixedSequence(someTable, prefix, allowNumberless)
	local values = {}
	
	if allowNumberless and someTable[prefix] then
		values[1] = someTable[prefix]
	end
	
	local prefixPattern = "^" .. prefix .. "(%d+)$";
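	for key, value in pairs(someTable) do
		-- Numeric keys (such as positional args) have no match method, so only test string keys.
		local index = type(key) == "string" and tonumber(key:match(prefixPattern))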
		if index and index > 0 then
			values[index] = value
		end
	end
	
	return values
end

--[=[
Acts like ipairs for a sequence with a prefix.

This code:
```
for k, v in p.ipairsWithPrefix(someTable, prefix, allowNumberless) do
    -- ...
end
```

should be the same as this code:
```
for k, v in ipairs(p.extractPrefixedSequence(someTable, prefix, allowNumberless)) do
    -- ...
end
```

however, in the edge case when "allowNumberless" is true, and both a numberless and
a 1-suffixed value are present, the functions may return different values for index 1.
]=]
function p.ipairsWithPrefix(someTable, prefix, allowNumberless)
	local i = 0
	return function()
		i = i + 1
		local value = someTable[prefix .. tostring(i)]
		if i == 1 and allowNumberless and someTable[prefix] ~= nil then
			value = someTable[prefix]
		end
		
		if value ~= nil then
			return i, value
		end
	end
end

The # operator has unspecified behavior for sequences with "holes"

If a sequence table has "holes" (nil values preceding non-nil values), the # operator may treat any "hole" like the end of the sequence. This may be a problem, for example, if you're applying it to the arguments table for a module after using an arguments-processing library. Such libraries often replace insubstantial (empty or whitespace-only) arguments with nil values.

The ipairs iterator function will stop at the first hole instead.

If you need to use # or ipairs on a sequence that may contain holes, you may want to use table.maxn() in your code instead. This function will return the highest numeric index with a non-nil value associated with it. If needed, you could also fill the gaps with placeholder values after you iterate over the table using the maxn function to get the end index.

local leaky_table = {1, 2, 3, 4, 5, nil, 7, 8, 9, nil, 11, 12}
for index = 1, table.maxn(leaky_table) do
    if leaky_table[index] == nil then
        leaky_table[index] = index -- insert whatever you need, of course
    end
end
-- leaky_table is now {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
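
If you would rather drop the holes than fill them with placeholders, another option is to collect the numeric keys, sort them, and build a new hole-free sequence. This is a sketch under that assumption; compact is a hypothetical helper name, not a Scribunto function.

```lua
-- Hypothetical helper: returns a new sequence holding the values stored under
-- positive integer keys, in ascending key order, with the holes removed.
local function compact( sparse )
	local indices = {}
	for key in pairs( sparse ) do
		if type( key ) == 'number' and key >= 1 and key % 1 == 0 then
			indices[#indices + 1] = key
		end
	end
	table.sort( indices )

	local dense = {}
	for position, key in ipairs( indices ) do
		dense[position] = sparse[key]
	end
	return dense
end

local leaky_table = { 'a', 'b', nil, 'd', nil, 'f' }

-- ipairs stops at the first hole, so it only sees the first two values.
local seen = 0
for _ in ipairs( leaky_table ) do
	seen = seen + 1
end

local dense = compact( leaky_table )
-- dense is { 'a', 'b', 'd', 'f' }, so # and ipairs behave predictably on it
```

Note that the original indices are lost in the process, so this only suits cases where the position of each value doesn't matter.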

Highly technical implementation details: The reason behind the unspecified behavior of # is probably that Lua tables internally have a "sequence" part for arrays of numbers and a "hash" part for all other data. Using only the "sequence" part for all numeric indices would cause sparse arrays (especially very sparse arrays with very large indices) to be memory inefficient, which is why after some "hole" subsequent number-indexed elements get stored in the "hash" part instead. For example, { [1000000000] = "not Google Chrome" } does not allocate gigabytes of empty space.

Lua numbers are double-precision floats and silently round very large integers

In the version of Lua used by Scribunto, numbers are not of an integral type; every number is a double-precision floating-point number (or simply a double). This type has quirks that will not be obvious to beginning Lua programmers. For example, after some point, not all integers can be stored as doubles. For doubles, the smallest non-representable positive integer is 9,007,199,254,740,993. Such integers are rounded, and as a result, the expression tonumber("9007199254740993") == tonumber("9007199254740992") evaluates to true. This is usually not a problem, but, for example, concatenating many or large integers together and trying to interpret the result as a number may cause unexpected behavior.

There is no single best solution, but it helps to be aware of the problem and write your modules so that they avoid it. While Lua 5.3 supports 64-bit integers, which can represent somewhat more integers exactly, Scribunto is based on 5.1 and is unlikely to even completely implement all 5.2 features.
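
When large integers (such as long IDs) only need to be compared rather than calculated with, one way to sidestep the rounding is to keep them as strings. The snippet below first demonstrates the rounding and then sketches a string comparison; compareDigitStrings is a hypothetical helper name, and it assumes both inputs consist of digits only.

```lua
-- Above 2^53, not every integer can be stored exactly as a double.
local limit = 2^53                  -- 9007199254740992
local rounded = limit + 1           -- not representable, rounds back to 2^53
local same = ( rounded == limit )   -- true

-- Hypothetical helper: compares two non-negative integers given as digit
-- strings without converting them to numbers, so nothing gets rounded.
-- Returns -1, 0 or 1. Leading zeros are ignored.
local function compareDigitStrings( a, b )
	a = a:gsub( '^0+', '' )
	b = b:gsub( '^0+', '' )
	if #a ~= #b then
		-- With no leading zeros, the longer string is the larger number.
		return #a < #b and -1 or 1
	end
	if a == b then
		return 0
	end
	-- Same length: the ordinary string comparison orders the digits correctly.
	return a < b and -1 or 1
end

local result = compareDigitStrings( '9007199254740993', '9007199254740992' )
-- result is 1: the two values are correctly told apart
```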

binser should not be used for storing numbers

If your wiki has the "binser" library enabled, you should not pass the output of binser's serialization function to callParserFunction with the #vardefine parser function from Extension:Variables. "Bin" in "binser" stands for "binary", and the serialized output may contain arbitrary bytes, whereas #vardefine expects proper text. For example, one user found that serializing 5 with binser produced a space character, which callParserFunction trimmed to an empty value, just as typical wikitext would. Serializing the number 256 produced bytes that aren't valid UTF-8, and the stored text ended up being two replacement characters.

As a solution, use mw.text.jsonEncode and mw.text.jsonDecode for handling variables. This works not only with tables, but with other types (such as numbers and strings), and the decoded value should be already of the right type.

Performance

mw.text.split is very slow

In some tests, this function ended up being over 60 times slower than a custom reimplementation. If possible, use the string library instead to split your strings using Lua patterns. For example:

--[[
Splits a string `str` using a pattern `pattern` and returns a sequence
table with the parts.

Much faster than `mw.text.split`, which it is inspired by, but does
not work if the pattern needs to be Unicode-aware.
]]
local function split(str, pattern)
    local out = {}
    local i = 1
    
    local split_start, split_end = string.find(str, pattern, i)
    while split_start do
        out[#out+1] = string.sub(str, i, split_start - 1)
        i = split_end + 1
        split_start, split_end = string.find(str, pattern, i)
    end
    out[#out+1] = string.sub(str, i)
    
    return out
end

mw.text.trim is slow

Similar to the above, the trim function is also rather slow. Moreover, it's quite memory-intensive and may result in Scribunto errors if used on very large strings. If you do not need to handle non-ASCII whitespace, you can use something like this instead:

local function trim( s )
    return (s:gsub( '^[\t\r\n\f ]+', '' ):gsub( '[\t\r\n\f ]+$', '' ))
end

Note that this implementation intentionally uses two gsub calls instead of one with a capture group. Testing has shown that the version with a capture group would use more memory and be slightly less performant. The extra layer of parentheses around the expression is relevant because gsub returns two values, and the function should only return one.

Lua error: Internal error: The interpreter has terminated with signal "24".

This problem occurs when the interpreter is running too long and is terminated. For example, having an infinite loop in your module can cause this.

This error can also occur when long or resource-intensive modules are run, and the server the wiki is located on is under heavy load. It may be possible to avoid this error by improving the module so that it runs faster.

Syntax highlighting

Enabling Scribunto will also enable syntax highlighting of css and js pages in the MediaWiki namespace (MediaWiki:common.css, etc). To enable syntax highlighting in the Module namespace, a setting must be configured on the wiki ($wgScribuntoUseGeSHi). A wiki manager can do this for you.

See also