rejetto forum

Brain-storm HFS2 (template) macros parsing & evaluation techniques (for HFS3)?


Offline NaitLee

  • Tireless poster
  • ****
    • Posts: 204
  • Computer-brain boy
    • View Profile
A note for guests passing by: this is a technical topic. If you're looking for template themes, see other topics :)

The HFS3 default frontend is so fast.
But template makers like me want to make templates that are useful for both HFS versions (HFS2 and 3).
To my mind this is not the disliked "compatible" but "universal", since there's no reason for a frequent casual user to move away from HFS2.

Now I am making a plugin for the new HFS3 to support "traditional" templates.
Macros exist in HFS2.3 to implement useful logic, making a (dynamic-page based) template "smarter".
I've already managed to parse macros in PHFS. But if you've tried it, you'll have found it very slow compared to the Delphi HFS2.
Yes, Python itself is slow at basic operations like batch string handling,
but there are other reasons, including that every time a section is requested it parses the raw strings again and again, even though the macro procedure is fixed.

I want to make things faster. It may still be slower than pure AJAX, but I want to try my best, at least for skill practice :)
I'm thinking about serializing macros so that the least work is wasted in each execution/evaluation.
And after this, macro injection (attack) will never work, even if there's an entry point for such an action.
For now I've got some ideas, stated below in normal text and/or source code (with comments)...
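
A minimal sketch of the "parse once, evaluate many times" idea described above. Everything here is an assumption for illustration: `ParsedSection`, `parseSection`, `getParsed` and `render` are hypothetical names, not taken from PHFS or the plugin, and the toy "parser" only understands `%symbol%` placeholders instead of real macros.

```typescript
// Hypothetical parsed form: literal text pieces interleaved with symbol names.
type ParsedSection = { texts: string[]; symbols: string[] };

let parseCount = 0; // only here to show how often parsing really happens

// Toy parser (assumption): "a %x% b" becomes texts ["a ", " b"] and symbols ["x"].
function parseSection(raw: string): ParsedSection {
    parseCount++;
    const parts = raw.split(/%([a-z-]+)%/); // even indexes: text, odd indexes: symbol
    const texts: string[] = [];
    const symbols: string[] = [];
    parts.forEach((part, i) => (i % 2 === 0 ? texts : symbols).push(part));
    return { texts, symbols };
}

const parsedCache = new Map<string, ParsedSection>();

// Parse a section the first time it is requested, reuse the result afterwards.
function getParsed(name: string, raw: string): ParsedSection {
    let parsed = parsedCache.get(name);
    if (parsed === undefined) {
        parsed = parseSection(raw);
        parsedCache.set(name, parsed);
    }
    return parsed;
}

// Evaluation only walks the already-parsed structure; values are never re-parsed.
function render(parsed: ParsedSection, values: Record<string, string>): string {
    let out = parsed.texts[0] ?? "";
    parsed.symbols.forEach((sym, i) => {
        out += (values[sym] ?? "") + (parsed.texts[i + 1] ?? "");
    });
    return out;
}

const raw = "Hello %user%, you have %count% files";
const first = render(getParsed("files", raw), { user: "alice", count: "3" });
const second = render(getParsed("files", raw), { user: "bob", count: "5" });
// first is "Hello alice, you have 3 files", second is "Hello bob, you have 5 files",
// and parseCount is still 1 after both requests.
```

In this sketch the inserted values are only concatenated and never fed back to the parser, which is presumably the property that makes injected macro text harmless.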



Some concepts I've come up with:

MacroSegment and MacroUnit
These nest instances of each other to make the macro procedure clear & easy for the computer.
More details are in the code snippets below. Be prepared to think :D

MacroContext and MacroContextGlobal
These store e.g. variables, and the stack for "linear macro execution"[1] (more info below)
Code in snippets may be modified to add more things.

MacroExecutor and MacroExecutors
These define static functions that execute macros. A MacroUnit has an executor attribute assigned to one of those in MacroExecutors.
This may change to a getter/setter in the future, to support the "dynamic executor"[2]

Also see Footnote, FaQ, and Trivia, at the end of this post :D

Some (TypeScript) code snippets, for description: (may be modified at any time)
Code:
class MacroContextGlobal {
    static globals: Record<string, string> = {};
    static cache: Record<string, string> = {};
}

class MacroContext extends MacroContextGlobal {
    variables: Record<string, string> = {};
    stack: MacroContextStackItem[] = [];
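    shift(count: number = 1): (MacroContextStackItem | null)[] {
        // `new Array(count).map(...)` skips the empty slots, so build the array with Array.from
        return Array.from({ length: count }, () => this.stack.shift() ?? null);
    }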
    shiftAll(): MacroContextStackItem[] {
        return this.stack.splice(0, this.stack.length);
    }
}

interface MacroExecutorFunction {
    (ctx: MacroContext, args?: MacroExecutorArgs, kwargs?: MacroExecutorKwargs): MacroSegment;
}

class MacroExecutor {
    /** @this {MacroExecutor} */
    _function: MacroExecutorFunction;
    flags: MacroExecutorFlags;
    constructor(
        func: MacroExecutorFunction,
        flags: MacroExecutorFlags = C.NO_MULTI_FLAG
    ) {
        this._function = func.bind(this);
        this.flags = flags;
    }
    execute(ctx: MacroContext, args: MacroExecutorArgs, kwargs: MacroExecutorKwargs): MacroSegment {
        // NOTE: in the future, we may check some flags here before execution
        return this._function(ctx, args, kwargs);
    }
}

var macroExecutors = new MacroExecutors();

/**
 * A "part" of the whole macro expression, like a quote block, or a piece of string as argument of a macro.
 * A `MacroSegment` can be *evaluated* to produce a plain string, which is then sent to the client or put into `MacroUnit` args/kwargs.
 * The term *evaluate* can be understood as the original *dequote*, if there are items in `execOrder`.
 * In a section there's a root `MacroSegment`.
 *
 * Concepts:
 * - `segOrder` and `execOrder`:
 *   - Macros mix plain parts with executable parts.
 *     To produce the result, we first take a sub-segment from `segOrder` as text,
 *     then take a `MacroUnit` from `execOrder` and execute it to get more text.
 *     Repeating this until the last `segOrder` item completes the result.
 * - `isPlain`:
 *   - Marks the current segment as plain, i.e. no need to be executed.
 * - `isAlias`:
 *   - Marks the current segment as an alias from `[special:alias]`
 */
class MacroSegment {
    // ... there are some attributes for plain representation as string, number, boolean. will change later
    segOrder: MacroSegment[];
    execOrder: MacroUnit[];
    isPlain: boolean;
    isAlias: boolean;
    isDynamic: boolean;
    private _inferTypesFromString(value: string): void {
        this._asString = value;
        let value_trimmed = value.trim();
        let possible_number = tryParseNumber(value_trimmed);
        this._asNumber = possible_number;
        this._asBoolean = !!possible_number;
    }
    constructor(
        raw: string = C.EMPTY_STRING,
        segOrder: MacroSegment[] = [],
        execOrder: MacroUnit[] = [],
        isPlain: boolean = true,
        isAlias: boolean = false,
        isDynamic: boolean = false
    ) {
        this._inferTypesFromString(raw);
        // if (raw === null) {}
        this.segOrder = segOrder;
        this.execOrder = execOrder;
        // this.isPlain = this.isAlias = (raw !== null);
        this.isPlain = isPlain;
        this.isAlias = isAlias;
        this.isDynamic = isDynamic;
    }
}

/**
 * A part of the whole macro expression that has a specific function, i.e. a macro block.
 * A `MacroUnit` can be *executed*, for performing special operations.
 *
 * Concepts:
 * - `executor`:
 *   - An instance of `MacroExecutor`, taken from `MacroExecutors`.
 * - `args`:
 *   - A list of arguments, as `MacroSegment`.
 *     They **may** be dynamically *evaluated* by individual `MacroExecutor`.
 * - `kwargs`:
 *   - A list of keyword arguments, always optional, indexed with string, also as `MacroSegment`.
 */
class MacroUnit {
    executor: MacroExecutor;
    args: MacroSegment[] = [];
    kwargs: Record<string, MacroSegment> = {};
    constructor(
        executor: MacroExecutor = MacroExecutors._unknown,
        args: MacroSegment[] = [],
        kwargs: Record<string, MacroSegment> = {}
    ) {
        this.executor = executor;
        this.args = args;
        this.kwargs = kwargs;
    }
}
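
To make the `segOrder` / `execOrder` interleaving from the doc comment concrete, here is a small self-contained sketch. `Seg`, `Unit`, `plain` and `evaluate` are simplified, hypothetical stand-ins for the classes above (the real `MacroSegment` and `MacroExecutor` carry more state), so treat it as one possible reading of the comments, not the plugin's code.

```typescript
// Simplified stand-ins (assumptions) for MacroSegment / MacroUnit.
interface Seg {
    text: string;      // used when the segment is plain
    segOrder: Seg[];   // n + 1 sub-segments ...
    execOrder: Unit[]; // ... interleaved with n executable units
}
interface Unit {
    run: (args: string[]) => string; // stand-in for MacroExecutor.execute
    args: Seg[];
}

const plain = (text: string): Seg => ({ text, segOrder: [], execOrder: [] });

// Evaluate a segment to a plain string: text, unit result, text, ..., text.
function evaluate(seg: Seg): string {
    if (seg.execOrder.length === 0) return seg.text; // the `isPlain` case
    let out = "";
    seg.segOrder.forEach((sub, i) => {
        out += evaluate(sub);
        const unit = seg.execOrder[i];
        // Arguments are segments too, so they are evaluated before the unit runs.
        if (unit !== undefined) out += unit.run(unit.args.map((arg) => evaluate(arg)));
    });
    return out;
}

// Roughly "Total: {.add|2|3.} files", built by hand instead of by a parser.
const addUnit: Unit = {
    run: (args) => String(args.reduce((sum, a) => sum + Number(a), 0)),
    args: [plain("2"), plain("3")],
};
const root: Seg = {
    text: "",
    segOrder: [plain("Total: "), plain(" files")],
    execOrder: [addUnit],
};
const page = evaluate(root); // "Total: 5 files"
```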


Footnote:

[1] "linear macro execution"
Let's consider an example:
{.add|{.mul|2|3.}|4.}
The normal way is to walk from the start, find the innermost macro, pick it up, execute it, replace it with its result, then do the same again until the end...
But our way, after serialization, the instructions are laid out one by one:
execOrder = [ mul, add ]; (pseudo code. note that these are MacroUnits, which wrap both an executor and arguments (as nested MacroSegments, plain or evaluable))
... after the "mul" unit is executed, its result is pushed to the stack of the current MacroContext; then in the "add" unit we leave a mark that lets it shift one element from the stack as an argument.
This is mind-exhausting, but the computer is really doing efficient linear work.
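
A minimal sketch of footnote [1] for the `{.add|{.mul|2|3.}|4.}` example, assuming numeric arguments only. `Step`, `FROM_STACK` and `runLinear` are hypothetical names; the "mark" is modelled as a `null` argument, which is just one way it could be represented.

```typescript
// Marker meaning "shift the next value from the context stack" (hypothetical).
const FROM_STACK = null;
type Arg = string | null;

interface Step {
    fn: (args: number[]) => number; // stand-in for a MacroExecutor
    args: Arg[];
}

// Run the steps strictly in order; each result is pushed for later steps to shift.
function runLinear(execOrder: Step[]): string {
    const stack: string[] = [];
    for (const step of execOrder) {
        const args = step.args.map((arg) => {
            if (arg !== FROM_STACK) return Number(arg);
            const value = stack.shift();
            if (value === undefined) throw new Error("stack is empty");
            return Number(value);
        });
        stack.push(String(step.fn(args)));
    }
    // The outermost macro runs last, so its result is the last one pushed.
    return stack.pop() ?? "";
}

// Flattened form of {.add|{.mul|2|3.}|4.}: the inner macro comes first.
const mul: Step = { fn: (args) => args.reduce((x, y) => x * y, 1), args: ["2", "3"] };
const add: Step = { fn: (args) => args.reduce((x, y) => x + y, 0), args: [FROM_STACK, "4"] };
const result = runLinear([mul, add]); // "10"
```

No string is searched or rewritten at run time here: the nesting was resolved once, when the order `[mul, add]` was produced.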

[2] "dynamic executor"
Another example:
{.{.if|{.^want_sub.}|sub|add.}|5|3.}
I think most dynamic-language developers have tried this method to decide which function to use. :D
(wantSub ? sub : add)(5, 3) (sub and add are functions)
While it just works, it may confuse a static computing rule.
So our MacroExecutor needs to be dynamic here, by making the executor attr a getter.
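
A sketch of what such a getter could look like for `{.{.if|{.^want_sub.}|sub|add.}|5|3.}`. The `executors` table, the `Unit` class and the `vars` record are assumptions made for this example; arguments are plain numbers to keep it short.

```typescript
type Executor = (args: number[]) => number;

// Hypothetical executor table with just the two macros from the example.
const executors: Record<string, Executor> = {
    add: (args) => args.reduce((x, y) => x + y, 0),
    sub: (args) => args.slice(1).reduce((x, y) => x - y, args[0] ?? 0),
};

class Unit {
    constructor(
        // Either a name fixed at parse time, or a function computing it per execution.
        private readonly name: string | (() => string),
        readonly args: number[]
    ) {}

    // Resolved on every read, so the choice can differ between executions.
    get executor(): Executor {
        const name = typeof this.name === "string" ? this.name : this.name();
        const found = executors[name];
        if (found === undefined) throw new Error(`unknown macro: ${name}`);
        return found;
    }

    execute(): number {
        return this.executor(this.args);
    }
}

const vars: Record<string, boolean> = { want_sub: true };
const unit = new Unit(() => (vars["want_sub"] ? "sub" : "add"), [5, 3]);
const first = unit.execute();  // 2, because want_sub is true
vars["want_sub"] = false;
const second = unit.execute(); // 8, same unit, different executor
```

Units with a fixed name still pay only a table lookup, so the static case stays cheap while the dynamic case becomes possible.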

FaQ:
= Why not publish the source code now?
- The source currently contains only these "ideas" and is completely unusable. It will take some time to integrate something of this scale.
= Well... where will the source code be?
- Here on GitLab. But it's empty now.
= What's wrong with GitHub?
- It's hard to access from here, across the whole mainland region. Viewing it successfully is a matter of luck.
= Mirror to GitHub?
- I'll consider/try it when the project becomes active.

Trivia:
I scribbled on my note paper in order to understand all of this myself.
This project is being developed on a new laptop with Manjaro GNU/Linux, meant for playing with cutting-edge stuff and now my main workstation.
I didn't want to touch Node.js until I wanted to work on this. :)
The source code was full of the typo "executer" before I posted this. :P
I'm trying out Tabnine, an AI assistant for coders. It auto-completed many pieces of the code here. (Note: not meant as an advertisement at all, but it may help)
« Last Edit: January 20, 2022, 01:44:24 PM by NaitLee »
"Computation is not forbidden magic."
Takeback Template | PHFS


Offline rejetto

  • Administrator
  • Tireless poster
  • *****
    • Posts: 13438
    • View Profile
:) good luck

your gitlab link says Page Not Found
« Last Edit: January 17, 2022, 10:51:57 PM by rejetto »


Offline NaitLee

ok it's set to public :)

I'm thinking about parallel segment evaluation & a just-in-time compiler (to optimize segment structure)
This will require more effort, so I'll try them much later :D


Offline rejetto

https://wiki.c2.com/?PrematureOptimization

anyway, you may parallelize multiple requests but not multiple segments of a single request, because one may depend on state changed by another
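
A tiny sketch of that dependency, under assumed semantics: `setX` and `readX` stand in for something like `{.set|x|1.}` and `{.^x.}`, and `Ctx`, `Segment` and `handleRequest` are hypothetical names.

```typescript
type Ctx = { variables: Record<string, string> };
type Segment = (ctx: Ctx) => string;

// Stand-ins (assumptions) for a macro that sets a variable and one that reads it.
const setX: Segment = (ctx) => { ctx.variables["x"] = "1"; return ""; };
const readX: Segment = (ctx) => ctx.variables["x"] ?? "";

// One request: a fresh context, segments evaluated strictly in order.
function handleRequest(segments: Segment[]): string {
    const ctx: Ctx = { variables: {} };
    return segments.map((seg) => seg(ctx)).join("");
}

const inOrder = handleRequest([setX, readX]);   // "1"
const reordered = handleRequest([readX, setX]); // "" (x was not set yet)
```

Each call to `handleRequest` owns its context, so separate requests do not interfere with each other, while the segments inside one request see each other's writes and therefore cannot be reordered or run concurrently without changing the output.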


Offline NaitLee

I'd like suggestions on whether to write part of an HFS 3 plugin in another language (like C, Go, C++)
In particular I want to write my template parser in Go, and have it communicate with a "shell" js over stdio.
This may gain performance. But what I actually want is to avoid the mess of node.js
My tpl plan has already been interrupted twice, mostly because I had no idea what to do and how to continue.
Whenever I want a small feature I need to request/install another big package. Whenever I need a small structure I end up with a large object/class.
Even if I just want to call a C procedure natively I need a "node plugin", a quite bloated structure in C++ that may break at any time in the future.
This suits big organized applications, but is rather complicated for a simple language parser dealing with simple computational logic.
This defeats the sole purpose. Even in Python I have not seen such a dilemma.
JavaScript was doing well long ago, but was eventually ruined by jquery/react/minifiers/obfuscators, and finally node. Even TypeScript failed to rescue it.
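
For the stdio idea mentioned above, the js "shell" side mostly needs message framing. A minimal sketch, assuming one JSON message per line; `encode`, `LineDecoder` and the message shape are hypothetical, and spawning the external parser process is left out.

```typescript
// One JSON message per line; JSON.stringify never emits a raw newline.
const encode = (msg: unknown): string => JSON.stringify(msg) + "\n";

// Collects stdout chunks (which may split a line anywhere) into whole messages.
class LineDecoder {
    private buffer = "";

    feed(chunk: string): unknown[] {
        this.buffer += chunk;
        const lines = this.buffer.split("\n");
        this.buffer = lines.pop() ?? ""; // keep the incomplete tail for the next chunk
        return lines
            .filter((line) => line.length > 0)
            .map((line) => JSON.parse(line) as unknown);
    }
}

// Simulate two messages arriving in two arbitrary chunks.
const decoder = new LineDecoder();
const wire = encode({ section: "files" }) + encode({ section: "list" });
const early = decoder.feed(wire.slice(0, 10)); // no complete line yet
const late = decoder.feed(wire.slice(10));     // both messages
```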

... ... See what the old Unix guys have to say about: OO Programming, Node.js, All Software
Mostly for fun. But reasonable, as is the cruel reality.

HFS 3 is in Node, and it works so well.
Let me try to do better in another way... :)


∵ E = MC²
∴ Errors = (More Code)²
∎ Q.E.D.


Offline rejetto

the base suggestion is: use as little as possible.
Since HFS is using javascript for plugins it's clear that a javascript plugin is normally ideal, but if you are struggling so much, then go ahead with the method you like most.
HFS 3 server is using libraries (node_modules) for a total of 2.4 MB (not zipped). Not a huge deal if you ask me.
I hardly see why you'd need to call a C procedure for this kind of task. In HFS3 I have a need for it, but decided to skip that to avoid further platform dependencies. (i needed it to read windows file system attributes)