Skip to content

Sorting Algorithms

Sort by Corpus Position

Path: flexiconc/algorithms/sort_by_corpus_position.py

Description:

Sorts the concordance lines by their line_id, which corresponds to their position in the corpus.

Arguments:

No arguments defined.

Show full JSON schema
{
  "type": "object",
  "properties": {},
  "required": []
}

Sort by Token-Level Attribute

Path: flexiconc/algorithms/sort_by_token_attribute.py

Description:

Sorts the concordance lines by the given token-level attribute using locale-specific sorting (default 'en'). Supports sorting by a single token at a given offset, or by whole left/right context by joining tokens. When sorting by left context, tokens are joined from right to left. Optionally reverses strings for right-to-left sorting.

Arguments:

Name Type Description
tokens_attribute string The token attribute to sort by.
sorting_scope string Specifies which context to use for sorting: 'token' for a single token at the given offset, 'left' for the entire left context (joined from right to left), or 'right' for the entire right context.
offset integer The offset value to filter tokens by when sorting_scope is 'token'.
case_sensitive boolean If True, performs a case-sensitive sort.
reverse boolean If True, sort in descending order.
backwards boolean If True, reverses the string (e.g., for right-to-left sorting).
locale_str string ICU locale string for language-specific sorting.
Show full JSON schema
{
  "type": "object",
  "properties": {
    "tokens_attribute": {
      "type": "string",
      "description": "The token attribute to sort by.",
      "default": "word",
      "x-eval": "dict(enum=list(set(conc.tokens.columns) - {'id_in_line', 'line_id', 'offset'}))"
    },
    "sorting_scope": {
      "type": "string",
      "description": "Specifies which context to use for sorting: 'token' for a single token at the given offset, 'left' for the entire left context (joined from right to left), or 'right' for the entire right context.",
      "default": "token",
      "enum": [
        "token",
        "left",
        "right"
      ]
    },
    "offset": {
      "type": "integer",
      "description": "The offset value to filter tokens by when sorting_scope is 'token'.",
      "default": 0,
      "x-eval": "dict(minimum=min(conc.tokens['offset']), maximum=max(conc.tokens['offset']))"
    },
    "case_sensitive": {
      "type": "boolean",
      "description": "If True, performs a case-sensitive sort.",
      "default": false
    },
    "reverse": {
      "type": "boolean",
      "description": "If True, sort in descending order.",
      "default": false
    },
    "backwards": {
      "type": "boolean",
      "description": "If True, reverses the string (e.g., for right-to-left sorting).",
      "default": false
    },
    "locale_str": {
      "type": "string",
      "description": "ICU locale string for language-specific sorting.",
      "default": "en"
    }
  },
  "required": [
    "tokens_attribute",
    "sorting_scope"
  ]
}

Random Sort

Path: flexiconc/algorithms/sort_random.py

Description:

Sorts lines in a pseudo-random but stable manner. Given a seed, any pair of line_ids always appear in the same relative order, independent of the presence of other lines.

Arguments:

Name Type Description
seed ['integer'] An optional seed for generating the pseudo-random order.
Show full JSON schema
{
  "type": "object",
  "properties": {
    "seed": {
      "type": [
        "integer"
      ],
      "description": "An optional seed for generating the pseudo-random order.",
      "default": 42
    }
  },
  "required": [
    "seed"
  ]
}