Automating File and Folder Analysis to Generate Structured Data

A practical look at automating file and folder analysis to generate structured data, uncover patterns, and improve web development workflows through systematic file system insights.

Preamble

This article is rooted in a real production problem encountered while building MageQuest.online.

The project relies on tens of thousands of art assets, many of which are derived from the Liberated Pixel Cup (LPC) artwork originally published on OpenGameArt.org, with extensive modifications and extensions based on the LPC format maintained at github.com/ElizaWy/LPC.

At this scale, manually tracking assets or maintaining hand-written manifests quickly becomes impractical. The folder structure itself already encodes meaningful semantics — animation states, directions, equipment types, variations — but that information exists only implicitly. Without automation, the engine and tooling have no reliable way to understand what the assets represent.

This work emerged from the need to give the project semantic awareness of its own file system: to turn a massive, organically grown asset library into structured, machine-readable data that can drive loading, validation, tooling, and gameplay systems without constant manual intervention.

What follows is a practical breakdown of the approach used to extract that structure automatically, enforce consistency, and transform a complex asset hierarchy into reliable data.

Automating File and Folder Analysis to Generate Structured Data

Modern web projects often accumulate large, deeply nested asset directories. Over time, these folder structures start to encode useful information — categories, variations, conventions — but that information remains implicit and difficult to consume programmatically.

This article demonstrates a practical approach to analyzing a file system, detecting structural patterns, and converting them into structured JSON data that can be used by build tools, runtime code, or content pipelines.

Treating the File System as a Data Source

Instead of manually maintaining asset manifests, this script treats the directory hierarchy itself as the source of truth. By recursively traversing folders and applying a small set of rules, we can automatically derive a clean, predictable data structure.

At a high level, the script:

  • Restricts traversal to a known set of root directories
  • Recursively walks subfolders
  • Filters out unwanted files
  • Normalizes filenames
  • Outputs a structured JSON summary of the folder hierarchy

This approach scales naturally as assets grow and avoids duplicated configuration.
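
The filtering and normalization steps in the list above can be sketched as one small helper. This is a minimal illustration rather than the project's actual code: the name normalizeAssetName is hypothetical, and the skip rules mirror the ones used in the traversal function shown in this article (a leading underscore and Credits.txt).

```javascript
// Hypothetical helper (not from the original script): returns the
// normalized asset name, or null when the file should be skipped.
// `ext` is the extension to strip, including the leading dot.
function normalizeAssetName(name, ext) {
  if (name.startsWith('_')) return null;    // private files
  if (name === 'Credits.txt') return null;  // attribution metadata
  // Strip the extension by its real length so any extension works.
  return name.endsWith(ext) ? name.slice(0, name.length - ext.length) : name;
}
```

Returning null for skipped files keeps the decision in one place, so the traversal loop only has to check for null before recording a name.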

Recursive Structure Extraction

The core of the solution is a recursive directory traversal function. Each directory is analyzed to determine whether it contains only files, subfolders, or a mix of both:

  • Folders with only files become arrays
  • Folders with subfolders become objects
  • When a folder contains both, its files are grouped under a files key

This distinction keeps the resulting JSON compact while preserving semantic meaning.
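
Taken on their own, the three shape rules can be written as a pure function. The sketch below is an illustration under assumptions: shapeDirectory is a hypothetical name, and it expects the file names and child folders to have been collected already.

```javascript
// Hypothetical helper: decides the JSON shape for one directory from
// its already-collected file names and child folder structures.
function shapeDirectory(files, folders) {
  // Only files (or nothing at all): a plain array keeps the output compact.
  if (Object.keys(folders).length === 0) {
    return files;
  }
  // Mixed content: folders stay as keys, files are grouped under "files".
  if (files.length > 0) {
    return { ...folders, files };
  }
  // Only subfolders: an object keyed by folder name.
  return folders;
}

console.log(JSON.stringify(shapeDirectory(['Idle', 'Walk'], {})));
console.log(JSON.stringify(shapeDirectory(['Readme'], { North: ['Idle'] })));
```

One consequence of this rule is that a subfolder literally named files would collide with the grouping key, so that name is best treated as reserved.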

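const fs = require('fs');
const path = require('path');

// Recursively converts a directory into an array (files only) or an
// object (subfolders). `ext` is the extension to strip, including the dot.
function getDirectoryStructure(dirPath, ext) {
  const items = fs.readdirSync(dirPath, { withFileTypes: true });

  const files = [];
  const folders = {};

  for (const item of items) {
    const fullPath = path.join(dirPath, item.name);

    if (item.isDirectory()) {
      // Pass `ext` down so nested files are normalized the same way.
      folders[item.name] = getDirectoryStructure(fullPath, ext);
    } else if (item.isFile()) {
      let name = item.name;
      if (name.startsWith('_')) continue; // skip private files
      if (name === 'Credits.txt') continue; // skip attribution metadata
      if (name.endsWith(ext)) {
        // Strip by the extension's actual length, not a fixed 5 characters.
        name = name.slice(0, name.length - ext.length);
      }
      files.push(name);
    }
  }

  // Only files: return a compact array.
  if (Object.keys(folders).length === 0) {
    return files;
  }

  // Mixed content: group loose files under a "files" key.
  if (files.length > 0) {
    folders.files = files;
  }

  return folders;
}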

Enforcing Conventions and Constraints

The script intentionally limits traversal to a predefined set of root directories:

const allowedRootKeys = new Set(["FX", "Objects", "Structure", "Terrain"])

This constraint prevents accidental inclusion of unrelated files and enforces a clear organizational contract. Filtering rules (such as skipping private files or metadata like Credits.txt) further refine the output and ensure the generated data is immediately usable.
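
How the allow-list is applied is not shown in the article, so the following is only one plausible sketch. The function name buildStructure, the baseDir parameter, and the walk callback are all assumptions made for illustration. The callback stands in for the recursive traversal function.

```javascript
const fs = require('fs');
const path = require('path');

const allowedRootKeys = new Set(['FX', 'Objects', 'Structure', 'Terrain']);

// Hypothetical driver: visits only the top-level folders named in the
// allow-list and delegates each one to `walk` (the traversal function).
function buildStructure(baseDir, walk, allowed = allowedRootKeys) {
  const result = {};
  for (const entry of fs.readdirSync(baseDir, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue;      // ignore loose top-level files
    if (!allowed.has(entry.name)) continue;  // ignore folders outside the contract
    result[entry.name] = walk(path.join(baseDir, entry.name));
  }
  return result;
}
```

Keeping the allow-list in a Set makes the membership check explicit, and a new root folder only becomes part of the output once it is added to the list on purpose.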

Generating Usable Output

Once traversal is complete, the resulting structure is serialized to JSON and written to disk:

fs.writeFile(`./src/scripts/data/${filename}`, data, 'utf8', ...)
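
A fuller version of this step might look like the sketch below. It deliberately uses the synchronous fs.writeFileSync instead of the callback-based fs.writeFile from the article, to keep the example short. The writeStructure name, the outDir parameter, and the two-space indentation are assumptions.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: serializes the structure and writes it to disk.
// The default output directory matches the path used in the article.
function writeStructure(structure, filename, outDir = './src/scripts/data') {
  fs.mkdirSync(outDir, { recursive: true });             // ensure the folder exists
  const data = JSON.stringify(structure, null, 2);       // pretty-print for readable diffs
  const target = path.join(outDir, filename);
  fs.writeFileSync(target, data, 'utf8');
  return target;                                         // path of the written file
}
```

Pretty-printing costs some file size but makes changes to the generated file easy to review in version control.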

The generated JSON represents a semantic map of a large character asset library, derived directly from the file system. It encodes how character visuals are composed — bodies, heads, hair, clothing, props, colors, animations, and genders — while preserving meaningful distinctions such as variants and subtypes.

Rather than listing files blindly, the structure captures intent: which assets belong together, how they vary, and how they are expected to be combined at runtime. Numeric values generally represent render or layer order, while nested objects describe configurable dimensions (for example, shield materials and heraldic patterns).

This allows the engine and tools to reason about tens of thousands of assets programmatically without manual manifests.

Because the JSON is regenerated from the file system rather than edited by hand, it stays in sync with the assets as long as the script is rerun whenever they change.

{
  "Characters": {
    "Body": {
      "Body01-Feminine,Thin": 0,
      "Body02-Masculine,Thin": 0,
      "Wings01-FeatheredWings": 1,
      "Wings02-BatWings": 1
    },

    "Hair": {
      "Eyebrows01-ThinEyebrows": 2,
      "Eyebrows02-ThickEyebrows": 2,
      "FacialHair01-WalrusMustache": 2,
      "FacialHair02-ChevronMustache": 2,
      "FacialHair03-HandlebarMustache": 2,
      "…": "additional hair and facial hair variants"
    },

    "Clothing": {
      "Feet": {
        "Shoes01-Shoes": 4,
        "Shoes02-Boots": 4,
        "Socks01-AnkleSocks": 4,
        "…": "other footwear"
      },
      "Torso": {
        "Shirt01-LongsleeveShirt": 4,
        "Shirt04-T-shirt": 4,
        "Shirt09-Polo": 4,
        "…": "additional shirt styles"
      }
    },

    "Props": {
      "Sword01-ArmingSword": 4,
      "Shield01-HeaterShield": {
        "Trim": 4,
        "Wood": 5,
        "Paint": 5,
        "ContrastPatterns": {
          "Bend": 5,
          "Chevron": 5,
          "Cross": 5,
          "…": "heraldic pattern variants"
        }
      }
    }
  },

  "types": {
    "Hair": {
      "FacialHair": [
        "WalrusMustache",
        "ChevronMustache",
        "HandlebarMustache",
        "…"
      ],
      "Short": [
        "Buzzcut",
        "Parted",
        "Curly",
        "…"
      ]
    },
    "Clothing": {
      "Torso": {
        "Shirt": [
          "LongsleeveShirt",
          "T-shirt",
          "Polo",
          "…"
        ]
      }
    }
  },

  "colors": [
    ["Bronze", "Ivory", "Tan", "…"],
    ["Azure", "Crimson", "Emerald", "…"],
    ["Black", "Blonde", "Chestnut", "…"],
    ["Black", "Blue", "Hazel", "…"],
    ["Gold", "Iron", "Steel", "…"],
    ["Brown", "Oak", "Umber"]
  ],

  "anims": [
    "Idle",
    "Walk",
    "Run",
    "Jump",
    "Combat1h-Slash",
    "…"
  ],

  "genders": {
    "M": "Masculine",
    "F": "Feminine",
    "E": "Elderly"
  }
}
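
As an illustration of how such a map could be consumed, the sketch below flattens a nested structure into a list of leaf paths sorted by their numeric value. It is not taken from the project: collectLayers is a hypothetical name, and treating lower numbers as drawn first is an assumption based on the description of numeric values as render or layer order.

```javascript
// Hypothetical helper: collects every numeric leaf of a nested map as
// { path, order } and sorts the result by order (ascending).
function collectLayers(node, prefix = []) {
  const layers = [];
  for (const [key, value] of Object.entries(node)) {
    if (typeof value === 'number') {
      layers.push({ path: [...prefix, key].join('/'), order: value });
    } else if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      layers.push(...collectLayers(value, [...prefix, key]));
    }
    // Strings (such as placeholder entries) and arrays are skipped.
  }
  return layers.sort((a, b) => a.order - b.order);
}

// Small sample in the same shape as the excerpt above.
const sample = {
  Body: { 'Body01-Feminine,Thin': 0, 'Wings01-FeatheredWings': 1 },
  Props: { 'Shield01-HeaterShield': { Trim: 4, Wood: 5 } },
};
console.log(collectLayers(sample));
```

A flat, ordered list like this is the kind of derived view a renderer or validation tool can build on demand, without the source JSON having to store it.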

Why This Pattern Works

This approach highlights a broader idea: file systems already contain structured data — we just need to formalize it. By automating analysis and enforcing small, consistent rules, we can eliminate manual manifests, reduce errors, and make large projects easier to reason about.

As projects grow, automation like this turns implicit structure into explicit, reliable data — and that’s where real scalability begins.
