Scene configuration

Describes the scene format of the framework

Overview

The scene configuration of the framework is described in a configuration file written in YAML. As explained in the previous section, everything needed to generate an image is written in this file.

In designing the format we focused mainly on human readability and extensibility. The scene configuration file should be human-readable because, in research-oriented use cases, the scene file is often modified by hand or by scripts. The format also supports user-defined configuration, which allows new kinds of data entries to be added to the scene description. We will describe the details in the Plugin section.

In this section of the documentation, we describe the basic structure of the scene format, what each component means, and how to modify the configuration to create arbitrary scenes. The scene format is the basic interface between the user and the renderer, so we recommend reading and understanding this section before you begin creating your own scenes.

YAML primer

The scene format is written in YAML. YAML is a data serialization format designed to be human-friendly and able to describe various kinds of data. Here we give the minimal introduction to YAML needed to read and write the scene format.

Scalar

A scalar is the basic data type of YAML. All data such as numbers or strings are written as one of the following scalar types: boolean, integer, floating point, and string. For instance, true is a scalar of boolean type, and 3.14 is a scalar of floating-point type.

Mapping

A mapping represents a data structure made of key-value pairs. A key and its value are separated by :. Key-value pairs at the same indentation level belong to the same mapping. For example, the following lines define a mapping with three key-value pairs: A is associated with 10, B with 20, and C with 30.

A: 10
B: 20
C: 30

Sequence

A sequence represents an ordered list of data. Consecutive lines beginning with - are the elements of the list. Note that the - characters must be at the same indentation level. For instance, the following lines represent a sequence with three elements A, B, and C.

- A
- B
- C

Nested structure

Mappings and sequences can be nested within each other. The next example shows a nested structure. The top level is a mapping with two elements. The first element, with the key A, has a sequence as its value, and the sequence has two elements (a1, a2). The second element, with the key B, has a mapping as its value, which contains two key-value pairs (B1, b1) and (B2, b2).

A:
  - a1
  - a2
B:
  B1: b1
  B2: b2

Tips

A comment starts with the # character and continues until the end of the line.
Tab characters are prohibited for indentation; indentation must be created with spaces. We suggest two spaces per indentation level.

Scene structure

Here we describe the structure of the scene format. Every scene file must contain a top-level mapping with a single element whose key is lightmetrica:

lightmetrica:
  ...

The scene format is divided into several components. (1) The version element indicates the version of the scene format. (2) The assets element defines assets (e.g., meshes, materials, textures, etc.). (3) The scene element describes the structure of the scene by defining the relationships between assets. (4) The accel element specifies the acceleration structure used to speed up the renderers. (5) The renderer element describes the rendering technique and its parameters. In the following sections, we describe each element in detail.

lightmetrica:
  version:  ... # (1)
  assets:   ... # (2)
  scene:    ... # (3)
  accel:    ... # (4)
  renderer: ... # (5)

Version

The version element contains a single string value in x.y.z format representing the version of the scene configuration file. The values of x, y, and z are the major, minor, and patch version numbers, respectively. The current version is fixed to 1.0.0. The framework defines the minimum and maximum acceptable scene versions, and if the version of the input scene is outside this range, the rendering process is terminated.
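The range check described above can be sketched as follows. This is only a minimal illustration, not the framework's implementation: the function names parse_version and is_supported are hypothetical, and the default bounds are assumptions made for the example.

```python
def parse_version(text):
    """Parse an 'x.y.z' string into a (major, minor, patch) tuple of ints."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected x.y.z, got {text!r}")
    return tuple(int(p) for p in parts)


def is_supported(text, minimum="1.0.0", maximum="1.0.0"):
    """Return True if the scene version lies within [minimum, maximum].

    The default bounds are assumptions made for this sketch.
    """
    return parse_version(minimum) <= parse_version(text) <= parse_version(maximum)


print(is_supported("1.0.0"))  # inside the assumed range
print(is_supported("2.0.0"))  # outside the assumed range
```

Comparing the versions as integer tuples rather than as strings matters once a component has more than one digit: as strings, "1.10.0" would sort before "1.9.0".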

Note

The scene version is different from the framework version. The framework version has the format (major).(minor).(patch).(build). The build number is automatically set to the short revision number of the latest commit in the repository. The current framework version is .

Asset library

The assets element defines a collection of assets, which we call the asset library. All assets, such as triangle meshes, material definitions, or any other user-defined assets, are written in this element. The element contains a mapping from asset IDs to asset definitions in the following format. Note that the indentation shown here is adjusted to make the contents easier to explain.

assets:
  <asset_id_1>:
    interface: <asset_interface>
    type: <asset_type>
    params:
      <asset_params>
  <asset_id_2>:
    ...
  ...

asset_id_1, asset_id_2, … are the keys of the assets. Each key defines an asset by supplying interface, type, and params elements. The keys are used to refer to the assets from other parts of the file. Any number of assets can be defined in the element.

The asset interface, specified by the interface element, is the broad category of the asset, e.g., textures or bsdf. The type element specifies the concrete type of the asset within that interface, e.g., a particular material type for the bsdf interface. We will describe the interfaces and types in the Assets section.

Scene hierarchy

The basic model of the scene structure follows the scene graph, a tree structure in which each node holds 3D mesh data and material information, as well as a transformation of the mesh such as translation or rotation. Transformations are applied hierarchically. That is, the transformation of a parent node is applied to all of its child nodes.

This mechanism is useful for describing the relationships between objects hierarchically. Consider a model with a table and several cups on the table. If we make the cups child nodes of the table, a movement applied to the table does not change the positions of the cups relative to the table.

scene:
  sensor: <sensor_node_id>
  nodes:
    - id: <node_id_1>
      ...
    - id: <node_id_2>
      ...
    ...
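The table-and-cups behavior of hierarchical transformations can be illustrated with a small sketch. The nested dictionaries and the children key below are a hypothetical in-memory representation chosen for this example, not the syntax of the scene file, and only translations are considered to keep the example short.

```python
def world_positions(node, parent_offset=(0.0, 0.0, 0.0), out=None):
    """Accumulate translations from the root down to every node."""
    if out is None:
        out = {}
    # The parent's accumulated translation is applied to this node.
    offset = tuple(p + t for p, t in zip(parent_offset, node["translate"]))
    out[node["id"]] = offset
    for child in node.get("children", []):
        world_positions(child, offset, out)
    return out


table = {
    "id": "table",
    "translate": (0.0, 0.0, 0.0),
    "children": [
        {"id": "cup_1", "translate": (0.5, 1.0, 0.0)},
        {"id": "cup_2", "translate": (-0.5, 1.0, 0.0)},
    ],
}

before = world_positions(table)
table["translate"] = (5.0, 0.0, 0.0)  # move only the table
after = world_positions(table)

# The cup follows the table, so its position relative to the table is unchanged.
rel_before = tuple(c - t for c, t in zip(before["cup_1"], before["table"]))
rel_after = tuple(c - t for c, t in zip(after["cup_1"], after["table"]))
print(after["cup_1"], rel_before == rel_after)
```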

Transformation

Each node contains several elements depending on the node type. The element common to all nodes is the transform element, which represents the transformation of the node. There are several ways to specify it.

The first way is to give the transformation matrix directly with the matrix element, which is useful for writing out transformations exported from DCC tools. The matrix element contains a string with the 4 x 4 = 16 elements of the matrix.

- id: <node_id>
  transform:
    matrix: 1 0 0 0 0 1 0 0 ...

The elements of the matrix element are stored in column-major order. That is, the entries of each column vector are written consecutively, one column after another. As a result, the matrix written in the matrix element looks transposed. For example, the matrix $$ \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 1 \end{bmatrix} $$ would be written as

- id: <node_id>
  transform:
    matrix: >
      1 0 0 0
      0 1 0 0
      0 0 1 0
      1 2 3 1
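When generating scene files from a script, the column-major ordering is easy to get wrong. The following sketch flattens a matrix given as a list of rows into the string expected by the matrix element. The helper name to_column_major_string is hypothetical and not part of the framework.

```python
def to_column_major_string(rows):
    """Flatten a 4x4 matrix given as a list of rows into column-major order."""
    if len(rows) != 4 or any(len(r) != 4 for r in rows):
        raise ValueError("expected a 4x4 matrix")
    # Walk column by column, taking one entry from every row.
    values = [rows[r][c] for c in range(4) for r in range(4)]
    return " ".join(str(v) for v in values)


# The example matrix from the text, written row by row.
m = [
    [1, 0, 0, 1],
    [0, 1, 0, 2],
    [0, 0, 1, 3],
    [0, 0, 0, 1],
]
print(to_column_major_string(m))  # prints "1 0 0 0 0 1 0 0 0 0 1 0 1 2 3 1"
```

The translation part (1, 2, 3), which sits in the last column of the matrix, ends up in the last four values of the string.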

Note

Vector and matrix values are written as a single string scalar containing the elements, not as a sequence of numbers. That is, the renderer does not accept the following lines.

transform:
  matrix: [1, 0, 0, 0, …]

Another way to represent a transformation is to use a combination of translate, rotate, and scale elements. We specify the translation of the node with the translate element, the rotation with the rotate element, and the scaling with the scale element.

Although the three transformations could be applied in several different orders, the order is fixed: scaling is applied first, rotation second, and translation last.

- id: <node_id>
  transform:
    translate: 1 2 3
    rotate:
      axis: 1 0 0
      angle: 90
    scale: 2 2 2

The translate element specifies a vector with three elements that represents a translation in 3-dimensional space. In this example, the node is translated by the vector (1,2,3).

The rotate element takes two child elements: axis and angle. axis specifies the axis of rotation and angle specifies the counterclockwise rotation angle around that axis, in degrees. In this example, the model in the node is rotated around the x axis by 90 degrees.

The scale element specifies a three-dimensional vector whose components are the scaling factors along each axis. In this example, the model in the node is scaled uniformly by a factor of two along all axes.
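To make the fixed application order concrete, the following sketch applies the example values above to a single point: scale first, then rotate, then translate. It is an illustration only and not the framework's code. The function names are hypothetical, and we assume a right-handed coordinate system, so that a counterclockwise rotation follows the right-hand rule around the axis.

```python
import math


def rotate(p, axis, angle_deg):
    """Rotate point p around axis by angle_deg using Rodrigues' formula."""
    n = math.sqrt(sum(a * a for a in axis))
    k = [a / n for a in axis]  # unit rotation axis
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    dot = sum(ki * pi for ki, pi in zip(k, p))
    cross = [
        k[1] * p[2] - k[2] * p[1],
        k[2] * p[0] - k[0] * p[2],
        k[0] * p[1] - k[1] * p[0],
    ]
    return [p[i] * c + cross[i] * s + k[i] * dot * (1 - c) for i in range(3)]


def apply_transform(p, translate, axis, angle_deg, scale):
    """Apply scale first, then rotation, then translation."""
    p = [pi * si for pi, si in zip(p, scale)]
    p = rotate(p, axis, angle_deg)
    return [pi + ti for pi, ti in zip(p, translate)]


q = apply_transform([0, 1, 0], translate=[1, 2, 3],
                    axis=[1, 0, 0], angle_deg=90, scale=[2, 2, 2])
print([round(v, 6) for v in q])  # prints [1.0, 2.0, 5.0]
```

The point (0,1,0) is first scaled to (0,2,0), then rotated around the x axis onto (0,0,2), and finally translated to (1,2,5). Applying the translation before the rotation would give a different result, which is why the order is fixed.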

Mesh node

A mesh node is a node with mesh and bsdf elements. mesh specifies a reference to the triangle mesh asset that represents a 3D model. bsdf specifies a reference to the BSDF asset that represents the material information associated with the mesh.

- id: <node_id>
  mesh: <mesh_asset_id>
  bsdf: <bsdf_asset_id>

Note

We must specify both mesh and bsdf. If either element is missing, the renderer reports an error and terminates.
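A script that generates scene files can catch this mistake before invoking the renderer. The following sketch checks nodes represented as Python dictionaries, as they would appear after loading the YAML file. The function check_mesh_node is hypothetical, and the error message is not the one produced by the renderer.

```python
def check_mesh_node(node):
    """Return True for a mesh node; raise ValueError if only one of
    'mesh' and 'bsdf' is present."""
    has_mesh = "mesh" in node
    has_bsdf = "bsdf" in node
    if has_mesh != has_bsdf:
        missing = "bsdf" if has_mesh else "mesh"
        raise ValueError(
            f"node {node.get('id')!r} is missing the '{missing}' element")
    return has_mesh


nodes = [
    {"id": "n1", "mesh": "mesh_1", "bsdf": "bsdf_1"},
    {"id": "n2", "mesh": "mesh_2"},  # invalid: no bsdf
]
for node in nodes:
    try:
        check_mesh_node(node)
        print(node["id"], "ok")
    except ValueError as err:
        print(err)
```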

Light node

A node with a light element is a light node, which represents a light source. Without a light source the rendered image would be completely black, so the scene must contain at least one light source. Note that some types of lights can be associated with a mesh. In that case, we can use mesh and bsdf elements along with the light element.

- id: <node_id>
  light: <light_asset_id>

Sensor node

A node with a sensor element is a sensor node. The string value of the element is a reference to the sensor asset. Although several sensor nodes can be defined, exactly one sensor is used for rendering. We select it with the sensor element placed directly under the scene element, which refers to the ID of the sensor node.

scene:
  sensor: <sensor_node_id>
  nodes:
    - id: <sensor_node_id>
      sensor: <sensor_asset_id>
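The lookup from the scene-level sensor element to the sensor node can be sketched as follows. The dictionary mirrors the structure of the scene element after the YAML file has been loaded. The function find_active_sensor and the IDs used here are hypothetical.

```python
def find_active_sensor(scene):
    """Return the node referenced by scene['sensor'], or raise ValueError."""
    target = scene["sensor"]
    for node in scene["nodes"]:
        if node.get("id") == target:
            if "sensor" not in node:
                raise ValueError(f"node {target!r} has no 'sensor' element")
            return node
    raise ValueError(f"no node with id {target!r}")


scene = {
    "sensor": "camera_main",
    "nodes": [
        {"id": "camera_main", "sensor": "sensor_asset_1"},
        {"id": "camera_alt", "sensor": "sensor_asset_2"},  # defined but unused
    ],
}
print(find_active_sensor(scene)["sensor"])  # prints "sensor_asset_1"
```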

For the transformation of the sensor, we follow the coordinate system adopted in OpenGL. The default sensor configuration therefore has +y as the up direction and -z as the forward direction. Note that this configuration is only valid for sensors with a perspective transform.

We can also use a transformation specialized for the sensor node. The lookat element defines the view transformation matrix by specifying the sensor position (eye), the position that the sensor looks at (center), and the up vector of the sensor (up).

transform:
  lookat:
    eye: <sensor_position>
    center: <lookat_position>
    up: <up_vector>
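For readers unfamiliar with this kind of transformation, the following sketch builds a view matrix from eye, center, and up. It assumes the convention of OpenGL's gluLookAt, which is consistent with the coordinate system described above. Whether the framework computes the matrix in exactly this way is an assumption, and the function names are hypothetical.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _normalize(a):
    n = math.sqrt(_dot(a, a))
    return [x / n for x in a]


def look_at(eye, center, up):
    """Build a 4x4 view matrix (list of rows) in the style of gluLookAt."""
    f = _normalize(_sub(center, eye))  # forward direction
    s = _normalize(_cross(f, up))      # right direction
    u = _cross(s, f)                   # corrected up direction
    return [
        [s[0], s[1], s[2], -_dot(s, eye)],
        [u[0], u[1], u[2], -_dot(u, eye)],
        [-f[0], -f[1], -f[2], _dot(f, eye)],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_point(m, p):
    """Apply a 4x4 matrix to a 3D point (w = 1)."""
    v = [p[0], p[1], p[2], 1.0]
    return [_dot(row, v) for row in m[:3]]


view = look_at(eye=[0, 0, 5], center=[0, 0, 0], up=[0, 1, 0])
print(transform_point(view, [0, 0, 0]))
```

In sensor space the looked-at position ends up on the -z axis, at a distance of 5 from the sensor, which matches the -z forward direction.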

Acceleration structure

The acceleration structure speeds up the ray-triangle intersection queries that renderers rely on. This element is optional and can be omitted, but specifying it gives us precise control over the behavior of the acceleration structure.

accel:
  type: <accel_type>
  params:
    <parameters_for_accel>
    ...

Renderer

We specify the rendering technique with the renderer element. The user of the framework can choose among various rendering techniques. Basic settings of the renderer, e.g., the rendering time or the number of samples, can be configured with this element. We will explain the parameters in detail in the Renderers section.

renderer:
  type: <renderer_type>
  params:
    <parameters_for_renderer>
    ...