pytorch_pfn_extras.dataset.TabularDataset

class pytorch_pfn_extras.dataset.TabularDataset(*args, **kwds)

An abstract class that represents a tabular dataset.

In a tabular dataset, all examples have the same number of elements. For example, every example of the dataset below has three elements (a[i], b[i], and c[i]).

     a       b       c
0    a[0]    b[0]    c[0]
1    a[1]    b[1]    c[1]
2    a[2]    b[2]    c[2]
3    a[3]    b[3]    c[3]

Since an example can be represented either as a tuple ((a[i], b[i], c[i])) or as a dict ({'a': a[i], 'b': b[i], 'c': c[i]}), this class uses mode to indicate which representation is used. If there is only one column, an example can also be represented by a single value (a[i]); in this case, mode is None.
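To make the three representations concrete, here is a small pure-Python sketch. The helper to_example is hypothetical and not part of pytorch_pfn_extras; it only shows how one row of the table above would be packaged as a tuple, a dict, or a bare value, assuming mode takes the values tuple, dict, or None as described.

```python
# Hypothetical helper, NOT part of pytorch_pfn_extras: it only illustrates
# how one row of column values maps onto each representation.
def to_example(values, keys, mode):
    """Package one row of column values according to ``mode``."""
    if mode is tuple:
        return tuple(values)
    if mode is dict:
        return dict(zip(keys, values))
    if mode is None:
        # A bare value only makes sense for a single-column dataset.
        if len(values) != 1:
            raise ValueError("mode None requires exactly one column")
        return values[0]
    raise ValueError(f"unknown mode: {mode!r}")


keys = ("a", "b", "c")
row = [0, 1, 2]  # stands for (a[0], b[0], c[0])

print(to_example(row, keys, tuple))   # (0, 1, 2)
print(to_example(row, keys, dict))    # {'a': 0, 'b': 1, 'c': 2}
print(to_example([0], ("a",), None))  # 0
```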

A subclass should implement __len__(), keys, mode, and get_examples().

>>> import numpy as np
>>>
>>> from pytorch_pfn_extras import dataset
>>>
>>> class MyDataset(dataset.TabularDataset):
...
...     def __len__(self):
...         return 4
...
...     @property
...     def keys(self):
...         return ('a', 'b', 'c')
...
...     @property
...     def mode(self):
...         return tuple
...
...     def get_examples(self, indices, key_indices):
...         # Row i holds example i; column j holds key j.
...         data = np.arange(12).reshape((4, 3))
...         if indices is not None:
...             data = data[indices]
...         if key_indices is not None:
...             data = data[:, list(key_indices)]
...         # Return one array per selected column.
...         return tuple(data.transpose())
...
>>> dataset = MyDataset()
>>> len(dataset)
4
>>> dataset.keys
('a', 'b', 'c')
>>> dataset.astuple()[0]
(0, 1, 2)
>>> sorted(dataset.asdict()[0].items())
[('a', 0), ('b', 1), ('c', 2)]
>>>
>>> view = dataset.slice[[3, 2], ('c', 0)]
>>> len(view)
2
>>> view.keys
('c', 'a')
>>> view.astuple()[1]
(8, 6)
>>> sorted(view.asdict()[1].items())
[('a', 6), ('c', 8)]
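The slice in the example above mixes a column name ('c') with a column position (0). The following sketch shows, with plain lists instead of NumPy, how such a selector could be resolved into key indices and how the result matches the doctest. resolve_keys and slice_table are hypothetical names; the library itself returns a view object rather than copying columns like this.

```python
# Hypothetical model of ``dataset.slice[rows, keys]``. pytorch_pfn_extras
# returns a view object; this sketch only reproduces the resulting values.
def resolve_keys(all_keys, selector):
    """Map each key (a column name or an integer position) to a column index."""
    indices = []
    for key in selector:
        if isinstance(key, str):
            indices.append(all_keys.index(key))
        else:
            indices.append(key)
    return tuple(indices)


def slice_table(columns, all_keys, rows, selector):
    """Select ``rows`` and the columns named by ``selector``."""
    key_indices = resolve_keys(all_keys, selector)
    new_keys = tuple(all_keys[k] for k in key_indices)
    new_columns = tuple([columns[k][r] for r in rows] for k in key_indices)
    return new_keys, new_columns


all_keys = ("a", "b", "c")
# Same data as the doctest, np.arange(12).reshape((4, 3)), stored by column.
columns = ([0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11])

keys, cols = slice_table(columns, all_keys, [3, 2], ("c", 0))
print(keys)                            # ('c', 'a')
print(tuple(col[1] for col in cols))   # (8, 6)
```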

Methods

__init__()
asdict()                                Return a view with dict mode.
astuple()                               Return a view with tuple mode.
concat(*datasets)                       Stack datasets along rows.
convert(data)                           Convert fetched data.
fetch()                                 Fetch data.
get_example(i)                          Return the i-th example.
get_examples(indices, key_indices)      Return a part of data.
join(*datasets)                         Stack datasets along columns.
transform(keys, transform)              Apply a transform to each example.
transform_batch(keys, transform_batch)  Apply a transform to a batch of examples.
with_converter(converter)               Override the behaviour of convert().
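concat() and join() combine datasets along different axes. The sketch below models a dataset as a (keys, columns) pair to illustrate the difference. concat_tables and join_tables are hypothetical names, and the checks they perform (identical keys for concat; equal lengths and distinct keys for join) are assumptions about the constraints, not a copy of the library's code.

```python
# Hypothetical column-oriented model; concat_tables and join_tables are
# illustrative names, NOT pytorch_pfn_extras functions.
def concat_tables(*tables):
    """Stack tables along rows (assumed: every table has identical keys)."""
    keys = tables[0][0]
    for other_keys, _ in tables[1:]:
        if other_keys != keys:
            raise ValueError("concat requires identical keys")
    # Column i of the result is column i of each table, one after another.
    columns = tuple(
        [v for _, cols in tables for v in cols[i]] for i in range(len(keys))
    )
    return keys, columns


def join_tables(*tables):
    """Stack tables along columns (assumed: equal lengths, distinct keys)."""
    lengths = {len(cols[0]) for _, cols in tables}
    if len(lengths) != 1:
        raise ValueError("join requires tables of equal length")
    keys = tuple(k for ks, _ in tables for k in ks)
    if len(set(keys)) != len(keys):
        raise ValueError("join requires distinct keys")
    columns = tuple(col for _, cols in tables for col in cols)
    return keys, columns


left = (("a", "b"), ([0, 1], [10, 11]))
right = (("a", "b"), ([2], [12]))
extra = (("c",), ([20, 21],))

print(concat_tables(left, right))  # (('a', 'b'), ([0, 1, 2], [10, 11, 12]))
print(join_tables(left, extra))    # (('a', 'b', 'c'), ([0, 1], [10, 11], [20, 21]))
```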

Attributes

keys    Names of columns.
mode    Mode of representation.
slice   Get a slice of the dataset.
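The transform() and transform_batch() methods in the Methods list differ in granularity. As their descriptions suggest, the first applies a function to each example, while the second applies it to a batch of examples at once. The following pure-Python sketch illustrates that difference, assuming examples and batches are passed as dicts. transform_rows and transform_columns are hypothetical helpers, not pytorch_pfn_extras API.

```python
# Hypothetical helpers, NOT pytorch_pfn_extras API: they contrast a
# per-example transform with a per-batch transform on column-stored data.
def transform_rows(keys, columns, new_keys, fn):
    """Call ``fn`` once per example; each example is passed as a dict."""
    out = {k: [] for k in new_keys}
    for i in range(len(columns[0])):
        example = {k: col[i] for k, col in zip(keys, columns)}
        result = fn(example)
        for k in new_keys:
            out[k].append(result[k])
    return new_keys, tuple(out[k] for k in new_keys)


def transform_columns(keys, columns, new_keys, fn):
    """Call ``fn`` once on the whole batch, passed as a dict of columns."""
    batch = dict(zip(keys, columns))
    result = fn(batch)
    return new_keys, tuple(list(result[k]) for k in new_keys)


keys = ("a", "b")
columns = ([0, 1, 2], [10, 11, 12])

per_example = transform_rows(
    keys, columns, ("s",), lambda ex: {"s": ex["a"] + ex["b"]})
per_batch = transform_columns(
    keys, columns, ("s",),
    lambda b: {"s": [x + y for x, y in zip(b["a"], b["b"])]})

print(per_example)  # (('s',), ([10, 12, 14],))
print(per_batch)    # (('s',), ([10, 12, 14],))
```

Both calls produce the same columns here. A batch-level function is called only once, which is why a batched variant is typically preferred when the operation can be vectorized.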