pytorch_pfn_extras.dataset.TabularDataset
- class pytorch_pfn_extras.dataset.TabularDataset(*args, **kwds)
Bases: Dataset
An abstract class that represents a tabular dataset.
This class represents a tabular dataset. In a tabular dataset, all examples have the same number of elements. For example, all examples of the dataset below have three elements (`a[i]`, `b[i]`, and `c[i]`).

|   | a | b | c |
|---|---|---|---|
| 0 | a[0] | b[0] | c[0] |
| 1 | a[1] | b[1] | c[1] |
| 2 | a[2] | b[2] | c[2] |
| 3 | a[3] | b[3] | c[3] |
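As a rough illustration, the table above can be held column by column. Because every column has the same length, row `i` is simply the `i`-th entry of each column. This is a stdlib-only sketch and not the library's storage. The class `Columns` and its `row` method are hypothetical names.

```python
# Hypothetical column-major store for the a/b/c table above (not part of
# pytorch_pfn_extras). Every column must have the same number of rows.
class Columns:
    def __init__(self, **columns):
        lengths = {len(col) for col in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.keys = tuple(columns)  # keyword order is preserved
        self._columns = columns
        self._len = lengths.pop() if lengths else 0

    def __len__(self):
        return self._len

    def row(self, i):
        # One example: the i-th element of every column, in key order.
        return tuple(self._columns[k][i] for k in self.keys)


table = Columns(
    a=["a[0]", "a[1]", "a[2]", "a[3]"],
    b=["b[0]", "b[1]", "b[2]", "b[3]"],
    c=["c[0]", "c[1]", "c[2]", "c[3]"],
)
print(len(table), table.keys)  # 4 ('a', 'b', 'c')
print(table.row(2))            # ('a[2]', 'b[2]', 'c[2]')
```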
Since an example can be represented by both a tuple and a dict (`(a[i], b[i], c[i])` and `{'a': a[i], 'b': b[i], 'c': c[i]}`), this class uses `mode` to indicate which representation will be used. If there is only one column, an example can also be represented by a bare value (`a[i]`). In this case, `mode` is `None`.

A subclass should implement `__len__()`, `keys`, `mode`, and `get_examples()`.

```python
>>> import numpy as np
>>>
>>> from pytorch_pfn_extras import dataset
>>>
>>> class MyDataset(dataset.TabularDataset):
...
...     def __len__(self):
...         return 4
...
...     @property
...     def keys(self):
...         return ('a', 'b', 'c')
...
...     @property
...     def mode(self):
...         return tuple
...
...     def get_examples(self, indices, key_indices):
...         data = np.arange(12).reshape((4, 3))
...         if indices is not None:
...             data = data[indices]
...         if key_indices is not None:
...             data = data[:, list(key_indices)]
...         return tuple(data.transpose())
...
>>> dataset = MyDataset()
>>> len(dataset)
4
>>> dataset.keys
('a', 'b', 'c')
>>> dataset.astuple()[0]
(0, 1, 2)
>>> sorted(dataset.asdict()[0].items())
[('a', 0), ('b', 1), ('c', 2)]
>>>
>>> view = dataset.slice[[3, 2], ('c', 0)]
>>> len(view)
2
>>> view.keys
('c', 'a')
>>> view.astuple()[1]
(8, 6)
>>> sorted(view.asdict()[1].items())
[('a', 6), ('c', 8)]
```
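The three representations selected by `mode` can be sketched without the library. Given one row's values and the column names, the example is a tuple, a dict keyed by column name, or the bare value when there is a single column and `mode` is `None`. The helper `as_example` is hypothetical. It only mirrors the convention described above.

```python
# Hypothetical helper (not a pytorch_pfn_extras API) that builds one example
# in the representation selected by ``mode``: tuple, dict, or None.
def as_example(keys, values, mode):
    if mode is tuple:
        return tuple(values)
    if mode is dict:
        return dict(zip(keys, values))
    if mode is None:
        # A bare value only makes sense when there is exactly one column.
        if len(keys) != 1:
            raise ValueError("mode None requires exactly one column")
        return values[0]
    raise ValueError(f"unsupported mode: {mode!r}")


keys = ("a", "b", "c")
row = (0, 1, 2)  # first row of the MyDataset example
print(as_example(keys, row, tuple))    # (0, 1, 2)
print(as_example(keys, row, dict))     # {'a': 0, 'b': 1, 'c': 2}
print(as_example(("a",), (0,), None))  # 0
```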
Methods

| Method | Description |
|---|---|
| `__init__()` | |
| `asdict()` | Return a view with dict mode. |
| `astuple()` | Return a view with tuple mode. |
| `concat(*datasets)` | Stack datasets along rows. |
| `convert(data)` | Convert fetched data. |
| `fetch()` | Fetch data. |
| `get_example(i)` | |
| `get_examples(indices, key_indices)` | Return a part of data. |
| `join(*datasets)` | Stack datasets along columns. |
| `transform(keys, transform)` | Apply a transform to each example. |
| `transform_batch(keys, transform_batch)` | Apply a transform to examples. |
| `with_converter(converter)` | Override the behaviour of `convert()`. |

Attributes

| Attribute | Description |
|---|---|
| `keys` | Names of columns. |
| `mode` | Mode of representation. |
| `slice` | Get a slice of the dataset. |
- concat(*datasets)
Stack datasets along rows.
- Parameters:
datasets (iterable of `TabularDataset`) – Datasets to be concatenated. All datasets must have the same `keys`.
- Returns:
A concatenated dataset.
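Row-wise concatenation can be pictured on plain column-major dicts. Every input must expose the same keys, and each output column is the inputs' columns laid end to end. This is a stdlib sketch with a hypothetical `concat_columns` function, not the implementation of `concat()`.

```python
# Hypothetical sketch of row-wise concatenation on column-major dicts
# (key -> list of values). Not the pytorch_pfn_extras implementation.
def concat_columns(*datasets):
    if not datasets:
        raise ValueError("at least one dataset is required")
    keys = tuple(datasets[0])
    for d in datasets[1:]:
        if tuple(d) != keys:
            raise ValueError("all datasets must have the same keys")
    # Lay each column of every dataset end to end, in order.
    return {k: [v for d in datasets for v in d[k]] for k in keys}


first = {"a": [0, 3], "b": [1, 4]}
second = {"a": [6], "b": [7]}
print(concat_columns(first, second))  # {'a': [0, 3, 6], 'b': [1, 4, 7]}
```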
- convert(data)
Convert fetched data.
This method takes data fetched by `fetch()` and pre-processes it before it is passed to models. By default, each column is converted into an ndarray. This behaviour can be overridden by `with_converter()`. If the dataset is constructed by `concat()` or `join()`, the converter of the first dataset is used.
- Parameters:
data (tuple or dict) – Data from `fetch()`.
- Returns:
A tuple or dict. Each value is an ndarray.
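The default conversion dispatches on the shape of the fetched data and converts it column by column. The stdlib-only sketch below mirrors that dispatch. `to_array` is a hypothetical parameter that stands in for the ndarray conversion, for example `numpy.asarray` in real code. Its default of `list` only keeps the sketch dependency-free.

```python
# Sketch of per-column conversion of fetched, column-major data. ``to_array``
# stands in for an ndarray constructor such as numpy.asarray (assumption).
def convert_columns(data, to_array=list):
    if isinstance(data, tuple):   # tuple mode: one entry per column
        return tuple(to_array(col) for col in data)
    if isinstance(data, dict):    # dict mode: key -> column
        return {k: to_array(col) for k, col in data.items()}
    return to_array(data)         # mode None: a single column


print(convert_columns((range(3), (3, 4, 5))))  # ([0, 1, 2], [3, 4, 5])
print(convert_columns({"a": range(2)}))        # {'a': [0, 1]}
print(convert_columns(range(2)))               # [0, 1]
```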
- fetch()
Fetch data.
This method fetches all data of the dataset or view. Note that this method returns column-major data, i.e. `([a[0], ..., a[3]], ..., [c[0], ..., c[3]])`, `{'a': [a[0], ..., a[3]], ..., 'c': [c[0], ..., c[3]]}`, or `[a[0], ..., a[3]]` when `mode` is `tuple`, `dict`, or `None`, respectively.
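Because `fetch()` returns columns rather than rows, recovering per-example rows amounts to a transpose. The following is a minimal sketch for the tuple and dict layouts. It uses plain lists in place of fetched data, and the helper names are hypothetical.

```python
# Hypothetical helpers that turn column-major fetched data back into rows.
def rows_from_tuple(columns):
    # ([a0, a1], [b0, b1]) -> [(a0, b0), (a1, b1)]
    return list(zip(*columns))


def rows_from_dict(columns):
    # {'a': [a0, a1], 'b': [b0, b1]} -> [{'a': a0, 'b': b0}, {'a': a1, 'b': b1}]
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]


fetched = ([0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11])  # a, b, c of MyDataset
print(rows_from_tuple(fetched)[1])  # (3, 4, 5)
print(rows_from_dict({"a": [0, 3], "c": [2, 5]}))
# [{'a': 0, 'c': 2}, {'a': 3, 'c': 5}]
```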
- get_example(i)
- get_examples(indices, key_indices)
Return a part of data.
- Parameters:
indices (list of ints or slice) – Indices of requested rows. If this argument is `None`, it indicates all rows.
key_indices (tuple of ints) – Indices of requested columns. If this argument is `None`, it indicates all columns.
- Returns:
A tuple of lists/arrays, one per requested column.
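A subclass backed by plain Python lists might satisfy this contract roughly as follows. The sketch returns a tuple of lists, one per requested column, and treats `None` as "everything". `ListTabular` is a hypothetical name. A real subclass would derive from `TabularDataset`, which is omitted here to keep the sketch self-contained.

```python
# Hypothetical list-backed implementation of the get_examples() contract.
# A real subclass would inherit from pytorch_pfn_extras.dataset.TabularDataset.
class ListTabular:
    def __init__(self, keys, columns):
        self.keys = tuple(keys)
        self._columns = [list(col) for col in columns]

    def __len__(self):
        return len(self._columns[0]) if self._columns else 0

    def get_examples(self, indices, key_indices):
        if key_indices is None:
            key_indices = range(len(self.keys))  # all columns
        out = []
        for k in key_indices:
            col = self._columns[k]
            if indices is None:              # all rows
                out.append(list(col))
            elif isinstance(indices, slice):
                out.append(col[indices])
            else:                            # list of ints
                out.append([col[i] for i in indices])
        return tuple(out)


ds = ListTabular(("a", "b", "c"), ([0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]))
print(ds.get_examples([3, 2], (2, 0)))      # ([11, 8], [9, 6])
print(ds.get_examples(slice(0, 2), None))   # ([0, 3], [1, 4], [2, 5])
```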
- join(*datasets)
Stack datasets along columns.
- Parameters:
datasets (iterable of `TabularDataset`) – Datasets to be joined. All datasets must have the same length.
- Returns:
A joined dataset.
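Column-wise joining is the counterpart of row-wise concatenation: the lengths must match, and the columns of all inputs are placed side by side. Below is a stdlib sketch on column-major dicts. `join_columns` is hypothetical. Its duplicate-key check is this sketch's own choice, not something stated above.

```python
# Hypothetical sketch of column-wise joining on column-major dicts.
def join_columns(*datasets):
    if not datasets:
        raise ValueError("at least one dataset is required")
    length = len(next(iter(datasets[0].values()), []))
    joined = {}
    for d in datasets:
        for key, col in d.items():
            if len(col) != length:
                raise ValueError("all datasets must have the same length")
            if key in joined:  # sketch's own choice: refuse to overwrite
                raise ValueError(f"duplicate key: {key!r}")
            joined[key] = list(col)
    return joined


left = {"a": [0, 3], "b": [1, 4]}
right = {"c": [2, 5]}
print(join_columns(left, right))  # {'a': [0, 3], 'b': [1, 4], 'c': [2, 5]}
```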
- property keys
Names of columns.
A tuple of strings that indicate the names of columns.
- property mode
Mode of representation.
This indicates the type of value returned by `fetch()` and `__getitem__()`. `tuple`, `dict`, and `None` are supported.
- property slice
Get a slice of the dataset.
- Parameters:
indices (list/array of ints/bools or slice) – Requested rows.
keys (tuple of ints/strs or int or str) – Requested columns.
- Returns:
A view of the specified range.
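The `keys` argument may mix names and positions, and `indices` may be a boolean mask. The sketch below resolves such a request into concrete row and column indices. The function names are hypothetical and this is not the library's code. It does reproduce the `slice[[3, 2], ('c', 0)]` case from the class example, whose view has keys `('c', 'a')`.

```python
# Hypothetical resolution of a slice request into row and column indices.
def resolve_keys(keys, all_keys):
    if isinstance(keys, (int, str)):
        keys = (keys,)  # a single int or str selects one column
    # Strings are looked up by name; ints are used as positions.
    return tuple(all_keys.index(k) if isinstance(k, str) else k for k in keys)


def resolve_indices(indices, length):
    if isinstance(indices, slice):
        return list(range(length))[indices]
    indices = list(indices)
    if indices and all(isinstance(i, bool) for i in indices):
        if len(indices) != length:
            raise ValueError("boolean mask must match the dataset length")
        return [i for i, keep in enumerate(indices) if keep]
    return indices


all_keys = ("a", "b", "c")
cols = resolve_keys(("c", 0), all_keys)
print(cols, tuple(all_keys[i] for i in cols))      # (2, 0) ('c', 'a')
print(resolve_indices([3, 2], 4))                  # [3, 2]
print(resolve_indices([True, False, True, False], 4))  # [0, 2]
print(resolve_indices(slice(1, None), 4))          # [1, 2, 3]
```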
- transform(keys, transform)
Apply a transform to each example.
The transformations are given as a list where each element is a tuple that holds the transformation signature and a callable that performs the transformation.
The transformation signature is a tuple of two elements: the first is the keys of the dataset that are taken as inputs, and the second is the keys, among those listed in the keys argument, that the transformation produces as outputs.
When multiple transformations are specified, their outputs must be disjoint; otherwise, `ValueError` is raised.
- Parameters:
keys (tuple of strs) – The keys of transformed examples.
transform (list of tuples) – A list where each element specifies a transformation as a tuple of the transformation signature and a callable that takes an example and returns a transformed example. The `mode` of the transformed dataset is determined by the transformed examples.
- Returns:
A transformed dataset.
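The list-of-signatures form can be pictured with the stdlib sketch below, which works under stated assumptions and is not the library's dispatch. Each entry is `((input_keys, output_keys), fn)`, examples are dicts, and `fn` is *assumed* to receive the input values positionally and return one value per output key. The sketch shows the disjointness check and how per-example outputs are assembled into the requested `keys`. `check_disjoint` and `transform_example` are hypothetical names.

```python
# Hypothetical sketch of list-of-signatures transforms on dict examples;
# not the pytorch_pfn_extras implementation.
# Assumption: fn(*inputs) returns one value per output key, in order.
def check_disjoint(transforms):
    seen = set()
    for (_inputs, outputs), _fn in transforms:
        overlap = seen.intersection(outputs)
        if overlap:
            raise ValueError(f"outputs overlap: {sorted(overlap)}")
        seen.update(outputs)


def transform_example(example, keys, transforms):
    check_disjoint(transforms)
    produced = {}
    for (inputs, outputs), fn in transforms:
        values = fn(*(example[k] for k in inputs))
        produced.update(zip(outputs, values))
    # Return only the requested keys, in the requested order.
    return {k: produced[k] for k in keys}


transforms = [
    ((("a", "b"), ("sum",)), lambda a, b: (a + b,)),
    ((("c",), ("neg_c",)), lambda c: (-c,)),
]
print(transform_example({"a": 0, "b": 1, "c": 2}, ("sum", "neg_c"), transforms))
# {'sum': 1, 'neg_c': -2}
```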
- transform_batch(keys, transform_batch)
Apply a transform to examples.
The transformations are given as a list where each element is a tuple that holds the transformation signature and a callable that performs the transformation.
The transformation signature is a tuple of two elements: the first is the keys of the dataset that are taken as inputs, and the second is the keys, among those listed in the keys argument, that the transformation produces as outputs.
When multiple transformations are specified, their outputs must be disjoint; otherwise, `ValueError` is raised.
- Parameters:
keys (tuple of strs) – The keys of transformed examples.
transform_batch (list of tuples) – A list where each element specifies a transformation as a tuple of the transformation signature and a callable that takes a batch of examples and returns a batch of transformed examples. The `mode` of the transformed dataset is determined by the transformed examples.
- Returns:
A transformed dataset.
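In the batch variant, each callable sees whole columns rather than single values. The stdlib sketch below is not the library's code and rests on stated assumptions. Data is a column-major dict. Each entry is `((input_keys, output_keys), fn)`, and `fn` is *assumed* to take the input columns positionally and return one column per output key. The output-length check is the sketch's own addition. `transform_batch_columns` is a hypothetical name.

```python
# Hypothetical sketch of a batched transform on column-major dict data.
# Assumption: fn(*input_columns) returns one column per output key, in order.
def transform_batch_columns(batch, keys, transforms):
    n = len(next(iter(batch.values())))  # batch size
    produced = {}
    for (inputs, outputs), fn in transforms:
        if produced.keys() & set(outputs):
            raise ValueError("transformation outputs must be disjoint")
        columns = fn(*(batch[k] for k in inputs))
        for key, col in zip(outputs, columns):
            if len(col) != n:  # sketch's own sanity check
                raise ValueError(f"column {key!r} has the wrong length")
            produced[key] = col
    # Return only the requested keys, in the requested order.
    return {k: produced[k] for k in keys}


batch = {"a": [0, 3, 6], "b": [1, 4, 7], "c": [2, 5, 8]}
transforms = [
    ((("a", "b"), ("sum",)), lambda a, b: ([x + y for x, y in zip(a, b)],)),
    ((("c",), ("c",)), lambda c: ([v * 10 for v in c],)),
]
print(transform_batch_columns(batch, ("sum", "c"), transforms))
# {'sum': [1, 7, 13], 'c': [20, 50, 80]}
```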