Eclipse integration for the tablesaw data frame library for java
Xaw Scripting DSL
Although some data manipulation may be done in the editor and views, the power of the tablesaw library is unleashed by the xaw scripting DSL. The Xaw language combines the syntactic sugar for Java that Xbase provides out of the box, some extra table- and column-oriented operators and literal syntax, and extension methods for reading and writing files and for linking scripts to the table data registry.
The scripts are translated to Java in the context of the classpath of the projects they’re within, and can be executed within the workbench so they may consume table data from or provide table data to workbench parts.
A Xaw script typically loads table data from one or more files, manipulates tables and columns, derives new tables and outputs the result. In addition, the integration with the table registry makes it possible to use intermediate and resulting table data as the source for views.
Underlying xaw are some extensions to the tablesaw library that make it easier to work with. There are two kinds of extensions:
- `TypedTable<R extends Row>` - a generic subclass of tablesaw’s `Table` class for type-safe access to columns and rows. It declares methods for creating empty copies, appending (empty) rows, getting an `Iterator` for rows and selecting rows with a row `Predicate`. This class is used as the superclass for new table types declared in xaw scripts, with an appropriate subclass of `Row` as the type argument.
- support for operators such as `+` and `+=`.

A xaw script consists of a set of `import` statements (similar to Java’s), followed by a `xaw` statement declaring the qualified name of the script (and corresponding Java class) and any number of table (type) declarations, statements and (helper) function declarations.
`table` declarations define new typed table classes, which give you type-safe access to columns and rows and improve code completion. Consider the following `table` declaration:
table tab1 {
String name,
double age
}
This will generate subclasses of `TypedTable` and `Row` with type-safe accessor methods for the `name` and `age` columns and values, e.g. the `getNameColumn` method in the `TypedTable` subclass will return a `StringColumn`, and the `getAge` method in the `Row` subclass will return a `double`.
The new `TypedTable` subclass may be used in variable declarations using the new table instance syntax (see below).
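
To make the idea of type-safe accessors concrete, here is a minimal Java sketch of the shape such classes might take for the `tab1` declaration. It is an illustration under assumptions, not the code xaw actually generates: the class name `Tab1Sketch`, the `appendRow`, `row` and `rowCount` helpers and the use of plain `List`s instead of tablesaw columns are all hypothetical stand-ins, chosen so the example runs without any dependencies.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in sketch only: the real generated classes extend TypedTable and Row
// and use tablesaw columns (e.g. StringColumn); plain lists are used here.
public final class Tab1Sketch {
    private final List<String> names = new ArrayList<>();
    private final List<Double> ages = new ArrayList<>();

    // Column accessors, one per declared column.
    public List<String> getNameColumn() { return names; }
    public List<Double> getAgeColumn() { return ages; }

    public int rowCount() { return names.size(); }

    // Appends one row, keeping both columns the same length (hypothetical helper).
    public Row appendRow(String name, double age) {
        names.add(name);
        ages.add(age);
        return new Row(names.size() - 1);
    }

    // Returns a typed view of the row at the given index (hypothetical helper).
    public Row row(int index) {
        if (index < 0 || index >= rowCount()) {
            throw new IndexOutOfBoundsException("row " + index);
        }
        return new Row(index);
    }

    // Typed row view: value accessors return the declared column types.
    public final class Row {
        private final int index;

        private Row(int index) { this.index = index; }

        public String getName() { return names.get(index); }
        public double getAge() { return ages.get(index); }
    }

    public static void main(String[] args) {
        Tab1Sketch tab1 = new Tab1Sketch();
        tab1.appendRow("Hallvard", 52);
        tab1.appendRow("Marit", 54);
        Row second = tab1.row(1);
        System.out.println(second.getName() + " " + second.getAge());
    }
}
```

The point of the pattern is that a misspelled column name or a wrong value type becomes a compile-time error rather than a run-time lookup failure.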
The xaw editor is a standard editor as generated by Xtext. Upon save (and build), a Java class is generated for the script itself (a subclass of `XawBase`), with additional Java classes generated for `table` types.
A notable feature is the ability to execute xaw scripts inside the workbench in the context of the enclosing project’s classpath.
First and foremost, this allows scripts to import/export tables from/to the table registry using the `importTable` and `exportTable` functions:
- `importTable(String tableKey)` - imports and returns the `tableKey` table
- `importTables(String... tableKey)` - imports and returns a collection of the tables with the provided `tableKeys`
- `exportTable(Table table, String tableKey)` - exports `table` as `tableKey`
- `exportTables(Table... tables)` - exports `tables` with their names as keys

Note that when executed as normal Java classes outside the workbench, these methods use the file system, rather than the table registry.
Second, the project may have library dependencies besides tablesaw, e.g. SMILE for statistics and machine learning, and these will be available.
Third, standard output is captured by the console, which simplifies programming and debugging.
To allow creation of typed table instances, xaw provides the `# <table-def> #` syntax. `<table-def>` may just name a previously defined table type (see above) or inline the whole `table` type declaration. E.g.
var tab1 = # tab1 #
will declare a `tab1` variable of type `tab1` initialised to an (empty) `tab1` instance. You can also inline the whole `table` type as follows:
val tab1 = # tab1: String name, double age #
This will declare both the `tab1` type and the `tab1` variable as above. The table instance creation syntax may be used anywhere an expression is allowed. If used as the initial value in a variable declaration, as shown here, the type name may be omitted, in which case the variable name is used.
The contents of the new table instance may be provided in two ways. If `nameColumn` and `ageColumn` columns were prepared in advance, we could populate the table as follows:
val tab1 = # String name = nameColumn, double age = ageColumn #
Alternatively, we could fill the table with specific contents, where each element is an expression of the appropriate type:
val tab1 = # String name, double age #
| "Hallvard", 52|
| "Marit", 54|
Since there are specialised columns for time, both date and time of day, corresponding literals are supported:
var day = @16-11-1966 // LocalDate variable
var time = @11:38:05 // LocalTime variable
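
For readers more familiar with plain Java, the two literals above correspond roughly to the following `java.time` values. This is a sketch, assuming the date literal is read in day-month-year order (the only reading that works for `16-11-1966`); the class and method names are made up for the example.

```java
import java.time.LocalDate;
import java.time.LocalTime;

public final class LiteralSketch {
    private LiteralSketch() {}

    // Plain-Java counterpart of the xaw literal @16-11-1966 (day-month-year).
    public static LocalDate day() {
        return LocalDate.of(1966, 11, 16);
    }

    // Plain-Java counterpart of the xaw literal @11:38:05.
    public static LocalTime time() {
        return LocalTime.of(11, 38, 5);
    }

    public static void main(String[] args) {
        System.out.println(day() + " " + time());
    }
}
```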
The `@` character is also used for URL literals, e.g. `@"https://hallvard.github.io/etablesaw"`. In many cases, a simpler variant may be used, e.g. the same literal could be written as `@"hallvard.github.io/etablesaw`. Since the double forward slash is used for comments, it was difficult to support the full URL syntax without the quotes.
Helper methods may be defined using the `def` keyword. These do not see top-level script variables, since the corresponding Java methods are `static`.
The underlying Xbase expression language supports operator overloading and custom operators. In addition to the standard Xbase operators like `+`, `-`, `+=`, `-=`, `>>` and `<<`, we have added `?` for evaluating a predicate, and `&`, `|`, `&=` and `|=` for handling (row) selections.
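
As background, Xbase resolves a binary operator to a method with a conventional name, e.g. `a + b` to `operator_plus(a, b)` and `a - b` to `operator_minus(a, b)`. The sketch below shows how the `LocalDate`/`LocalTime` operators from the list could be written as such methods in plain Java. The class name is hypothetical, and treating the `localTime` operand as an offset since midnight is an assumption about what "adjusted forward/backward" means, not a statement about the actual implementation.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical holder class; the method names follow the Xbase operator convention.
public final class DateTimeOperatorSketch {
    private DateTimeOperatorSketch() {}

    // localDate + localTime
    public static LocalDateTime operator_plus(LocalDate date, LocalTime time) {
        return date.atTime(time);
    }

    // localTime + localDate (same result, operands swapped)
    public static LocalDateTime operator_plus(LocalTime time, LocalDate date) {
        return date.atTime(time);
    }

    // localDateTime + localTime, assuming the time is an offset since midnight
    public static LocalDateTime operator_plus(LocalDateTime dateTime, LocalTime time) {
        return dateTime.plusNanos(time.toNanoOfDay());
    }

    // localDateTime - localTime, with the same assumption
    public static LocalDateTime operator_minus(LocalDateTime dateTime, LocalTime time) {
        return dateTime.minusNanos(time.toNanoOfDay());
    }
}
```

With methods like these visible as extensions, a script expression such as `day + time` would be compiled to a call to the matching `operator_plus` overload.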
Here’s a list of all the overloaded operators, grouped on operand type(s):
- `col1 += col2` - appends `col2` to `col1`
- `col += item` - appends `item` to `col`
- `col1 += col2 -> row` - appends the value at `row` in `col2` to `col1` (`->` creates a `Pair`)
- `col += string` - appends `string` to `col` with `appendCell`
- `col - selection` - creates a new column from `col` with only the rows not in `selection`
- `col & selection` - creates a new column from `col` with only the rows in `selection`
- `col & intRange` - creates a new column from `col` with only the rows in `intRange` (int ranges can be created with the `..` and `..<` operators)
- `col & predicate` - creates a new column from `col` with only the rows selected by `predicate`
- `col ? predicate` - creates a selection with the rows satisfying `predicate`
- `table += row` - appends `row` to `table`, works for tables and rows with up to 6 columns
- `table - selection` - creates a new table from `table` with only the rows not in `selection`
- `table & selection` - creates a new table from `table` with only the rows in `selection`
- `table & intRange` - creates a new table from `table` with only the rows in `intRange`
- `table - intRange` - creates a new table from `table` with only the rows not in `intRange`
- `table & predicate` - creates a new table from `table` with only the rows selected by `predicate`
- `table1 => table2` - appends the columns in `table1` to corresponding columns in `table2` (if they exist)
- `! sel` - creates a new selection that includes all rows not in `sel`
- `sel1 &= sel2` - removes rows from `sel1` that are not in `sel2`
- `sel1 |= sel2` - adds rows to `sel1` that are in `sel2`
- `sel1 -= sel2` - removes rows from `sel1` that are in `sel2`
- `sel += row` - adds `row` to `sel`
- `sel -= row` - removes `row` from `sel`
- `sel += intIterable` - adds all rows in `intIterable` to `sel`
- `sel -= intIterable` - removes all rows in `intIterable` from `sel`
- `sel += intRange` - adds all rows in `intRange` to `sel`
- `sel -= intRange` - removes all rows in `intRange` from `sel`
- `numCol > n` - creates a selection with the rows in `numCol` that have values greater than `n`
- `numCol1 > numCol2` - creates a selection with the rows in `numCol1` that have values greater than corresponding rows in `numCol2`
- same as the two above for operators `>=`, `<` and `<=`
- `dateCol > date` - creates a selection with the rows in `dateCol` that are after `date`
- `dateCol > dateInt` - creates a selection with the rows in `dateCol` that are after `dateInt` (date encoded as an `int`)
- `dateCol1 > dateCol2` - creates a selection with the rows in `dateCol1` that are after corresponding values in `dateCol2`
- same as the three above for operator `<` (is before)
- `dateTimeCol > dateTime` - creates a selection with the rows in `dateTimeCol` that are after `dateTime`
- `dateTimeCol > date` - creates a selection with the rows in `dateTimeCol` that are after `date`
- `dateTimeCol1 > dateTimeCol2` - creates a selection with the rows in `dateTimeCol1` that are after corresponding values in `dateTimeCol2`
- same as the three above for operator `<` (is before)
- `timeCol > time` - creates a selection with the rows in `timeCol` that are after `time`
- `timeCol > timeInt` - creates a selection with the rows in `timeCol` that are after `timeInt` (time encoded as an `int`)
- `timeCol1 > timeCol2` - creates a selection with the rows in `timeCol1` that are after corresponding values in `timeCol2`
- same as the three above for operator `<` (is before)
- `doubleIterator => doubleCol` - fills `doubleCol` with the values in `doubleIterator`
- `doubleSupplier => doubleCol` - fills `doubleCol` with the values in `doubleSupplier`
- `doubleRangeIterable => doubleCol` - fills `doubleCol` with the values in `doubleRangeIterable`
- `doubleCol + n` - creates a new double column with `n` added to values from `doubleCol`
- `doubleCol + numCol` - creates a new double column with elements from `numCol` added to corresponding values from `doubleCol`
- same as the two above for operators `-`, `*` and `/`
- `doubleCol ^ n` - creates a new double column with elements from `doubleCol` raised to the power of `n`
- `localDate + localTime` - creates a new `LocalDateTime` with components from `localDate` and `localTime`
- `localTime + localDate` - creates a new `LocalDateTime` with components from `localDate` and `localTime`
- `localDateTime + localTime` - creates a new `LocalDateTime` by adjusting `localDateTime` forward according to `localTime`
- `localDateTime - localTime` - creates a new `LocalDateTime` by adjusting `localDateTime` backward according to `localTime`
- `dateTimeIterator => dateTimeCol` - fills `dateTimeCol` with the values in `dateTimeIterator`
- `dateTimeSupplier => dateTimeCol` - fills `dateTimeCol` with the values in `dateTimeSupplier`
- `dateTimeIterable => dateTimeCol` - fills `dateTimeCol` with the values in `dateTimeIterable`
LocalDate