Eclipse integration for the tablesaw data frame library for Java
| Home |
|---|
| Data frame editor |
| Viewing and plotting table data |
| Linking table providers and consumers |
| Xaw Scripting DSL |
Although some data manipulation can be done in the editor and views, the full power of the tablesaw library is available through the Xaw scripting DSL. The Xaw language is essentially the syntactic sugar for Java that Xbase provides out of the box, plus table- and column-oriented operators, additional literal syntax, and extension methods for reading and writing files and for linking scripts to the table data registry.
The scripts are translated to Java in the context of the classpath of the projects they’re within, and can be executed within the workbench so they may consume table data from or provide table data to workbench parts.
A Xaw script typically loads table data from one or more files, manipulates tables and columns, derives new tables and outputs the result. In addition, the integration with the table registry makes it possible to use intermediate and resulting table data as the source for views.
Underlying xaw are some extensions to the tablesaw library, to make it easier to work with. There are two kinds of extensions:
- TypedTable<R extends Row> - a generic subclass of tablesaw’s Table class for type-safe access to columns and rows. It declares methods for creating empty copies, appending (empty) rows, getting an Iterator for rows and selecting rows with a row Predicate. This class is used as the superclass for new table types declared in xaw scripts, with an appropriate subclass of Row as the type argument.
- … + and +=.

A xaw script consists of a set of import statements (similar to Java’s), followed by a xaw statement declaring the qualified name of the script (and corresponding Java class) and any number of table (type) declarations, statements and (helper) function declarations.
Table declarations define new typed table classes that give you type-safe access to columns and rows and improve code completion. Consider the following table declaration:
table tab1 {
String name,
double age
}
This will generate subclasses of TypedTable and Row with type-safe accessor methods for the name and age columns and values, e.g. the getNameColumn method in the TypedTable subclass will return a StringColumn, and the getAge method in the Row subclass will return a double.
The new TypedTable subclass may be used in variable declarations using the new table instance syntax (see below).
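To make the generated code more concrete, here is a rough plain-Java sketch of the kind of classes the tab1 declaration above could correspond to. It is only an illustration: Tab1Sketch and Tab1RowSketch are hypothetical, list-backed stand-ins that do not extend tablesaw’s Table or the real TypedTable and Row. Only the accessor names getNameColumn and getAge come from the text above; getAgeColumn, getName, append and row are assumed by analogy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the generated table class.
// The real generated class extends TypedTable and uses tablesaw columns.
class Tab1Sketch {
    private final List<String> nameColumn = new ArrayList<>();
    private final List<Double> ageColumn = new ArrayList<>();

    // Column-level accessors (the real getNameColumn returns a StringColumn).
    List<String> getNameColumn() { return nameColumn; }
    List<Double> getAgeColumn() { return ageColumn; }

    // Appends one row of values and returns a typed view of the new row.
    Tab1RowSketch append(String name, double age) {
        nameColumn.add(name);
        ageColumn.add(age);
        return row(nameColumn.size() - 1);
    }

    // Returns a typed view of the row at the given index.
    Tab1RowSketch row(int index) { return new Tab1RowSketch(this, index); }

    int rowCount() { return nameColumn.size(); }
}

// Hypothetical stand-in for the generated Row subclass.
class Tab1RowSketch {
    private final Tab1Sketch table;
    private final int index;

    Tab1RowSketch(Tab1Sketch table, int index) {
        this.table = table;
        this.index = index;
    }

    // Value-level accessors, typed per column.
    String getName() { return table.getNameColumn().get(index); }
    double getAge() { return table.getAgeColumn().get(index); }
}
```

The point of the pattern is that a misspelled column name or a wrong value type becomes a compile-time error rather than a runtime lookup failure.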
The xaw editor is a standard editor as generated by Xtext. Upon save (and build), a Java class is generated for the script itself (subclass of XawBase), with additional Java classes generated for table types.
A notable feature is the ability to execute xaw scripts inside the workbench in the context of the enclosing project’s classpath.
First and foremost, this allows scripts to import/export tables from/to the table registry using the importTable and exportTable functions:
- importTable(String tableKey) - imports and returns the tableKey table
- importTables(String... tableKey) - imports and returns a collection of the tables with the provided tableKeys
- exportTable(Table table, String tableKey) - exports table as tableKey
- exportTables(Table... tables) - exports tables with their names as keys

Note that when executed as normal Java classes outside the workbench, these methods use the file system, rather than the table registry.
Second, the project may have library dependencies besides tablesaw, e.g. SMILE for statistics and machine learning, and these will be available.
Third, standard output is captured by the console, which simplifies programming and debugging.
To allow creation of typed table instances, xaw provides the # <table-def> # syntax. <table-def> may just name a previously defined table type (see above) or inline the whole table type declaration. E.g.
var tab1 = # tab1 #
will declare a tab1 variable of type tab1 initialised to an (empty) tab1 instance. You can also inline the whole table type as follows:
val tab1 = # tab1: String name, double age #
This will declare both the tab1 type and the tab1 variable as above. The table instance creation syntax may be used anywhere an expression is allowed. If it is used as the initial value in a variable declaration, as shown here, the type name may be omitted, in which case the variable name is used.
The contents of the new table instance may be provided in two ways. If nameColumn and ageColumn columns were prepared in advance, we could populate the table as follows:
val tab1 = # String name = nameColumn, double age = ageColumn #
Alternatively, we could fill the table with specific contents, where each element is an expression of the appropriate type:
val tab1 = # String name, double age #
| "Hallvard", 52|
| "Marit", 54|
Since there are specialised columns for time, both date and time of day, corresponding literals are supported:
var day = @16-11-1966 // LocalDate variable
var time = @11:38:05 // LocalTime variable
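For readers more used to plain Java, the two literals correspond to java.time values. The sketch below builds the same values with the standard java.time API. The day-month-year reading of the date literal is inferred from the example (16 cannot be a month). The class name LiteralSketch and the parsing approach are illustrative assumptions, not a description of how xaw translates literals.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper showing java.time equivalents of the literals above.
class LiteralSketch {
    // Assumed day-month-year pattern for literals like @16-11-1966.
    static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("dd-MM-yyyy");

    // Parses the text that follows the @ of a date literal.
    static LocalDate date(String text) { return LocalDate.parse(text, DATE); }

    // Parses the text that follows the @ of a time literal (ISO hh:mm:ss).
    static LocalTime time(String text) { return LocalTime.parse(text); }

    public static void main(String[] args) {
        System.out.println(date("16-11-1966")); // prints 1966-11-16
        System.out.println(time("11:38:05"));   // prints 11:38:05
    }
}
```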
The @ character is also used for URL literals, e.g. @"https://hallvard.github.io/etablesaw". In many cases, a simpler variant may be used, e.g. the same literal could be written as @"hallvard.github.io/etablesaw. Since the double forward slash is used for comments, it was difficult to support the full URL syntax without the quotes.
Helper methods may be defined using the def keyword. These do not see top-level script variables, since the corresponding Java methods are static.
The underlying Xbase expression language supports operator overloading and custom operators. In addition to the standard Xbase operators like +, -, +=, -=, >> and <<, we have added ? for evaluating a predicate, and &, |, &= and |= for handling (row) selections.
Here’s a list of all the overloaded operators, grouped on operand type(s):
- col1 += col2 - appends col2 to col1
- col += item - appends item to col
- col1 += col2 -> row - appends the value at row in col2 to col1 (-> creates a Pair)
- col += string - appends string to col with appendCell
- col - selection - creates a new column from col with only the rows not in selection
- col & selection - creates a new column from col with only the rows in selection
- col & intRange - creates a new column from col with only the rows in intRange (int ranges can be created with the .. and ..< operators)
- col & predicate - creates a new column from col with only the rows selected by predicate
- col ? predicate - creates a selection with the rows satisfying predicate

- table += row - appends row to table, works for tables and rows with up to 6 columns
- table - selection - creates a new table from table with only the rows not in selection
- table & selection - creates a new table from table with only the rows in selection
- table & intRange - creates a new table from table with only the rows in intRange
- table - intRange - creates a new table from table with only the rows not in intRange
- table & predicate - creates a new table from table with only the rows selected by predicate
- table1 => table2 - appends the columns in table1 to corresponding columns in table2 (if they exist)

- ! sel - creates a new selection that includes all rows not in sel
- sel1 &= sel2 - removes rows from sel1 that are not in sel2
- sel1 |= sel2 - adds rows to sel1 that are in sel2
- sel1 -= sel2 - removes rows from sel1 that are in sel2
- sel += row - adds row to sel
- sel -= row - removes row from sel
- sel += intIterable - adds all rows in intIterable to sel
- sel -= intIterable - removes all rows in intIterable from sel
- sel += intRange - adds all rows in intRange to sel
- sel -= intRange - removes all rows in intRange from sel
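The selection operators behave like set operations on row indices. As a rough illustration in plain Java, the sketch below models a selection with java.util.BitSet. This is only a stand-in chosen for the example (tablesaw’s own selection type is not used here), and the class SelectionSketch and its method names are hypothetical.

```java
import java.util.BitSet;

// Hypothetical model of row selections as bit sets of row indices.
class SelectionSketch {
    // ! sel : all rows in [0, rowCount) that are not in sel
    // (assumes sel only contains indices below rowCount)
    static BitSet not(BitSet sel, int rowCount) {
        BitSet result = (BitSet) sel.clone();
        result.flip(0, rowCount);
        return result;
    }

    // sel1 &= sel2 : removes rows from sel1 that are not in sel2
    static void andAssign(BitSet sel1, BitSet sel2) { sel1.and(sel2); }

    // sel1 |= sel2 : adds rows to sel1 that are in sel2
    static void orAssign(BitSet sel1, BitSet sel2) { sel1.or(sel2); }

    // sel1 -= sel2 : removes rows from sel1 that are in sel2
    static void minusAssign(BitSet sel1, BitSet sel2) { sel1.andNot(sel2); }

    // sel += intRange : adds rows from (inclusive) up to toExclusive
    static void addRange(BitSet sel, int from, int toExclusive) { sel.set(from, toExclusive); }

    // sel -= intRange : removes rows from (inclusive) up to toExclusive
    static void removeRange(BitSet sel, int from, int toExclusive) { sel.clear(from, toExclusive); }
}
```

Note that negation needs the row count of the underlying table or column, since a bare set of indices does not know how many rows exist.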
- numCol > n - creates a selection with the rows in numCol that have values greater than n
- numCol1 > numCol2 - creates a selection with the rows in numCol1 that have values greater than corresponding rows in numCol2
- same as the two above for operators >=, < and <=

- dateCol > date - creates a selection with the rows in dateCol that are after date
- dateCol > dateInt - creates a selection with the rows in dateCol that are after dateInt (date encoded as an int)
- dateCol1 > dateCol2 - creates a selection with the rows in dateCol1 that are after corresponding values in dateCol2
- same as the three above for operator < (is before)

- dateTimeCol > dateTime - creates a selection with the rows in dateTimeCol that are after dateTime
- dateTimeCol > date - creates a selection with the rows in dateTimeCol that are after date
- dateTimeCol1 > dateTimeCol2 - creates a selection with the rows in dateTimeCol1 that are after corresponding values in dateTimeCol2
- same as the three above for operator < (is before)

- timeCol > time - creates a selection with the rows in timeCol that are after time
- timeCol > timeInt - creates a selection with the rows in timeCol that are after timeInt (time encoded as an int)
- timeCol1 > timeCol2 - creates a selection with the rows in timeCol1 that are after corresponding values in timeCol2
- same as the three above for operator < (is before)
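The comparison operators above all follow one pattern: compare each row and collect the matching row indices into a selection, which can then be applied with &. A minimal plain-Java sketch of that pattern follows, with a double[] standing in for a numeric column and a BitSet standing in for a selection. ComparisonSketch and its methods are hypothetical names used only for illustration.

```java
import java.util.BitSet;

// Hypothetical model of "comparison creates a selection, & applies it".
class ComparisonSketch {
    // numCol > n : selects the rows whose value is greater than n
    static BitSet greaterThan(double[] numCol, double n) {
        BitSet sel = new BitSet(numCol.length);
        for (int row = 0; row < numCol.length; row++) {
            if (numCol[row] > n) sel.set(row);
        }
        return sel;
    }

    // numCol1 > numCol2 : row-wise comparison over the common row range
    static BitSet greaterThan(double[] numCol1, double[] numCol2) {
        int rows = Math.min(numCol1.length, numCol2.length);
        BitSet sel = new BitSet(rows);
        for (int row = 0; row < rows; row++) {
            if (numCol1[row] > numCol2[row]) sel.set(row);
        }
        return sel;
    }

    // col & selection : new column with only the rows in the selection
    static double[] keep(double[] col, BitSet sel) {
        return sel.stream()
                  .filter(row -> row < col.length)
                  .mapToDouble(row -> col[row])
                  .toArray();
    }
}
```

Under these assumptions, an expression such as age & (age > 18) would correspond to keep(age, greaterThan(age, 18)).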
- doubleIterator => doubleCol - fills doubleCol with the values in doubleIterator
- doubleSupplier => doubleCol - fills doubleCol with the values in doubleSupplier
- doubleRangeIterable => doubleCol - fills doubleCol with the values in doubleRangeIterable

- doubleCol + n - creates a new double column with n added to values from doubleCol
- doubleCol + numCol - creates a new double column with elements from numCol added to corresponding values from doubleCol
- same as the two above for operators -, * and /
- doubleCol ^ n - creates a new double column with elements from doubleCol raised to the power of n

- localDate + localTime - creates a new LocalDateTime with coordinates from localDate and localTime
- localTime + localDate - creates a new LocalDateTime with coordinates from localDate and localTime
- localDateTime + localTime - creates a new LocalDateTime by adjusting localDateTime forward according to localTime
- localDateTime - localTime - creates a new LocalDateTime by adjusting localDateTime backward according to localTime
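The date and time operators map naturally onto the standard java.time API. The sketch below shows one plausible reading in plain Java. Combining a date and a time uses LocalDate.atTime. For the adjustments, the sketch assumes that localTime is interpreted as an amount of time since midnight; that interpretation and the class DateTimeOperatorSketch are assumptions made for illustration, not something the text above confirms.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical java.time equivalents of the date/time operators.
class DateTimeOperatorSketch {
    // localDate + localTime (and localTime + localDate)
    static LocalDateTime plus(LocalDate date, LocalTime time) {
        return date.atTime(time);
    }

    // localDateTime + localTime : moves forward by the time since midnight (assumed)
    static LocalDateTime plus(LocalDateTime dateTime, LocalTime time) {
        return dateTime.plusNanos(time.toNanoOfDay());
    }

    // localDateTime - localTime : moves backward by the time since midnight (assumed)
    static LocalDateTime minus(LocalDateTime dateTime, LocalTime time) {
        return dateTime.minusNanos(time.toNanoOfDay());
    }
}
```

With this reading, adding 13:00 to 1966-11-16T11:38:05 rolls over into the next day and gives 1966-11-17T00:38:05.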
- dateTimeIterator => dateTimeCol - fills dateTimeCol with the values in dateTimeIterator
- dateTimeSupplier => dateTimeCol - fills dateTimeCol with the values in dateTimeSupplier
- dateTimeIterable => dateTimeCol - fills dateTimeCol with the values in dateTimeIterable

LocalDate