Org-mode > orgtbl-aggregate
Create aggregated table from source table
-- mode: org; coding:utf-8; --
#+TITLE: Aggregate Values in a Table #+OPTIONS: ^:{} authors:Thierry Banel, Michael Brand toc:nil
Aggregating a table is creating a new table by computing sums, averages, and so on, out of material from the first table.
- New Transpose-babel blocks now handle =@#= and =hline= special columns. =@#= is the input table row number. =hline= is the block number between two horizontal lines where the current row is located.
And by the way, yes, =orgtbl-aggregate= comes with =orgtbl-transpose= as a bonus. It flips rows with columns.
- Table of Contents :PROPERTIES: :TOC: :include siblings :depth 2 :force () :ignore (this) :local (nothing) :CUSTOM_ID: table-of-contents :END:
:CONTENTS:
- [[#examples][Examples]]
- [[#a-very-simple-example][A very simple example]]
- [[#demonstrate-sum-and-average-computing][Demonstrate sum and average computing]]
- [[#example-without-days][Example without days]]
- [[#example-of-counting-each-combination][Example of counting each combination]]
- [[#stop-reading-here-8020][Stop reading here! 80/20]]
- [[#name-your-input-table][Name your input table]]
- [[#create-an-aggregation-block][Create an aggregation block]]
- [[#refresh-the-aggregation][Refresh the aggregation]]
- [[#equivalent-in-sql-r-datamash-el-tblfn-awk-c][Equivalent in SQL, R, Datamash, el-tblfn, Awk, C++]]
- [[#sql-equivalent][SQL equivalent]]
- [[#r-equivalent][R equivalent]]
- [[#datamash-equivalent][Datamash equivalent]]
- [[#el-tblfn][el-tblfn]]
- [[#awk-equivalent][Awk equivalent]]
- [[#c-equivalent][C++ equivalent]]
- [[#wizards][Wizards]]
- [[#guiding-traditional-wizard][Guiding (traditional) wizard]]
- [[#experimental-free-form-wizard][Experimental free form wizard]]
- [[#the-cols-parameter][The :cols parameter]]
- [[#names-of-input-columns][Names of input columns]]
- [[#grouping-specifications-in-cols][Grouping specifications in :cols]]
- [[#the-hline-column][The hline column]]
- [[#the--column][The @# column]]
- [[#aggregation-formulas-in-cols][Aggregation formulas in :cols]]
- [[#correlation-of-two-columns][Correlation of two columns]]
- [[#almost-any-expression-can-be-specified][(Almost) any expression can be specified]]
- [[#column-names][Column names]]
- [[#input-table-with-or-without-a-header][Input table with or without a header]]
- [[#column-names-of-the-input-table][Column names of the input table]]
- [[#multiple-lines-header][Multiple lines header]]
- [[#custom-column-names][Custom column names]]
- [[#formatters][Formatters]]
- [[#org-mode-compatible-formatters][Org Mode compatible formatters]]
- [[#debugging-formatters][Debugging formatters]]
- [[#discarding-an-output-column][Discarding an output column]]
- [[#sorting][Sorting]]
- [[#example-with-one-sorting-column][Example with one sorting column]]
- [[#several-sorting-columns][Several sorting columns]]
- [[#hlines-in-the-output-table][hlines in the output table]]
- [[#output-hlines-depends-on-sorting-columns][Output hlines depends on sorting columns]]
- [[#example-with-hline-2][Example with hline 2]]
- [[#cells-processing][Cells processing]]
- [[#where-calc-interpretation-happens][Where Calc interpretation happens?]]
- [[#dates][Dates]]
- [[#durations][Durations]]
- [[#empty-and-malformed-input-cells][Empty and malformed input cells]]
- [[#symbolic-computation][Symbolic computation]]
- [[#intervals][Intervals]]
- [[#error-or-precision-forms][Error or precision forms]]
- [[#wide-variety-of-inputs][Wide variety of inputs]]
- [[#standard-org-mode-input][Standard Org Mode input]]
- [[#virtual-input-table-from-babel][Virtual input table from Babel]]
- [[#an-org-id][An Org ID]]
- [[#csv-input][CSV input]]
- [[#json-input][JSON input]]
- [[#input-slicing][Input slicing]]
- [[#the-cond-filter][The :cond filter]]
- [[#virtual-input-columns][Virtual input columns]]
- [[#post-processing][Post-processing]]
- [[#spreadsheet-formulas][Spreadsheet formulas]]
- [[#algorithm-post-processing][Algorithm post processing]]
- [[#grand-total][Grand total]]
- [[#chaining][Chaining]]
- [[#pull--push][Pull & Push]]
- [[#pull-mode][Pull mode]]
- [[#push-mode][Push mode]]
- [[#pull-or-push-][Pull or push ?]]
- [[#debugging][Debugging]]
- [[#seeing-the--forms][Seeing the $ forms]]
- [[#seeing-calc-formulas-before-evaluation][Seeing Calc formulas before evaluation]]
- [[#seeing-lisp-internal-form-of-calc-formulas][Seeing Lisp internal form of Calc formulas]]
- [[#example-of-debugging-vsumnn2][Example of debugging vsum(nn^2)]]
- [[#summary-of-debugging-formatters][Summary of debugging formatters]]
- [[#tricks][Tricks]]
- [[#sorting-0][Sorting]]
- [[#a-few-lowest-or-highest-values][A few lowest or highest values]]
- [[#span-of-values][Span of values]]
- [[#no-aggregation][No aggregation]]
- [[#installation][Installation]]
- [[#authors-contributors][Authors, contributors]]
- [[#changes][Changes]]
- [[#gpl-3-license][GPL 3 License]] :END:
- Examples :PROPERTIES: :CUSTOM_ID: examples :END:
** A very simple example :PROPERTIES: :CUSTOM_ID: a-very-simple-example :END:
We have a table of activities and quantities (whatever they are) over several days.
#+begin_example #+name: original | Day | Color | Level | Quantity | |-----------+-------+-------+----------| | Monday | Red | 30 | 11 | | Monday | Blue | 25 | 3 | | Tuesday | Red | 51 | 12 | | Tuesday | Red | 45 | 15 | | Tuesday | Blue | 33 | 18 | | Wednesday | Red | 27 | 23 | | Wednesday | Blue | 12 | 16 | | Wednesday | Blue | 15 | 15 | | Thursday | Red | 39 | 24 | | Thursday | Red | 41 | 29 | | Thursday | Red | 49 | 30 | | Friday | Blue | 7 | 5 | | Friday | Blue | 6 | 8 | | Friday | Blue | 11 | 9 | #+end_example
To begin with we want to gather all colors and count how many times they appear. We are interested only in the second column named =Color=
First we give a name to the table through the =#+NAME:= or =#+TBLNAME:= tags, just above the table. Then we create a /dynamic block/ to receive the aggregation:
#+begin_example desired output columns╶──────────────────────────╮ the input table╶──────────────╮ │ type of processing╶─╮ │ │ ╭───────╯ │ │ ▼ ╭───┴────╮ ╭─────┴───────╮ #+begin: aggregate :table "original" :cols "Color count()" #+end: #+end_example
Now typing =C-c C-c= in the dynamic block counts the colors in the original table:
#+begin_example C-c C-c here to refresh╶╮ ╭──────────╯ ▼ #+begin: aggregate :table "original" :cols "Color count()" | Color | count() | |-------+---------| | Red | 7 | | Blue | 7 | #+end: #+end_example
OrgAggregate found two colors, =Red= and =Blue=. It found 7 occurrences for each.
** Demonstrate sum and average computing :PROPERTIES: :CUSTOM_ID: demonstrate-sum-and-average-computing :END:
Now we want to aggregate this table for each day (because several rows exist for each day). We want the average value of the =Level= column for each day, and the sum of the =Quantity= column. We write down the block specifying that (later we will see how to automate the creation of such a block with a [[#wizards][wizard]]):
#+begin_example sum aggregation╶─────────────────────────────────────────────────╮ average aggregation╶───────────────────────────────╮ │ key grouping column╶───────────────────────╮ │ │ ╭┴╮ ╭────┴─────╮ ╭─────┴──────╮ #+begin: aggregate :table original :cols "Day vmean(Level) vsum(Quantity)" #+end #+end_example
Typing =C-c C-c= in the dynamic block computes the aggregation:
#+begin_example C-c C-c here to refresh╶╮ ╭──────────╯ ▼ #+begin: aggregate :table original :cols "Day vmean(Level) vsum(Quantity)" ╰┬╯ ╰────┬─────╯ ╰──────┬─────╯ ╭───────────────────────────────────────╯ │ │ │ ╭───────────────────────────────╯ │ │ │ ╭──────────────────────────────╯ ╭┴╮ ╭────┴─────╮ ╭─────┴──────╮ | Day | vmean(Level) | vsum(Quantity) | |-----------+--------------+----------------| | Monday | 27.5 | 14 | | Tuesday | 43 | 45 | | Wednesday | 18 | 54 | | Thursday | 43 | 83 | | Friday | 8 | 22 | #+end #+end_example
The source table is not changed in any way.
To get this result, we specified columns in this way, after the =:cols= parameter:
=Day= : we got the same column as in the source table, except entries are not duplicated. Here =Day= acts as a /key grouping column/. We may specify as many key columns as we want just by naming them. We get only one aggregated row for each different combination of values of key grouping columns.
=vmean(Level)= : this instructs OrgAggregate to compute the average of values found in the =Level= column, grouped by the same =Day=.
=vsum(Quantity)=: OrgAggregate computes the sum of values found in the =Quantity= column, one sum for each =Day=.
** Example without days :PROPERTIES: :CUSTOM_ID: example-without-days :END:
Maybe we are just interested in the sum of =Quantities=, regardless of =Days=. We just type:
#+begin_example C-c C-c here to refresh╶╮ ╭──────────╯ ▼ #+begin: aggregate :table "original" :cols "vsum(Quantity)" ╰─────┬──────╯ ╭──────────────────────────────────────────╯ ╭────┴───────╮
| vsum(Quantity) |
|---|
| 218 |
| #+end |
| #+end_example |
** Example of counting each combination :PROPERTIES: :CUSTOM_ID: example-of-counting-each-combination :END:
we may want to count the number of rows for each combination of =Day= and =Color=:
#+BEGIN_EXAMPLE C-c C-c here to refresh╶╮ ╭──────────╯ ▼ #+BEGIN: aggregate :table "original" :cols "count() Day Color" ╰──┬──╯ ╰┬╯ ╰─┬─╯ ╭─────────────────────────────────────────╯ │ │ │ ╭───────────────────────────────────────╯ │ │ │ ╭───────────────────────────────╯ ╭──┴──╮ ╭┴╮ ╭─┴─╮ | count() | Day | Color | |---------+-----------+-------| | 1 | Monday | Red | | 1 | Monday | Blue | | 2 | Tuesday | Red | | 1 | Tuesday | Blue | | 1 | Wednesday | Red | | 2 | Wednesday | Blue | | 3 | Thursday | Red | | 3 | Friday | Blue | #+END #+END_EXAMPLE
If we want to get measurements for =Colors= rather than =Days=, we type:
#+begin_example C-c C-c here to refresh╶╮ ╭──────────╯ ▼ #+begin: aggregate :table "original" :cols "Color vmean(Level) vsum(Quantity)" ╰─┬─╯ ╰────┬─────╯ ╰─────┬──────╯ ╭─────────────────────────────────────────╯ │ │ │ ╭──────────────────────────────────────╯ │ │ │ ╭────────────────────────────────────╯ ╭─┴─╮ ╭────┴─────╮ ╭─────┴──────╮ | Color | vmean(Level) | vsum(Quantity) | |-------+---------------+----------------| | Red | 40.2857142857 | 144 | | Blue | 15.5714285714 | 74 | #+end #+end_example
- Stop reading here! 80/20 :PROPERTIES: :CUSTOM_ID: stop-reading-here-8020 :END:
If you managed to get here, you are at 80/20 (thanks Pareto!). You grasped only 20% of the OrgAggregate features, but those 20% cover 80% of the use cases.
To summarize the 20%:
** Name your input table :PROPERTIES: :CUSTOM_ID: name-your-input-table :END:
Select one of your Org table, and be ready to aggregate values from it right in the same file.
Give a name to your table with a special line just above it.
#+begin_example name here╶───╮ ╭────╯ ▼ #+name: original | Day | Color | Level | Quantity | |-----------+-------+-------+----------| | Monday | Red | 30 | 11 | | Monday | Blue | 25 | 3 | … #+end_example
** Create an aggregation block :PROPERTIES: :CUSTOM_ID: create-an-aggregation-block :END:
#+begin_example #+begin: aggregate #+end: #+end_example
- Input: specify a =:table= parameter.
- Output: specify the desired output columns with the =:cols= parameter.
#+begin_example output╶───────────────────────────────────────────╮ input╶───────────────────────╮ │ │ │ ╭──┴───╮ ╭────────┴─────────╮ #+begin: aggregate :table original :cols "Day vsum(Quantity)" #+end: #+end_example
** Refresh the aggregation :PROPERTIES: :CUSTOM_ID: refresh-the-aggregation :END:
- Type =C-c C-c= on the =#+begin:= line now and whenever you want to refresh the aggregation.
#+begin_example C-c C-c here╶─╮ │ ▼ #+begin: aggregate :table original :cols "Day vsum(Quantity)" | Day | vsum(Quantity) | |-----------+----------------| | Monday | 14 | | Tuesday | 45 | … #+end: #+end_example
- Equivalent in SQL, R, Datamash, el-tblfn, Awk, C++ :PROPERTIES: :CUSTOM_ID: equivalent-in-sql-r-datamash-el-tblfn-awk-c :END:
Aggregation is a widely used method to get insights in tabular data. Use whatever environment best suits your needs.
OrgAggregate is great when you want to output and Org Mode table. Also, OrgAggregate has no dependency other than Emacs, not even other Lisp packages.
** SQL equivalent :PROPERTIES: :CUSTOM_ID: sql-equivalent :END:
If you are familiar with SQL, you would get a similar result with the =GROUP BY= statement:
#+begin_src sql select Day, mean(Level), sum(Quantity) from original group by Day; #+end_src
** R equivalent :PROPERTIES: :CUSTOM_ID: r-equivalent :END:
If you are familiar with the R statistical language, you would get a similar result with =factor= and =aggregate= functions:
#+begin_src R original <- the table as a data.frame day_factor <- factor(original$Day) aggregate (original$Level , list(Day=day_factor), mean) aggregate (original$Quantity, list(Day=day_factor), sum ) #+end_src
** Datamash equivalent :PROPERTIES: :CUSTOM_ID: datamash-equivalent :END:
The command-line Datamash software operates on CSV files and can achieve a similar result:
#+begin_src shell datamash -H -g Day mean Level sum Quantity <original.csv GroupBy(Day) mean(Level) sum(Quantity) Monday 27.5 14 Tuesday 43 45 Wednesday 18 54 Thursday 43 83 Friday 8 22 #+end_src
** el-tblfn :PROPERTIES: :CUSTOM_ID: el-tblfn :END: The Misohena's el-tblfn package aggregates Org Mode tables through Lisp scripts. See https://github.com/misohena/el-tblfn. Example:
#+begin_example ,#+begin_src elisp :var original=original :colnames no :hlines yes (require 'tblfn) (thread-first (tblfn-aggregate original "Quantity" "Total")) ,#+end_src #+end_example
It is not clear how el-tblfn can compute the =mean= (or other aggregating functions) on the =Level= column. Would the author want to complete the example?
** Awk equivalent :PROPERTIES: :CUSTOM_ID: awk-equivalent :END: Awk is a line-oriented filter which has been arround in Unix for decades. Here we use the standard Org Mode support for Awk.
#+begin_example #+begin_src awk :stdin original NR>1 { Day =$1 Color =$2 Level =$3 Quantity =$4 SumLevel[Day] += Level SumQuantity[Day] += Quantity Count[Day] ++ } END { for (d in SumQuantity) { printf "%s %s %s\n", d, SumLevel[d]/Count[d], SumQuantity[d] } } #+end_src #+end_example
** C++ equivalent :PROPERTIES: :CUSTOM_ID: c-equivalent :END: C++ has hash-maps in its standard template library. And Org Mode provides support for C++ Babel blocks. Thus, it is quite straigthforward to aggegrate in this language.
(Don't forget to customize =org-src-lang-modes= to activate C++ support in Org Mode).
#+begin_example #+begin_src C++ :var original=original :includes '( ) using namespace std; unordered_map<string,double> SumLevel; unordered_map<string,double> SumQuantity; unordered_map<string,double> Count; for (auto row : original) { auto Day = row[0]; auto Color = row[1]; auto Level = stod(row[2]); auto Quantity = stod(row[3]); SumLevel[Day] += Level; SumQuantity[Day] += Quantity; Count[Day] ++; } for (auto it : SumQuantity) cout<<it.first<<" " <<SumLevel[it.first]/Count[it.first]<<" " <<SumQuantity[it.first]<<"\n"; #+end_src #+end_example
- Wizards :PROPERTIES: :CUSTOM_ID: wizards :END:
There are now 2 wizards. Use whichever is easier for you.
** Guiding (traditional) wizard :PROPERTIES: :CUSTOM_ID: guiding-traditional-wizard :END:
Type =C-c C-x x= to launch a wizard for creating new dynamic blocks, or amending existing ones. Then answer =aggregate= for the type of block, and follow the instructions. (There are several other /dynamic blocks/ that can be built this way: =columnview=, =clocktable=, =propview=, =invoice=, =transpose=, and any future block).
There are two modes:
- basic, =C-c C-x x=, =M-x orgtbl-aggregate-insert-dblock-aggregate=
- expert, =C-u C-c C-x x=, =C-u M-x orgtbl-aggregate-insert-dblock-aggregate=
The basic mode only queries for an input table in the current buffer and the output columns. Of course, if the wizard updates an existing block with exotic parameters, those parameters will be queried even in basic mode.
The expert mode queries every parameter.
The wizard has been enhanced & extended.
It can update an existing block. Put the cursor on a line beginning with =#+begin: aggregate …= and type =C-c C-x x=. The wizard will use values picked from this line as defaults for user queries.
The wizard now takes into account the multiple forms of input tables. Tables may lie in distant files as well as in the current buffer. They may be computed on the fly by Babel blocks as well as regular Org Mode tables. They may also be pointed to by Org IDs.
Regular Org Mode table slicing is now recognized.
Pre & post processors are now handled by the wizard.
The =:hline= parameter is taken into account.
Most of the wizard's queries may be answered by just hitting RETURN. This is because many parameters are optional. And because when amending an existing block, the wizard suggests the old values as the default answers.
When invoking the wizard, a small window opens to display help relative to each field the wizard queries. This help window is in Org Mode for an easy reading. The window will be closed when the wizard is done or when aborting.
** Experimental free form wizard :PROPERTIES: :CUSTOM_ID: experimental-free-form-wizard :END:
Suppose we have an aggregate block like this one:
#+begin_example the 2 main parameters╶────────╮────────────────────╮ │ │ ╭───┴────╮ ╭───────┴───────╮ #+begin: aggregate :table "my_table" :cols "vsum(b) count()" :hline "1" #+end: #+end_example
Then, hitting the =TAB= key anywhere on the =#+begin:= line (but not at the beginning) unfolds it. The result is this one:
#+begin_example #+begin: aggregate ╭─╴the 2 main parameters #+aggregate: :file │ #+aggregate: :name my_table ◀─╯ #+aggregate: :orgid │ #+aggregate: :params │ #+aggregate: :slice │ #+aggregate: :precompute │ #+aggregate: :cols vsum(b) count() ◀─╯ #+aggregate: :cond #+aggregate: :hline 1 #+aggregate: :post #+end: #+end_example
This unfolded form is easier to read. Furthermore, help and completion is available on each line, by hitting the =TAB= key.
Hit =TAB= again on the =#+begin:= line (not at the beginning) to fold back the form. Then hit =C-c C-c= as usual to refresh the aggregation.
The usual behaviour of Org Mode =TAB= is preserved when the =TAB= key is pressed at the beginning of line.
All fields recognized by OrgAggregate are displayed, even when they are empty. The empty fields will be ignored when folding back. This allows to grasp all the parameters in a single eyesight.
All in all, this new wizard covers the same features as the traditional, guiding wizard. Except, one can fill the fields in any order, and just ignore the unnecessary ones.
- The :cols parameter :PROPERTIES: :CUSTOM_ID: the-cols-parameter :END:
The =:cols= parameter lists the columns of the resulting table. It contains in any order, grouping key columns and aggregation formulas. The choosen order will be reflected in the output table.
** Names of input columns :PROPERTIES: :CUSTOM_ID: names-of-input-columns :END:
The names of the columns in the original table may be:
- the names as they appear in the header of the source table,
- or =$1=, =$2=, =$3= and so on (as in spreadsheet formulas),
- additionally, the special column =hline= is used to group parts of the source table separated by horizontal lines.
- the special =@#= column, not in the input table, is the row number.
The =:cols= parameter may be a string or a list of strings. Examples:
#+begin_example :cols "Day vmean(Level);f3 vsum(Quantity);f2" :cols ("Day" "vmean(Level);f3" "vsum(Quantity);f2") #+end_example
If a single string is used, it is split by spaces. Thus, a given formula, including its semicolon and modifiers, must not contain any space. If spaces are required within a formula, then use the parenthesis list. If a column name has spaces, quote it like this:
#+begin_example 'yellow submarine' #+end_example
** Grouping specifications in :cols :PROPERTIES: :CUSTOM_ID: grouping-specifications-in-cols :END:
Grouping is done on columns of the source table acting as key columns. Just name the key columns.
Additionally, the =hline= specification means that rows between two horizontal lines should be grouped.
Key columns and =hline= are used to group rows of the source table with unique combinations of those columns.
** The hline column :PROPERTIES: :CUSTOM_ID: the-hline-column :END:
hline = "horizontal line"
The special column named =hline= gives the block number in the source table. A block is the rows separated by horizontal lines in the input table. The first block is numbered =0=. It is a virtual column in the sense that it is not a column in the input table. Other than that, it is usable as any real column.
Here is a source table containing 3 blocks separated by horizontal lines:
#+begin_example #+name: originalhl | Color | Level | Quantity | |-------+-------+----------| ╶╮ | Red | 30 | 11 | │ | Blue | 25 | 3 | ├─first block, n° 0 | Red | 51 | 12 | │ | Red | 45 | 15 | │ | Blue | 33 | 18 | ╶╯ |-------+-------+----------| ╶╮ | Red | 27 | 23 | │ | Blue | 12 | 16 | ├─second block, n° 1 | Blue | 15 | 15 | │ | Red | 39 | 24 | │ | Red | 41 | 29 | ╶╯ |-------+-------+----------| ╶╮ | Red | 49 | 30 | │ | Blue | 7 | 5 | ├─third block, n° 2 | Blue | 6 | 8 | │ | Blue | 11 | 9 | ╶╯ #+end_example
And here is the aggregation by those 3 blocks:
#+begin_example #+BEGIN: aggregate :table originalhl :cols "hline vmean(Level) vsum(Quantity)" | hline | vmean(Level) | vsum(Quantity) | |-------+--------------+----------------| | 0 | 36.8 | 59 | ◀─╴first block , n° 0 | 1 | 26.8 | 107 | ◀─╴second block, n° 1 | 2 | 18.25 | 52 | ◀─╴third block , n° 2 #+END: #+end_example
If we want additional details with the =Color= column, we just name it:
#+begin_example #+begin: aggregate :table originalhl :cols "hline Color vmean(Level) vsum(Quantity)" | hline | Color | vmean(Level) | vsum(Quantity) | |-------+-------+---------------+----------------| | 0 | Red | 42 | 38 | ╶┬──╴first block , n° 0 | 0 | Blue | 29 | 21 | ╶╯ | 1 | Red | 35.6666666667 | 76 | ╶┬──╴second block, n° 1 | 1 | Blue | 13.5 | 31 | ╶╯ | 2 | Red | 49 | 30 | ╶┬──╴third block , n° 2 | 2 | Blue | 8 | 22 | ╶╯ #+end: #+end_example
There is an ugly value, =35.6666666667=, in the middle of the table. See later how to format it ([[#org-mode-compatible-formatters][Org Mode compatible formatters]]).
** The @# column :PROPERTIES: :CUSTOM_ID: the--column :END:
The special column named =@#= gives the row number in the source table. The =@#= name is the same as in the Org table spreadsheet formulas. It is a virtual column in the sense that it does not appear in the input table. Other than that, it is usable as any real column.
Example: we compute the average of the row number for colors =Red= & =Blue=. This gives a sense of whether each colors appears rather at the beginning or the end of the table.
#+begin_example #+begin: aggregate :table "original" :cols "Color vmean(@#)" | Color | vmean(@#) | |-------+-----------| | Red | 6.2857143 | | Blue | 8.7142857 | #+end: #+end_example
Example: we recover the input table in reverse order. We use =@#= as a key column sorted in decreasing order with =;^N=. Then we make it invisible with =;<>= so that it does not appear in the output.
#+begin_example invisible╶─────────────────────────────────────────╮ sorted numerically decreasing╶───────────────────╮ │ row numbers of input table╶──────────────────╮ │ │ │ │ │ ▼ ▼ ▼ #+begin: aggregate :table "original" :cols "Day Color Level Quantity @#;^N;<>" | Day | Color | Level | Quantity | |-----------+-------+-------+----------| | Friday | Blue | 11 | 9 | | Friday | Blue | 6 | 8 | | Friday | Blue | 7 | 5 | | Thursday | Red | 49 | 30 | | Thursday | Red | 41 | 29 | | Thursday | Red | 39 | 24 | | Wednesday | Blue | 15 | 15 | | Wednesday | Blue | 12 | 16 | | Wednesday | Red | 27 | 23 | | Tuesday | Blue | 33 | 18 | | Tuesday | Red | 45 | 15 | | Tuesday | Red | 51 | 12 | | Monday | Blue | 25 | 3 | | Monday | Red | 30 | 11 | #+end: #+end_example
** Aggregation formulas in :cols :PROPERTIES: :CUSTOM_ID: aggregation-formulas-in-cols :END:
Aggregation formulas are applied for each of the groupings, on the specified columns.
We saw examples with =sum=, =mean=, =count= aggregations. There are many other aggregations. They are based on functions provided by Calc (Calc is the powerful Emacs calculator):
=count()= or =vcount()=
- in Calc: =
u #' (calc-vector-count') [`vcount'])= - gives the number of elements in the group being aggregated; this function may or may not take a column parameter; with a parameter, empty cells are not counted (except with the =E= modifier)..
- in Calc: =
=sum(X)= or =vsum(X)=
- in Calc: =
u +' (calc-vector-sum') [`vsum']= - computes the sum of elements being aggregated
- in Calc: =
=cnorm(X)=
- in Calc: =
v N' (calc-cnorm') [cnorm']= - like =vsum(X)=, compute the sum of values, but first replacing negative values by their opposite
- in Calc: =
=max(X)= or =vmax(X)=
- in Calc: =
u X' (calc-vector-max') [`vmax']= - gives the largest of the elements being aggregated
- in Calc: =
=min(X)= or =vmin(X)=
- in Calc: =
u N' (calc-vector-min') [`vmin']= - gives the smallest of the elements being aggregated
- in Calc: =
=span(X)= or =vspan(X)=
- in Calc: =
v :' (calc-set-span') [`vspan']= - summarizes values to be aggregated into an interval =[MIN..MAX]= where =MIN= and =MAX= are the minimal and maximal values to be aggregated
- in Calc: =
=rnorm(X)=
- in Calc: =
v n' (calc-rnorm) [`rnorm']= - like =vmax(X)=, gives the maximum of values, but first replacing negative values by their opposite
- in Calc: =
=mean(X)= or =vmean(X)=
- in Calc: =
u M' (calc-vector-mean') [`vmean']= - computes the average (arithmetic mean) of elements being aggregated
- in Calc: =
=meane(X)= or =vmeane(X)=
- in Calc: =
I u M' (calc-vector-mean-error') [`vmeane']= - computes the average (as mean) along with the estimated error of elements being aggregated
- in Calc: =
=median(X)= or =vmedian(X)=
- in Calc: =
H u M' (calc-vector-median') [`vmedian']= - computes the median of elements being aggregated, by taking the middle element after sorting them
- in Calc: =
=hmean(X)= or =vhmean(X)=
- in Calc: =
H I u M' (calc-vector-harmonic-mean') [`vhmean']= - computes the harmonic mean of elements being aggregated
- in Calc: =
=gmean(X)= or =vgmean(X)=
- in Calc: =
u G' (calc-vector-geometric-mean') [`vgmean']= - computes the geometric mean of elements being aggregated
- in Calc: =
=sdev(X)= or =vsdev(X)=
- in Calc: =
u S' (calc-vector-sdev') [`vsdev']= - computes the standard deviation of elements being aggregated
- in Calc: =
=psdev(X)= or =vpsdev(X)=
- in Calc: =
I u S' (calc-vector-pop-sdev') [`vpsdev']= - computes the population standard deviation (divide by N instead of N-1)
- in Calc: =
=var(X)= or =vvar(X)=
- in Calc: =
H u S' (calc-vector-variance') [`vvar']= - computes the variance of elements being aggregated
- in Calc: =
=pvar(X)= or =vpvar(X)=
- in Calc: =
H u S' (calc-vector-variance') [`vpvar']= - computes the population variance of elements being aggregated
- in Calc: =
=pcov(X,Y)= or =vpcov(X,Y)=
- in Calc: =
I u C' (calc-vector-pop-covariance') [`vpcov']= - computes the population covariance of elements being aggregated from two columns (divides by N)
- in Calc: =
=cov(X,Y)= or =vcov(X,Y)=
- in Calc: =
u C' (calc-vector-covariance') [`vcov']= - computes the sample covariance of elements being aggregated from two columns (divides by N-1)
- in Calc: =
=corr(X,Y)= or =vcorr(X,Y)=
- in Calc: =
H u C' (calc-vector-correlation') [`vcorr']= - computes the linear correlation coefficient of elements being aggregated in two columns
- in Calc: =
=prod(X)= or =vprod(X)=
- in Calc: =
u *' (calc-vector-product') [`vprod']= - computes the product of elements being aggregated
- in Calc: =
=vlist(X)= or =list(X)=
- gives the list of =X= being aggregated, verbatim, without aggregation.
=(X)= or =X= in a formula
- returns the list of =X= being aggregated, without aggregation, passed through Calc interpretation.
=sort(X)=
- in Calc: =
v S' (calc-sort') [`sort']= - sorts elements to be aggregated in ascending order; only works on numerical values
- in Calc: =
=rsort(X)=
- in Calc: =
I v S' (calc-sort') [`sort']= - sorts elements to be aggregated in descending order; only works on numerical values
- in Calc: =
=rev(X)=
- in Calc: =
' (calc-reverse-vector') [`rev']= - returns the list of values to be aggregated in reverse order
- in Calc: =
=subvec(X,from)=, =subvec(X,from,to)=
- in Calc: =
v s' (calcFunc-subvec') [`subvec']= - extracts a sub-list from =X= starting at =from= and ending at =to= excluded (or up to the end if =to= is not given). The first value is numbered =1=. So for instance =subvec(X,1,3)= extracts the first two values
- in Calc: =
=vmask(M,X)=
- in Calc: =
v m' (calcFunc-vmask') [`vmask']= - extracts a sub-list from =X=, keeping only values for which corresponding values in =M= (the mask) are not zero
- in Calc: =
=head(X)=
- in Calc: =
v h' (calc-head') [`head']= - returns the first value to be aggregated
- in Calc: =
=rtail(X)=
- in Calc: =
H I v h' (calc-head') [`rtail']= - returns the last value to be aggregated
- in Calc: =
=find(X,val)=
- in Calc: =
v f' (calc-vector-find') [`find']= - returns the index of =val= in the list of values to be aggregated, or =0= if =val= is not found. Index starts from =1=
- in Calc: =
=rdup(X)=
- in Calc: =
v +' (calc-remove-duplicates') [`rdup']= - remove duplicates from =X= and returns remaining values sorted in ascending order
- in Calc: =
=grade(X)=
- in Calc: =
v G' (calc-grade') [`grade']= - returns a list of index of values to be aggregated: the index of the lowest value, then the second lowest value, and so on up to the index of the highest value. Indexes start from =1=
- in Calc: =
=rgrade(X)=
- in Calc: =
I v G' (calc-grade') [`rgrade']= - Like =grade= in reverse order
- in Calc: =
The aggregation functions may be written with or without a leading =v=. =sum= and =vsum= are equivalent. The =v= form should be preferred, as it is the one used in the Org table spreadsheet, and in Calc. The non-v names may be dropped in the future.
** Correlation of two columns :PROPERTIES: :CUSTOM_ID: correlation-of-two-columns :END:
Some aggregations work on two columns (rather than one column for =vsum()=, =vmean()=). Those aggregations are =vcov(,)=, =vpcov(,)=, =vcorr(,)=.
- =vcorr(,)= computes the linear correlation between two columns.
- =vcov(,)= and =vpcov(,)= compute the covariance of two columns.
Example. We create a table where column =y= is a noisy version of column =x=:
#+begin_example #+tblname: noisydata | bin | x | y | |-------+----+---------| | small | 1 | 10.454 | | small | 2 | 21.856 | | small | 3 | 30.678 | | small | 4 | 41.392 | | small | 5 | 51.554 | | large | 6 | 61.824 | | large | 7 | 71.538 | | large | 8 | 80.476 | | large | 9 | 90.066 | | large | 10 | 101.070 | | large | 11 | 111.748 | | large | 12 | 121.084 | #+tblfm: $3=$2*10+random(1000)/500;%.3f #+end_example
#+begin_example #+BEGIN: aggregate :table noisydata :cols "bin vcorr(x,y) vcov(x,y) vpcov(x,y)" | bin | vcorr(x,y) | vcov(x,y) | vpcov(x,y) | |-------+----------------+---------------+---------------| | small | 0.999459736649 | 25.434 | 20.3472 | | large | 0.999542438688 | 46.4656666667 | 39.8277142857 | #+END #+end_example
We see that the correlation between =x= and =y= is very close to =1=, meaning that both columns are correlated. Indeed they are, as the =y= is computed from =x= with the formula #+begin_example y = 10*x + noise_between_0_and_2 #+end_example
** (Almost) any expression can be specified :PROPERTIES: :CUSTOM_ID: almost-any-expression-can-be-specified :END:
Virtually any Calc formula can be specified as an aggregation formula.
Single column name (as they appear in the header of the source table, or in the form of =$1=, =$2=, ..., or the virtual columns =hline= and =@#=) are key columns. Everything else is given to Calc, to be computed as an aggregation.
For instance: #+begin_example (3) ;; a constant vmean(2*X+1) ;; aggregate an expression exp(vmean(map(log,N))) ;; the exponential average vsum((X-vmean(X))^2) ;; X-vmean(X) centers the sample on zero #+end_example
Arguably, the first expression is useless, but legal. The aggregation can be applied to a computed list of values. The result of an aggregation can be further processed in a formula. An aggregation can even be applied to an expression containing another aggregation.
In an expression, if a variable has the name of a column, then it is replaced by a Calc vector containing values from this column.
The special expression =(C)= (a column name within parenthesis) yields a list of values to be aggregated from this column, except they are not aggregated. Note that parenthesis are required, otherwise, =C= would act as a key grouping column.
- Column names :PROPERTIES: :CUSTOM_ID: column-names :END:
** Input table with or without a header :PROPERTIES: :CUSTOM_ID: input-table-with-or-without-a-header :END: The header of a table gives names to its columns. It is separated from data with an horizontal line.
#+begin_example column name is "quantity" or "$2"╶╮ column name is "day" or "$1"╶╮ │ ╭─────────────────────────╯ │ │ ╭───────────────╯ ▼ ▼ | day | quantity | |-----------+----------| | monday | 12.3 | | monday | 5.9 | | thursday | 41.1 | | wednesday | 16.8 | #+end_example
In this example, the input columns may be referred to as =day= and =quantity=.
Tables without a header are handled by OrgAggregate with /"dollar names"/. Example of a table without a header:
#+begin_example column name is "$2"╶──╮ column name is "$1"╶╮ │ ╭───────────────╯ │ │ ╭─╯ ▼ ▼ | monday | 12.3 | | monday | 5.9 | | thursday | 41.1 | | wednesday | 16.8 | #+end_example
Then columns may be refereed to as =$1= and =$2=.
** Column names of the input table :PROPERTIES: :CUSTOM_ID: column-names-of-the-input-table :END:
Column names are not necessarily alphanumeric words. They may contain any characters, including spaces, quotes, +, -, whatever. They must not extend on several lines thought.
Those names need to be protected with quotes (single or double quotes) within formulas.
Examples:
- =:cols= "=mean('estimated value')="
- =:cond (equal "true color" "Red")=
Quoting is not required for
- ASCII letters
- numbers
- underscore _, dollar $, dot .
- accented letters like à é
- Greek letters like α, Ω
- northern letters like ø
- Russian letters like й
- Esperanto letters like ŭ
- Japanese ideograms like 量
Note that in =:cond= Lisp expression, only double quotes work. This is because single quotes in Lisp have a very special meaning.
=Ubuntu Mono= font can be used for displaying aligned Japanese characters, although not perfectly.
** Multiple lines header :PROPERTIES: :CUSTOM_ID: multiple-lines-header :END: The header of the source table may be more than one row tall. Only the first header row is used to match column names between the source table and the =:cols= specifications.
Best effort is made to propagate additional header rows to the aggregated table. This happens when the aggregated column refers to a single source column, either as a key column or a formula involving a single column.
#+begin_example #+name: tall-header
╭──────────────────────────────────────────────╮
╭───┴──╮ │
| color | quantity | level | ╶╮ │ | | | <3> | ├─╴header is 3 rows tall │ | kolor | kiom | nivelo | ╶╯ │ |--------+----------+--------| ╭───────────╯───────────────╮ | yellow | 72 | 3 | │ only the first header row │ | green | 55 | 5 | │ is used in formulas │ | | | | ╰───────────╭───────────────╯ | orange | 80 | 2 | │ | yellow | 13 | 1 | │ ╭───┴──╮ #+BEGIN: aggregate :table "tall-header" :cols "color vsum(quantity);'sum' count();'nb' vsum(quantity)/vmean(level);'leveled'" | color | sum | nb | leveled | ╶╮ | | | | | ├──╮ | kolor | kiom | | | ╶╯ │ ╭────────────────────────╮ |--------+------+----+---------| │ │ best attempt to recover│ | yellow | 85 | 2 | 42.5 | ╰──┤ the three header rows │ | green | 55 | 1 | 11 | │ in the output │ | orange | 80 | 1 | 40 | ╰────────────────────────╯ #+END: #+end_example
Note that the last aggregated column has just =leveled= in its header. This is because this column refers to more than one source columns, namely =quantity= and =level=.
Note that in this example, there are formatting cookies: : <> <7>
Data rows containing at least one cookie are ignored. They are not ignored in the header.
** Custom column names :PROPERTIES: :CUSTOM_ID: custom-column-names :END: In this example, output column have names which are difficult to handle:
#+begin_example #+BEGIN: aggregate :table original :cols "Day vmean(Level2) vsum(Quantity^2)" ╰─────┬──────╯ ╰──────┬───────╯ ╭───────────────────────────────╯ │ │ ╭─────────────────────────────╯ ╭─────┴──────╮ ╭──────┴───────╮ | Day | vmean(Level2) | vsum(Quantity^2) | |-----------+----------------+------------------| | Monday | 55 | 130 | | Tuesday | 86 | 693 | | Wednesday | 36 | 1010 | | Thursday | 86 | 2317 | | Friday | 16 | 170 | #+END #+end_example
We can give them custom names with the =;'custom name'= decoration:
#+begin_example #+BEGIN: aggregate :table original :cols "Day vmean(Level*2);'mean2' vsum(Quantity^2);'sum_squares'" ╰──┬──╯ ╰─────┬─────╯ ╭───────────────────────────────────────────────╯ │ │ ╭─────────────────────────────────────────────────────────────────╯ ╭─┴─╮ ╭───┴─────╮ | Day | mean2 | sum_squares | |-----------+-------+-------------| | Monday | 55 | 130 | | Tuesday | 86 | 693 | | Wednesday | 36 | 1010 | | Thursday | 86 | 2317 | | Friday | 16 | 170 | #+END #+end_example
Decorators are optional.
- Formatters :PROPERTIES: :CUSTOM_ID: formatters :END:
** Org Mode compatible formatters :PROPERTIES: :CUSTOM_ID: org-mode-compatible-formatters :END: An expression may optionally be followed by modifiers and formatters, after a semicolon. Examples:
#+begin_example vsum(X);p20 ;; increase Calc internal precision to 20 digits vsum(X);f3 ;; output the result with 3 digits after the decimal dot vsum(X);%.3f ;; output the result with 3 digits after the decimal dot #+end_example
The modifiers and formatters are fully compatible with those of the Org Mode spreadsheet.
- =p12= change the precision to 12 decimal digits.
- =n7= output as floating point number with 7 decimal digits.
- =f4= output number with 4 decimal places after dot.
- =s5= output number in "scientific" mode (with exponent of 10) with 5 decimal digits.
- =e6= output number in "engineering" mode (with exponent of 10 multiple of 3) with 6 decimal digits.
- =t= output duration in decimal hours; input is supposed to be either a duration like ="2:37"= meaning 2 hours and 37 minutes, or a number of seconds like ="1234=" which is approximately =0.34= hours. The output is controlled by the =org-table-duration-custom-format= variable.
- =T= output duration in an hours-minutes-seconds format like ="01:20:34"= meaning 1 hour, 20 minutes, and 34 seconds.
- =U= like =t=, but disregard the =org-table-duration-custom-format= variable and use =hh:mm= in place.
- =N= output number: remove any non-numeric output.
- =E= keep empty input cells. The result is often =nan=. Without =E=, empty input cells are ignored as if they did not exist.
- =D= angles are in degrees.
- =R= angles are in radians.
- =F= output is presented as a fraction of integers if it actually is. The format is the Calc one, for example ="2:3"= means =2/3=.
- =S= symbolic mode. When an input cell is, for instance =sqrt(2)=, it it kept as-is rather than being replaced by =1.41421=.
** Debugging formatters :PROPERTIES: :CUSTOM_ID: debugging-formatters :END: Additionally, a few formatters are dedicated to debugging:
- =c= output the Calc expression before substitution by actual input cells values.
- =q= output the Lisp expression before substitution by actual input cells values.
- =C= output the Calc expression before it gets simplified and folded.
- =Q= output the Lisp expression before it gets simplified and folded.
See [[#debugging][Debugging]] for a detailed explanation.
** Discarding an output column :PROPERTIES: :CUSTOM_ID: discarding-an-output-column :END: Why would anyone specify a column just to discard it in the output? For its side effects. For sorting the output table or for adding hlines to it.
To discard a column, add a =;<>= modifier to the column description. This syntax is reminiscent of the == cookies in Org Mode tables, which instructs to shorten a column width to only =n= characters.
In this example, input hlines create a =hline= column which is used to add hlines to the output. Then this =hline= column is discarded with =<>=.
#+begin_example invisible╶────────────────────────────────────────────╮ sorted numerically increasing╶──────────────────────╮ │ │ │ ▼ ▼ #+BEGIN: aggregate :table "withhline" :cols "hline;^n;<> cölØr vsum(vâluε)" :hline 1 | cölØr | vsum(vâluε) | |--------+-------------| | Red | 7.4 | | Yellow | 9.1 | |--------+-------------| | Blue | 15.7 | | Yellow | 5.4 | |--------+-------------| | Blue | 4.9 | | Red | 3.9 | | Yellow | 9. | |--------+-------------| | Red | 1.1 | | Yellow | 3.4 | #+END: #+end_example
Here is an example where rows are sorted on the =cölØr= column, but without displaying this column:
#+begin_example invisible╶────────────────────────────────────────────╮ sorted alphabetically╶──────────────────────────────╮ │ │ │ ▼ ▼ #+BEGIN: aggregate :table "withhline" :cols "cölØr;^a;<> vâluε;^n" :hline 1 | vâluε | ▲ |-------| │ | 4.9 | within the same cölØr bucket, │ | 7.0 | sort vâluε numerically╶─────────────────────╯
| 8.7 |
|---|
| 1.1 |
| 1.3 |
| 2.6 |
| 3.5 |
| 3.9 |
| ------- |
| 2.4 |
| 3.4 |
| 5.4 |
| 6.6 |
| 9.1 |
| #+END: |
| #+end_example |
- Sorting :PROPERTIES: :CUSTOM_ID: sorting :END:
** Example with one sorting column :PROPERTIES: :CUSTOM_ID: example-with-one-sorting-column :END: In this example, the output table is sorted numerically on its second column (look at the =^n= specification):
#+begin_example #+begin: aggregate :table "original" :cols "Day vsum(Quantity);^n" | Day | vsum(Quantity) | |-----------+----------------| | Monday | 14 | | Friday | 22 | | Tuesday | 45 | | Wednesday | 54 | | Thursday | 83 | #+end: #+end_example
By default, no sorting is done. The output rows follows the ordering of the input rows.
Any column specification in the =:cols= parameter may be followed by a semicolon and caret characters, and an ordering.
The specification for the ordering are the same as in Org Mode:
- =a=: ascending alphabetical sort
- =A=: descending alphabetical sort
- =n=: ascending numerical sort
- =N=: descending numerical sort
- =t=: ascending date, time, or duration sort
- =T=: descending date, time, or duration sort
- =f= & =F= specifications are not (yet) implemented
** Several sorting columns :PROPERTIES: :CUSTOM_ID: several-sorting-columns :END:
The rows of the resulting table may be sorted on any combination of its columns.
Several columns may get a sorting specification. The major column is used for sorting. Only when two rows are equal regarding the major column, the second major column is compared. And if the two rows are still equal on this second column, the third is used, and so on.
The first sorted column in the =:cols= parameter is the major one. To declare another one as the major, follow it with a number, for instance =1=. Columns without a number are minor ones.
Example: #+begin_example :cols "AAA;^a BBB;^N2 CCC DDD;^t1" ╭────╮ ▲╷ ▲▲ ▲▲ ╭──────────────╮ │sort╰──────╯╰─╮ ││ │╰─╯first priority│ │alphabetically│ ││ ╰──╮sort by date │ │third priority│ ││ ╰──────────────╯ ╰──────────────╯ ││ ╭────────────────╮ │╰──╯second priority │ ╰───╮sort numerically│ │decreasing │ ╰────────────────╯ #+end_example
- Column =DDD= is sorted in ascending dates or times (=t= specification). It is the major sorting column (because of its =1= numbering).
- Column =BBB= sorts rows which compare equal on column =DDD= (because of its =2= numbering). This column is assumed to contain numerical values, and it is sorted in descending order (=N= specification).
- Column =AAA= is used to sort rows which compare equal regarding =DDD= and =BBB=. It is sorted in ascending alphabetical order (=a= specification).
Both a format and a sorting instruction may be given. Example: #+begin_example :cols "EXPR:f3:^n" #+end_example
The =EXPR= column is
- formatted with 3 digits after dot (=f3=)
- sorted numerically in ascending order (=^n=).
- hlines in the output table :PROPERTIES: :CUSTOM_ID: hlines-in-the-output-table :END:
The =:hline N= parameter controls horizontal lines in the output table. It may or may not be related to horizontal lines in the input.
** Output hlines depends on sorting columns :PROPERTIES: :CUSTOM_ID: output-hlines-depends-on-sorting-columns :END: Example of an input table:
#+begin_example #+name: withouthline | cölØr | vâluε | ra;han | |--------+-------+--------| | Red | 1.3 | 41 | | Red | 3.5 | 35 | | Yellow | 9.1 | 95 | | Red | 2.6 | 84 | | Blue | 8.7 | 52 | | Blue | 7.0 | 29 | | Yellow | 5.4 | 17 | | Blue | 4.9 | 64 | | Red | 3.9 | 51 | | Yellow | 2.4 | 55 | | Yellow | 6.6 | 34 | | Red | 1.1 | 58 | | Yellow | 3.4 | 51 | #+end_example
Horizontal lines appear on the sorted column, which in this example is the =cölØr= column.
We require output hlines with =:hline 1=. The =1= value here says that only one sorted column should be considered when drawing output horizontal lines. A value of =2= would mean to consider two sorted columns.
Horizontal lines will separate blocks of identical =cölØr= rows:
#+begin_example #+BEGIN: aggregate :table "withouthline" :cols "cölØr;^a vâluε 'ra;han'" :hline 1 | cölØr | vâluε | 'ra;han' | ╰─┬─╯ ▲ ╰─┬─╯ |--------+-------+----------| │ │ │ | Blue | 8.7 | 52 |╶╮ │ │╭──────╮ ╭─────────┴─────╮ | Blue | 7.0 | 29 | ├─╴Blue bucket │ ╰╯sort │ │ separate │ | Blue | 4.9 | 64 |╶╯ ╰─────╮cölØr │ │ cölØr buckets │ |--------+-------+----------| ◀────────────────╮ ╰──────╯ │ with hlines │ | Red | 1.3 | 41 |╶╮ │ ╰──┬┬───────────╯ | Red | 3.5 | 35 | │ ╰─────────────────────╯│ | Red | 2.6 | 84 | ├─╴Red bucket │ | Red | 3.9 | 51 | │ │ | Red | 1.1 | 58 |╶╯ │ |--------+-------+----------| ◀───────────────────────────────────────╯ | Yellow | 9.1 | 95 |╶╮ | Yellow | 5.4 | 17 | │ | Yellow | 2.4 | 55 | ├─╴Yellow bucket | Yellow | 6.6 | 34 | │ | Yellow | 3.4 | 51 |╶╯ #+END: #+end_example
** Example with hline 2 :PROPERTIES: :CUSTOM_ID: example-with-hline-2 :END:
In the following example, we specify =:hline 2=.
First, the input table now have horizontal lines. We want to propagate them to the output.
#+begin_example #+name: withhline | cölØr | vâluε | ra;han | |--------+-------+--------| | Red | 1.3 | 41 | | Red | 3.5 | 35 | | Yellow | 9.1 | 95 | | Red | 2.6 | 84 | |--------+-------+--------| | Blue | 8.7 | 52 | | Blue | 7.0 | 29 | | Yellow | 5.4 | 17 | |--------+-------+--------| | Blue | 4.9 | 64 | | Red | 3.9 | 51 | | Yellow | 2.4 | 55 | | Yellow | 6.6 | 34 | |--------+-------+--------| | Red | 1.1 | 58 | | Yellow | 3.4 | 51 | #+end_example
The two sorted columns are =hline= and =cölØr=. Therefore output horizontal lines separate blocks of identical =hline= and =cölØr=:
#+begin_example #+begin: aggregate :table "withhline" :cols "hline;^n cölØr;^a vâluε 'ra;han'" :hline 2 | hline | cölØr | vâluε | 'ra;han' | ▲ ▲ ▲ |-------+--------+-------+----------| │ │ │ | 0 | Red | 1.3 | 41 | ╭──────╯ ╰───────────╮ │ | 0 | Red | 3.5 | 35 | │ two sorted output columns │ │ | 0 | Red | 2.6 | 84 | ╰───────────────────────────╯ │ |-------+--------+-------+----------| │ | 0 | Yellow | 9.1 | 95 | │ |-------+--------+-------+----------| ╭─────────────────────────────╮ │ | 1 | Blue | 8.7 | 52 | │ 2 means: create hlines ├─────╯ | 1 | Blue | 7.0 | 29 | │ for buckets and sub-buckets │ |-------+--------+-------+----------| ╰─────────────────────────────╯ | 1 | Yellow | 5.4 | 17 | |-------+--------+-------+----------|╶─╮ | 2 | Blue | 4.9 | 64 | │ ╭──────────────────────────╮ |-------+--------+-------+----------| ╶┤ │ bucket hline=2 │ | 2 | Red | 3.9 | 51 | ├───┤ divided in 3 sub-buckets │ |-------+--------+-------+----------| ╶┤ │ Blue, Red, Yellow │ | 2 | Yellow | 2.4 | 55 | │ ╰──────────────────────────╯ | 2 | Yellow | 6.6 | 34 | │ |-------+--------+-------+----------|╶─╯ | 3 | Red | 1.1 | 58 | |-------+--------+-------+----------| | 3 | Yellow | 3.4 | 51 | #+end: #+end_example
And the =hline= column may be discarded (but its side effect remains). To do so use the =;<>= specifier:
#+begin_example #+begin: aggregate :table "withhline" :cols "hline;^n;<> cölØr;^a vâluε 'ra;han'" :hline 2 | cölØr | vâluε | 'ra;han' | |--------+-------+----------| | Red | 1.3 | 41 | | Red | 3.5 | 35 | | Red | 2.6 | 84 | |--------+-------+----------| | Yellow | 9.1 | 95 | |--------+-------+----------| | Blue | 8.7 | 52 | | Blue | 7.0 | 29 | |--------+-------+----------| | Yellow | 5.4 | 17 | |--------+-------+----------| | Blue | 4.9 | 64 | |--------+-------+----------| | Red | 3.9 | 51 | |--------+-------+----------| | Yellow | 2.4 | 55 | | Yellow | 6.6 | 34 | |--------+-------+----------| | Red | 1.1 | 58 | |--------+-------+----------| | Yellow | 3.4 | 51 | #+end: #+end_example
The =:hline= parameter accepts a number:
- =:hline 0=, =:hline no=, =:hline nil=, or no =:hline= mean that there will be no hlines in the output.
- =:hline 1=, =:hline yes=, =:hline t= mean that hlines will separate blocks of identical rows regarding the major sorted column. In case no column is sorted, then output hlines will reflect input ones.
- =:hline 2= means that the major and the next major sorted columns will be used to separate identical rows regarding those two columns.
- =:hline 3=, =:hline 4=, ... may be specified, but they may result in too much hlines.
- Cells processing :PROPERTIES: :CUSTOM_ID: cells-processing :END:
Calc is the standard Emacs desktop calculator. Actual mathematical computations are handled through Calc. This offers a lot of flexibility.
** Where Calc interpretation happens? :PROPERTIES: :CUSTOM_ID: where-calc-interpretation-happens :END:
Example of input table. Besides numbers, there are cells with mathematical expressions like =20*30=, or just labels as =Red&Green= without any mathematical meaning.
#+begin_example #+name: to_Calc_or_not_to_Calc | Day | Color | Level | |-----------+------------+--------| | Monday | Red | 20*30 | | Monday | Blue | 55+45 | | Tuesday | Red | 1 | | Tuesday | Red&Green | 2 | | Tuesday | Blue+Green | 3 | | Wednesday | Red | (27) | | Wednesday | Red | (12+1) | | Wednesday | Green | [15] | #+end_example
Basically, Calc operates twice. For example in the formula =vsum(Level)=:
- Calc computes =Level= for every input cell in the =Level= column,
- then Calc computes =vsum()= applied to the resulting list.
#+begin_example #+BEGIN: aggregate :table "to_Calc_or_not_to_Calc" :cols "Day vsum(Level)" | Day | vsum(Level) | |-----------+-------------| | Monday | 700 | | Tuesday | 6 | | Wednesday | 55 | #+END: #+end_example
There are a few occasions were Calc computation does not happen: =vcount()= and =vlist(X)=.
The =vcount()= sub-formula is evaluated as the number of input rows in each group, without Calc intervention. However, later on Calc can handle this number in a formula as this one: =vsum(Level)/vcount()=
#+begin_example #+BEGIN: aggregate :table "to_Calc_or_not_to_Calc" :cols "Day vcount() vsum(Level)/vcount()" | Day | vcount() | vsum(Level)/vcount() | |-----------+----------+----------------------| | Monday | 2 | 350 | | Tuesday | 3 | 2 | | Wednesday | 3 | 18.333333 | #+END: #+end_example
And of course when input cells do not have a mathematical meaning, the result may be non-sens:
#+begin_example #+BEGIN: aggregate :table "to_Calc_or_not_to_Calc" :cols "Day vsum(Color)" | Day | vsum(Color) | |-----------+------------------------------------------------| | Monday | Red + Blue | | Tuesday | Red + error(3, '"Syntax error") + Blue + Green | | Wednesday | 2 Red + Green | #+END: #+end_example
But it can also make sens. The last row, which aggregate =Wednesday= rows, is computed as =2 Red + Green=. This is correct. This symbolic result (as opposed to numerical result) shows the power of Calc as a symbolic calculator.
The =vlist(X)= formula is not handled by Calc at all. This formula must appear alone (not embedded as part of a bigger formula). The cells =X= are not interpreted by Calc. As a result, =vlist(X)= produces a cell which concatenates input cells verbatim. For instance, the input cell =20*30= is left as-is.
#+begin_example #+BEGIN: aggregate :table "to_Calc_or_not_to_Calc" :cols "Day vlist(Color) vlist(Level)" | Day | vlist(Color) | vlist(Level) | |-----------+----------------------------+--------------------| | Monday | Red, Blue | 20*30, 55+45 | | Tuesday | Red, Red&Green, Blue+Green | 1, 2, 3 | | Wednesday | Red, Red, Green | (27), (12+1), [15] | #+END: #+end_example
As a contrast, the formula =(Level)= yields a list processed through Calc. For instance, the =20*30= formula is replaced by =600=.
#+begin_example #+BEGIN: aggregate :table "to_Calc_or_not_to_Calc" :cols "Day (Color) (Level)" | Day | (Color) | (Level) | |-----------+------------------------------------------------+----------------| | Monday | [Red, Blue] | [600, 100] | | Tuesday | [Red, error(3, '"Syntax error"), Blue + Green] | [1, 2, 3] | | Wednesday | [Red, Red, Green] | [27, 13, [15]] | #+END: #+end_example
Here we used parenthesis in =(Color)= and =(Level)= because otherwise they would have been /key columns/. Instead of parenthesis, we can embed such expressions in formulas, like =Level+1=:
#+begin_example #+BEGIN: aggregate :table "to_Calc_or_not_to_Calc" :cols "Day Level+1" | Day | Level+1 | |-----------+----------------| | Monday | [601, 101] | | Tuesday | [2, 3, 4] | | Wednesday | [28, 14, [16]] | #+END: #+end_example
To summarize, a column name embedded in a formula is evaluated as the list of input cells, processed by Calc. Except for the =vlist(Column)= formula where input cells are kept verbatim.
By the way, what is the meaning of the expression =LevelLevel=? For =Monday=, it is =[600,100][600,100]=. Then Calc simplifies that as a /vector product/: sum of individual products. =600^2+100^2=
#+begin_example #+BEGIN: aggregate :table "to_Calc_or_not_to_Calc" :cols "Day LevelLevel Level+Level" | Day | LevelLevel | Level+Level | |-----------+-------------+----------------| | Monday | 370000 | [1200, 200] | | Tuesday | 14 | [2, 4, 6] | | Wednesday | 1123 | [54, 26, [30]] | #+END: #+end_example
** Dates :PROPERTIES: :CUSTOM_ID: dates :END:
Some aggregations are possible on dates. Example. Here is a source table containing dates:
#+begin_example #+tblname: datetable
| Date |
|---|
| [2035-12-22 Sat 09:01] |
| [2034-11-24 Fri 13:04] |
| [2030-09-24 Tue 13:54] |
| [2027-09-25 Sat 03:54] |
| [2023-02-26 Sun 16:11] |
| [2020-03-17 Tue 03:51] |
| [2018-08-21 Tue 00:00] |
| [2012-12-25 Tue 00:00] |
| #+end_example |
Here are the earliest and the latest dates, along with the average of all input dates:
#+begin_example #+BEGIN: aggregate :table datetable :cols "vmin(Date) vmax(Date) vmean(Date)" | vmin(Date) | vmax(Date) | vmean(Date) | |------------------------+------------------------+-------------| | <2012-12-25 Tue 00:00> | <2035-12-22 Sat 09:01> | 739448.44 | #+END: #+end_example
The average of all dates is a number? Actually, it is a date expressed as the number of days since =[0000-12-31 Sun 00:00]=. To force a number of days to be interpreted as a date, use the =date()= function:
#+begin_example #+BEGIN: aggregate :table datetable :cols "date(vmean(Date))"
| date(vmean(Date)) |
|---|
| <2025-07-16 Wed 10:29> |
| #+END: |
| #+end_example |
With the =date()= function in mind, all kinds of dates handling can be done. Example: the average of earliest and the latest dates is different from the average of all dates:
#+begin_example #+BEGIN: aggregate :table datetable :cols "date(vmean(vmin(Date),vmax(Date))) date(vmean(Date))" | date(vmean(vmin(Date),vmax(Date))) | date(vmean(Date)) | |------------------------------------+------------------------| | <2024-06-23 Sun 16:30> | <2025-07-16 Wed 10:29> | #+END: #+end_example
Note that =date()= is not special to OrgAggregate. It can be used in Org Mode spreadsheet formulas.
** Durations :PROPERTIES: :CUSTOM_ID: durations :END: In Org Mode spreadsheet, durations have the forms =HH:MM= or =HH:MM:SS=. In OrgAggregate, when an input cell have one of those two forms, it is converted into a number of seconds. For instance, =01:00= is converted into =3600= and =00:00:07= is converted into =7=.
There may be a single digit for hours, as in =7:12= or more than two as in =1255:45:00=.
To output such a form, use a formatter: =;T=; =;t=, =;U=. For example, we have 3 durations as input, and we want the average of them:
#+begin_example #+name: some_durations
| dur |
|---|
| 07:45:30 |
| 13:55 |
| 17:12 |
| #+end_example |
#+begin_example #+BEGIN: aggregate :table "some_durations" :cols "vmean(dur) vmean(dur);T vmean(dur);t vmean(dur);U" | vmean(dur) | vmean(dur) | vmean(dur) | vmean(dur) | |------------+------------+------------+------------| | 46650 | 12:57:30 | 12.96 | 12:57 | #+END: #+end_example
- With no formatter, we get a number of seconds
- The =T= formatter outputs the result as =HH:MM:SS=
- The =U= formatter outputs the result as =HH:MM=
- The =t= formatter converts the result into a number of hours (it divides the number of seconds by 3600, and displays only two digits after dot)
The Calc syntax for durations is also recognized: #+begin_example HH@ MM' HH@ MM' SS" #+end_example Example:
#+begin_example #+name: calc_durations
| dur |
|---|
| 07@ 45' 30 |
| 13@ 55' |
| 17@ 12' |
| #+end_example |
#+begin_example #+BEGIN: aggregate :table "calc_durations" :cols "vmean(dur)"
| vmean(dur) |
|---|
| 12@ 57' 30." |
| #+END: |
| #+end_example |
** Empty and malformed input cells :PROPERTIES: :CUSTOM_ID: empty-and-malformed-input-cells :END:
The input table may contain malformed mathematical text. For instance, a cell containing =5+= is malformed, because an expression is missing after the =+= symbol. In this case, the value will be replaced by =error(2, '"Expected a number")= which will appear in the aggregated table, signaling the problem.
An input cell may be empty. In this case, it may be ignored or converted to zero, depending on modifier flags =E= and =N=.
The empty cells treatment
- makes no difference for =vsum= and =count=.
- may result in zero for =prod=,
- change =vmean= result,
- change =vmin= and =vmax=, a possibly empty list of values resulting in =inf= or =-inf=
Some aggregation functions operate on two columns. If the two columns have empty values at different locations, then they should be interpreted as zero with the =NE= modifier, otherwise the result will be inconsistent.
Sometimes an input table may be malformed, with incomplete rows, like this one:
#+begin_example | Color | Level | Quantity | Day | |-------+-------+----------+-----------| | Red | 30 | 11 | Monday | | Blue | 25 | 3 | Monday | | | Blue | 33 | 18 | Tuesday | | Red | 27 | | Blue | 12 | 16 | Wednesday | | Blue | 15 | 15 | | #+end_example
Missing cells are handled as though they were empty.
** Symbolic computation :PROPERTIES: :CUSTOM_ID: symbolic-computation :END:
The computations are based on Calc, which is a symbolic calculator. Thus, symbolic computations are built-in. Example:
This is the source table:
#+begin_example #+NAME: symtable | Day | Color | Level | Quantity | |-----------+-------+--------+----------| | Monday | Red | 30+x | 11+a | | Monday | Blue | 25+3x | 3 | | Tuesday | Red | 51+2x | 12 | | Tuesday | Red | 45-x | 15 | | Tuesday | Blue | 33 | 18 | | Wednesday | Red | 27 | 23 | | Wednesday | Blue | 12+x | 16 | | Wednesday | Blue | 15 | 15-6a | | Thursday | Red | 39 | 24-5a | | Thursday | Red | 41 | 29 | | Thursday | Red | 49+x | 30+9*a | | Friday | Blue | 7 | 5+a | | Friday | Blue | 6 | 8 | | Friday | Blue | 11 | 9 | #+end_example
And here is the aggregated, symbolic result:
#+begin_example #+BEGIN: aggregate :table "symtable" :cols "Day vmean(Level) vsum(Quantity)" | Day | vmean(Level) | vsum(Quantity) | |-----------+-----------------------+----------------| | Monday | 2. x + 27.5 | a + 14 | | Tuesday | 0.333333333334 x + 43 | 45 | | Wednesday | x / 3 + 18 | 54 - 6 a | | Thursday | x / 3 + 43. | 4 a + 83 | | Friday | 8 | a + 22 | #+END #+end_example
Symbolic calculations are correctly performed on =x= and =a=, which are symbolic (as opposed to numeric) expressions.
Note that if there are empty cells in the input, they will be changed to =nan= /not a number/, and the whole aggregation will yield =nan=. This is probably not the expected result.
The =N= modifier (see [[#org-mode-compatible-formatters][Org Mode compatible formatters]]) won't help, because even though it will replace empty cells with zero, it will do the same for anything which does not look like a number. The best is to just avoid empty cells when dealing with symbolic calculations.
** Intervals :PROPERTIES: :CUSTOM_ID: intervals :END: Org Mode tables and OrgAggregate backend engine being Emacs Calc, intervals are seamlessly handled. An interval is made of two numerical values separated by two dots.
Example of a spreadsheet. The third column is computed by multiplying the first two:
#+begin_example ,#+name: int | 3..5 | 10 | (30 .. 50) | | 3..5 | -1..2 | (-5 .. 10) | | 5 | 2 | 10 | ,#+TBLFM: $3=$1*$2 #+end_example
Example of an aggregation. The sum of the third column is computed, resulting in an interval:
#+begin_example ,#+BEGIN: aggregate :table "int" :cols "vsum($3)"
| vsum($3) |
|---|
| (35 .. 70) |
| ,#+END: |
| #+end_example |
** Error or precision forms :PROPERTIES: :CUSTOM_ID: error-or-precision-forms :END:
In Emacs Calc, an error form is a numerical value along with its precision, both values separated by =+/-=.
The separation may also be the Unicode character =±=. In this spreadsheet example, the second column is 10 times the first:
#+begin_example ,#+name: cert | 12±2 | 120 +/- 20 | | 55±3 | 550 +/- 30 | | 9±0 | 90 | | 15±1 | 150 +/- 10 | | 21±1 | 210 +/- 10 | ,#+TBLFM: $2=10*$1 #+end_example
Now, in the following example, OrgAggregate computes the sum of the second column:
#+begin_example ,#+BEGIN: aggregate :table "cert" :cols "vsum($2);f2"
| vsum($2) |
|---|
| 1120 +/- 38.73 |
| ,#+END: |
| #+end_example |
The computation made by Emacs Calc assumes that all input values follow a Gaussian distribution, and are independent variables. Then it applies the textbook formula for combining Gaussian distributions. Beware that your input values may not be independent with each other. In this case, the resulting error will be slightly off (too small).
- Wide variety of inputs :PROPERTIES: :CUSTOM_ID: wide-variety-of-inputs :END:
As in any other Org Mode source block, the input table may come from several places. OrgAggregate adds even more kinds of input.
Here we discus the =:table= parameter.
** Standard Org Mode input :PROPERTIES: :CUSTOM_ID: standard-org-mode-input :END:
The parameter after =:table= may be:
=mytable=: an ordinary Org Mode table in the same buffer, named =mytable=.
=/some/dir/file.org:mytable=: an ordinary Org Mode table named =mytable=, in a distant Org file named =/some/dir/file.org=.
** Virtual input table from Babel :PROPERTIES: :CUSTOM_ID: virtual-input-table-from-babel :END:
=mybabel=: an Org Mode Babel block named =mybabel= in the current buffer, generating a table as its output, written in any language.
=mybabel(param1=123,param2=456)=: passing parameters to an Org Mode Babel block named =mybabel= in the current buffer, generating a table as its output, written in any language.
=/some/dir/file.org:mybabel(param1=123,param2=456)=: an Org Mode Babel block named =mybabel= in a distant org file named =/some/dir/file.org=, called with parameters.
The input table may be the result of executing a Babel script. In this case, the table is virtual in the sense that is appears nowhere.
(Babel is the Org Mode infrastructure to run scripts in any language, like Python, R, C++, Java, D, Rust, shell, whatever, with inputs and outputs connected to Org Mode).
Example:
Here is a script in Emacs Lisp which creates an Org Mode table.
#+begin_example #+name: ascript #+begin_src elisp :colnames yes `( ("label" value) ; cells are symbols or strings hline ,@(cl-loop for i from 10 to 20 collect (list (format "lbl-%c" (+ ?A (% i 3))) ; cell is a string i))) ; cell is a number #+end_src #+end_example
If executed, the script would output this table:
#+begin_example #+RESULTS: ascript | label | value | |-------+-------| | lbl-B | 10 | | lbl-C | 11 | | lbl-A | 12 | | lbl-B | 13 | | lbl-C | 14 | | lbl-A | 15 | | lbl-B | 16 | | lbl-C | 17 | | lbl-A | 18 | | lbl-B | 19 | | lbl-C | 20 | #+end_example
But instead, OrgAggregate will execute the script and consume its output:
#+begin_example #+BEGIN: aggregate :table "ascript" :cols "label vsum(value)" | label | vsum(value) | |-------+-------------| | lbl-B | 58 | | lbl-C | 62 | | lbl-A | 45 | #+END: #+end_example
Here the parameter =:table= specifies the name of the script to be executed.
Beware of the =:results code= parameter. It does not work as an input for OrgAggregate. This is because in this case the Babel script returns a string, not a table. Example:
#+begin_src elisp :results code '((a b c) hline (1 2 3)) #+end_src
Use =:results table= or nothing instead.
** An Org ID :PROPERTIES: :CUSTOM_ID: an-org-id :END:
- =34cbc63a-c664-471e-a620-d654b26ffa31=: an identifier of an Org Mode sub-tree. The sub-tree is supposed to contain an Org table (which is retrieved) or a Babel script (which is executed).
Those Org Mode identifiers span all known Org Mode files. (Therefore there is no need to specify a file). To add such an identifier, put the cursor on the heading of the sub-tree, and type =M-x org-id-get-create=.
** CSV input :PROPERTIES: :CUSTOM_ID: csv-input :END:
CVS input is specific to OrgAggregate. (Native Org Mode does not offers those formats).
=/some/dir/file.csv:(csv params…)=: a comma-separated-values file in the CSV format, in the file =/some/dir/file.csv=.
=name(csv params…)=: CSV formatted data within an Org block named ="name"=, in the current file.
=/some/dir/file.org:name(csv params…)=: CSV formatted data within an Org block named ="name"=, in a distant Org file.
The cells separators in the CSV files may be TAB, comma, or semicolon, they are guessed and different separators may be mixed.
Any empty row in the CSV file is interpreted as an horizontal separator (=hline= in Org table parlance).
Parameters may be:
- =header=: the first row in the CSV file is interpreted as a header containing the column names.
- =colnames (column1 column2 column3 …)=: the column names are given explicitly, in case the CSV file contains only data, no header. In any case, the columns may be references as =$1, $2, $3, …= as usual.
When data is in an Org Mode file, it is supposed to be within a named block of any kind. Example with a "quote" block:
#+begin_example #+name: mycsvdata #+begin_quote label,quantity yellow,27 red,-61 yellow,41 red,-29 red,-17 #+end_quote #+end_example
** JSON input :PROPERTIES: :CUSTOM_ID: json-input :END:
=/some/dir/file.json:(json params…)=: a file containing a JSON formatted table, in the file =/some/dir/file.csv=.
=name(json params…)=: JSON formatted data within an Org block named ="name"=, in the current file.
=/some/dir/file.org:name(json params…)=: JSON formatted data within an Org block named ="name"=, in a distant Org file.
The accepted formats are a vector of vectors, or a vector of hash-objects.
Currently, no =params= are recognized.
Example of a vector of vectors:
#+begin_example [ ["mon",12,"red" ], ["thu",34,"blue" ], ["wed",27,"green"], ["wed",21,"red" ], ["mon", 7,"blue" ], … ] #+end_example
Example of a vector of hash-objects:
#+begin_example [ {"day":"mon", "quty":12, "color":"red" }, {"day":"thu", "quty":34, "color":"blue" }, {"quty":27, "day":"wed", "color":"green"}, {"quty":21, "color":"red", "day":"wed" }, {"day":"mon", "quty": 7, "color":"blue" }, … ] #+end_example
In the case of hash-objects, the keys are collected as the header of the resulting table, in the order seen in the JSON file. In each hash-object, the order of key-value pairs is irrelevant.
A header may be specified in the JSON file as a first vector, followed by an hline (horizontal line). Example:
#+begin_example [ ["day","quty","color"], "hline", ["mon",12,"red" ], ["thu",34,"blue" ], … ] #+end_example
Horizontal lines may be ="hline"=, =[]=, ={}=, or =null=.
It is possible to mix both formats: vectors and hash-objects. Example:
#+begin_example [ ["quty","color"], // incomplete header null, // horizontal line {"day":"mon", "quty":12, "color":"red" }, // an hash-object ["thu", 34, "blue" ], // a vector … ] #+end_example
When data is in an Org Mode file, it is supposed to be within a named block of any kind. Example with a "src" block for JavaScript:
#+begin_example #+name: myjsondata #+begin_src js [ ["day","quty","color"], "hline", ["mon",12,"red" ], ["thu",34,"blue" ], … ] #+end_src #+end_example
** Input slicing :PROPERTIES: :CUSTOM_ID: input-slicing :END:
Org Mode also provides for table slicing. All of the previous references may be followed by an optional slicing. Examples:
=mytable[0:5]=: retain only the first 6 rows of the input table; if the table has a header, then it counts as 2 rows (the header and the separation line). In this example, it would retain rows 0 and 1 for the header, and rows 2,3,4,5 for the content.
=mytable[*,0:1]=: retain only the first 2 columns.
=mytable[0:5,0:1]=: retain only the first 6 rows and the first 2 columns.
** The :cond filter :PROPERTIES: :CUSTOM_ID: the-cond-filter :END:
This parameter is optional. If present, it specifies a Lisp expression which tells whether or not a row should be kept. When the expression evaluates to nil, the row is discarded.
Examples of useful expressions includes:
- =:cond (equal Color "Red")=
- to keep only rows where =Color= is =Red=
- =:cond (> (string-to-number Quantity) 19)=
- to keep only rows for which =Quantity= is more than =19=
- note the call to =string-to-number=; without this call, =Quantity= would be used as a string
- =:cond (> (* (string-to-number Level) 2.5) (string-to-number Quantity))=
- to keep only rows for which =2.5*Level > Quantity=
Beware with this example: =:cond (equal Color "Red")=. The input table should not have a column named =Red=, otherwise the condition will mean: /keep only rows with the same value in columns Color and Red/
As a special case, when =:cols= parameter is not given, the result is the same as =:cols "COL1 COL2 COL3..."=. All columns in the input table are specified as key columns, and output in the resulting table.
This is useful when just filtering. But be aware that aggregation still occurs. So duplicate input rows appear only once in the result.
** Virtual input columns :PROPERTIES: :CUSTOM_ID: virtual-input-columns :END: What if we want to aggregate on months? But the input table contains only plain dates. Example:
#+begin_example #+name: without-months | Date | Quantity | |------------------+----------| | [2037-03-12 thu] | 56.93 | | [2037-03-25 wed] | 20.99 | | [2037-04-07 tue] | 80.81 | | [2037-04-20 mon] | 22.26 | | [2037-05-03 sun] | 69.75 | | [2037-05-16 sat] | 39.91 | | [2037-05-29 fri] | 93.13 | | [2037-06-11 thu] | 17.11 | | [2037-06-24 wed] | 24.21 | | [2037-07-07 tue] | 82.38 | | [2037-07-20 mon] | 39.94 | | [2037-08-02 sun] | 81.90 | | [2037-08-15 sat] | 71.67 | | [2037-08-28 fri] | 82.81 | | [2037-09-10 thu] | 42.50 | | [2037-09-23 wed] | 62.52 | | [2037-10-06 tue] | 5.13 | #+end_example
We would like to specify the aggregation over =month(Date)=. But only plain columns may be used as aggregating keys.
One way is to add a =Month= column to the input table. The modified table looks like:
#+begin_example #+name: with-months | Date | Quantity | Month | |------------------+----------+-------| | [2037-03-12 thu] | 56.93 | 3 | | [2037-03-25 wed] | 20.99 | 3 | | [2037-04-07 tue] | 80.81 | 4 | | [2037-04-20 mon] | 22.26 | 4 | | [2037-05-03 sun] | 69.75 | 5 | | [2037-05-16 sat] | 39.91 | 5 | | [2037-05-29 fri] | 93.13 | 5 | | [2037-06-11 thu] | 17.11 | 6 | | [2037-06-24 wed] | 24.21 | 6 | | [2037-07-07 tue] | 82.38 | 7 | | [2037-07-20 mon] | 39.94 | 7 | | [2037-08-02 sun] | 81.90 | 8 | | [2037-08-15 sat] | 71.67 | 8 | | [2037-08-28 fri] | 82.81 | 8 | | [2037-09-10 thu] | 42.50 | 9 | | [2037-09-23 wed] | 62.52 | 9 | | [2037-10-06 tue] | 5.13 | 10 | #+TBLFM: $3=month($1) #+end_example
OrgAggregate allows adding input columns like this computed =Month= column, without modifying the input table. The =:precompute= parameter does that. Example:
#+begin_example #+BEGIN: aggregate :table "without-months" :cols "Month vsum(Quantity)" :precompute ("month(Date);'Month'") | Month | vsum(Quantity) | |-------+----------------| | 3 | 77.92 | | 4 | 103.07 | | 5 | 202.79 | | 6 | 41.32 | | 7 | 122.32 | | 8 | 236.38 | | 9 | 105.02 | | 10 | 5.13 | #+END: #+end_example
The specification =month(Date);'Month'= means:
- add a third column to the input table,
- fill it with the formula =month(Date)=, which is a Calc formula,
- give this new column the =Month= title,
- make this new column available for aggregation, as any other column. All this process is virtual. The input table is not modified in any way.
If the title =Month= is not specified, then the new virtual column will be referred to as =$3=.
Note that here the title was specified with single quotes. This is required to disambiguate with the format. The syntax is consistent with the one used in the =:cols= parameter and the one used by Org Mode spreadsheet formulas.
The pre-computations may also be Lisp expressions, exactly like in the usual Org table spreadsheet. In this example, we want to aggregate on coarse bins. Bins are just hundredths of the first column:
#+begin_example #+name: want-bins | 109 | 41.24 | | 140 | 40.60 | | 174 | 7.68 | | 288 | 33.96 | | 334 | 21.42 | | 418 | 74.73 | | 455 | 79.62 | | 479 | 22.23 | | 554 | 28.03 | | 678 | 64.12 | | 797 | 70.91 | | 947 | 93.48 | #+end_example
#+begin_example #+BEGIN: aggregate :table "want-bins" :cols "$3 vmean($2)" :precompute ("'(floor (/ (string-to-number $1) 100))") | $3 | vmean($2) | |----+-----------| | 1 | 29.84 | | 2 | 33.96 | | 3 | 21.42 | | 4 | 58.86 | | 5 | 28.03 | | 6 | 64.12 | | 7 | 70.91 | | 9 | 93.48 | #+END: #+end_example
Virtual columns may be formatted as any other column, with the same syntax as in =:cols= or in the Org table spreadsheet. For example here we give it 2 digits after dot:
#+begin_example #+BEGIN: aggregate :table "want-bins" :cols "$3" :precompute "floor($1/100);%.2f"
| $3 |
|---|
| 1.00 |
| 2.00 |
| 3.00 |
| 4.00 |
| 5.00 |
| 6.00 |
| 7.00 |
| 9.00 |
| #+END: |
| #+end_example |
Of course, those additional virtual input columns may be used for other purposes than key columns. They may enter in aggregating formulas. Or they may be used by the optional row filter (the =:cond= parameter). There is no difference between actual and virtual columns.
The =:precompute= parameter may be:
a list of strings, example: : ("month(Date);'Month'" "day(Date);'Day'")
a single string with fields separated by =::=, like in the =#+tblfm:= tags of a spreadsheet. Example: : "month(Date);'Month' :: day(Date);'Day'"
a string containing a single formula (actually this is a special case of the previous one). Example: : "month(Date);'Month'"
- Post-processing :PROPERTIES: :CUSTOM_ID: post-processing :END:
After OrgAggregate has generated the output table, it can be further processed:
- additional columns may be added with the standard Org Mode spreadsheet formulas.
- any algorithm in an exotic language (Python, R, C++, Emacs Lisp, and so on) can be applied to the output.
** Spreadsheet formulas :PROPERTIES: :CUSTOM_ID: spreadsheet-formulas :END:
Additional columns can be specified for the resulting table. With a previous example, adding a =:formula= parameter, we specify a new column =$4= which uses the aggregated columns. It is translated into a usual =#+TBLFM:= spreadsheet line.
#+begin_example #+BEGIN: aggregate :table original :cols "Day vmean(Level) vsum(Quantity)" :formula "$4=$2*$3" | Day | vmean(Level) | vsum(Quantity) | | |-----------+--------------+----------------+------| | Monday | 27.5 | 14 | 385. | | Tuesday | 43 | 45 | 1935 | | Wednesday | 18 | 54 | 972 | | Thursday | 43 | 83 | 3569 | | Friday | 8 | 22 | 176 | #+TBLFM: $4=$2*$3 #+END: #+end_example
Moreover, if a =#+TBLFM:= was already there, it survives aggregation re-computations.
This happens in /pull mode/ only.
** Algorithm post processing :PROPERTIES: :CUSTOM_ID: algorithm-post-processing :END:
The aggregated table can be post-processed with the =:post= parameter. It accepts a Lisp =lambda=, a Lisp function, or a Babel block in any exotic language (R, Python, C++, Emacs Lisp and so on).
The process receives the aggregated table as parameter in the form of a Lisp expression. It can process it in any way it wants, provided it returns a valid Lisp table.
A Lisp table is a list of rows. Each row is either a list of cells, or the special symbol =hline=.
In this example, a =lambda= expression adds a =hline= and a row for /Sunday/.
#+begin_example #+BEGIN: aggregate :table original :cols "Day vsum(Quantity)" :post (lambda (table) (append table '(hline (Sunday "0.0")))) | Day | vsum(Quantity) | |-----------+----------------| | Monday | 14 | | Tuesday | 45 | | Wednesday | 54 | | Thursday | 83 | | Friday | 22 | |-----------+----------------| | Sunday | 0.0 | #+END: #+end_example
The =lambda= can be moved to a =defun=. The function is then passed to the =:post= parameter:
#+begin_example ,#+begin_src elisp (defun my-function (table) (append table '(hline (Sunday "0.0")))) ,#+end_src
... :post my-function #+end_example
The =:post= parameter can also refer to a Babel Block. Example:
#+begin_example #+BEGIN: aggregate :table original :cols "Day vsum(Quantity)" :post "my-babel-block(tbl=this)" ... #+END: #+end_example
#+begin_example ,#+name: my-babel-block ,#+begin_src elisp :var tbl="" (append tbl '(hline (Sunday "0.0"))) ,#+end_src #+end_example
Beware! You may want to add =:colnames t= to your Babel block. Otherwise the table's header will be lost.
** Grand total :PROPERTIES: :CUSTOM_ID: grand-total :END:
She (the user) may be tempted to add the grand total at the bottom of the aggregation. Example of such a temptation:
#+begin_example ,#+begin: aggregate :table original :cols "Day vmean(Level) vsum(Quantity)" | Day | vmean(Level) | vsum(Quantity) | |-----------+--------------+----------------| | Monday | 27.5 | 14 | | Tuesday | 43 | 45 | | Wednesday | 18 | 54 | | Thursday | 43 | 83 | | Friday | 8 | 22 | |-----------+--------------+----------------| | Total | | 218 | ,#+TBLFM: @7$3=vsum(@I..@II) ,#+end #+end_example
With OrgAggreagate post-processing, it is easy.
- Just put in place a small algorithm to add two lines.
#+begin_example :post (append this '(hline ("Total" "" ""))) ▲ ▲ ▲ ╰─────╮ │ │ the aggregated table╶─╯ │ │ one horizontal line╶─────╯ │ an empty cell to recieve the total╶──────╯ #+end_example
The =this= Lisp variable contains the aggregated table, just while the post-processing takes place. The =append= Lisp function adds two rows to the aggregated table, and returns the amended table.
Note that the =:post= parameter may be:
- a small Lisp expression, as in this example,
- a lambda expression, which parameter is the aggregated table,
- the name of a Lisp function, which is passed the aggregate table,
- the name of a Babel block, written in any supported language.
- Fill the additional cell with a formula for the total.
#+begin_example @>$2=vsum(@I..@II) ▲ ▲ ▲ ▲ ▲ │ │ │ │ ╰──────────────╮ │ │ │ ╰────────────────╮ │ │ │ ╰─────────────────╮ │ │ │ ╰────────────╮ │ │ │ ╰──────────╮ │ │ │ │ last line╶─╯ │ │ │ │ second column╶─╯ │ │ │ sum all values between╶─╯ │ │ the first horizontal line╶─╯ │ and the second one╶──────────╯ #+end_example
In this way, the grand-total will be recomputed each time the aggregation is refreshed (=C-c C-c=). Note the use of =@>$2= for the coordinates of the cell receiving the total, instead of, for instance =@7$3=. This ensures that the formula will continue to be applied on the last row, even if the aggregated table grows later on. The same idea applies for the =@I..@II= range, instead of, for instance =@2..@6=.
Even though OrgAggregate offers the user a versatil post-processing to add a grand total, there are many reasons not to. If she does, she is quietly entering the same nightmare which plagues spreadsheets. Everything will become harder and harder to maintain.
It seems natural to add a grand total right below the column. But suppose that now she also want the standard deviation of this same column. Where to put it? There is a blank cell just left of the grand total. She puts the standard deviation there:
#+begin_example #+begin: aggregate :table original :cols "Day vmean(Level) vsum(Quantity)" | Day | vmean(Level) | vsum(Quantity) | |-----------+--------------+----------------| | Monday | 27.5 | 14 | | Tuesday | 43 | 45 | | Wednesday | 18 | 54 | | Thursday | 43 | 83 | | Friday | 8 | 22 | |-----------+--------------+----------------| | Total | 27.409852 | 218 | #+TBLFM: @7$3=vsum(@I..@II)::@7$2=vsdev(@I$3..@II$3) #+end #+end_example
But then this =27.409852= looks like the grand total of the second column, but it is not.
Later on, she may want to further process this aggregated table, for example to plot it. The grand total row will be an annoyance. It will be tedious to get ride of it.
She should instead consider creating a second aggregation with the grand total:
#+begin_example ,#+begin: aggregate :table original :cols "vsdev(Level) vsum(Quantity)" | vsdev(Level) | vsum(Quantity) | |--------------+----------------| | 27.409852 | 218 | ,#+end #+end_example
Easy, maintainable, no awkward decisions to remember and document. She keeps it simple.
** Chaining :PROPERTIES: :CUSTOM_ID: chaining :END:
The result of an aggregation may become the source of further processing. To do that, just add a =#+NAME:= or =#+TBLNAME:= line just above the aggregated table. Here is an example of a double aggregation:
#+begin_example #+NAME: squantity #+BEGIN: aggregate :table original :cols "Day vsum(Quantity)" | Day | SQuantity | |-----------+-----------| | Monday | 14 | | Tuesday | 45 | | Wednesday | 54 | | Thursday | 83 | | Friday | 22 | #+TBLFM: @1$2=SQuantity #+END:
#+BEGIN: aggregate :table "squantity" :cols "vsum(SQuantity)"
| vsum(SQuantity) |
|---|
| 218 |
| #+END: |
| #+end_example |
Note the spreadsheet cell formula =@1$2=SQuantity=, which changes the column heading from it default =vsum(Quantity)= to =SQuantity=. This new heading will survive any refresh.
Sometimes the name of the aggregated table is not found by some babel block referencing it (Gnuplot blocks are among them). To fix that, just exchange the =#+NAME:= and =#+BEGIN:= lines:
#+begin_example #+BEGIN: aggregate :table original :cols "Day vsum(Quantity)" #+NAME: squantity | Day | SQuantity | |-----------+-----------| | Monday | 14 | | Tuesday | 45 | | Wednesday | 54 | | Thursday | 83 | | Friday | 22 | #+TBLFM: @1$2=SQuantity #+END: #+end_example
The =#.NAME:= line will survive when recomputing the aggregation (as =#.TBLFM:= line survives)
- Pull & Push :PROPERTIES: :CUSTOM_ID: pull--push :END:
Two modes are available: /pull/ & /push/.
** Pull mode :PROPERTIES: :CUSTOM_ID: pull-mode :END:
In the /pull/ mode, we use so called /"dynamic blocks"/. The resulting table knows how to build itself.
Example:
We have a source table which is unaware that it will be derived in an aggregated table:
#+begin_example #+NAME: source1 | Day | Color | Level | Quantity | |-----------+-------+-------+----------| | Monday | Red | 30 | 11 | | Monday | Blue | 25 | 3 | | Tuesday | Red | 51 | 12 | | Tuesday | Red | 45 | 15 | | Tuesday | Blue | 33 | 18 | | Wednesday | Red | 27 | 23 | | Wednesday | Blue | 12 | 16 | | Wednesday | Blue | 15 | 15 | | Thursday | Red | 39 | 24 | | Thursday | Red | 41 | 29 | | Thursday | Red | 49 | 30 | | Friday | Blue | 7 | 5 | | Friday | Blue | 6 | 8 | | Friday | Blue | 11 | 9 | #+end_example
We create somewhere else a /dynamic block/ which carries the specification of the aggregation:
#+begin_example #+BEGIN: aggregate :table "source1" :cols "Day vmean(Level) vsum(Quantity)" | Day | vmean(Level) | vsum(Quantity) | |-----------+--------------+----------------| | Monday | 27.5 | 14 | | Tuesday | 43 | 45 | | Wednesday | 18 | 54 | | Thursday | 43 | 83 | | Friday | 8 | 22 | #+END #+end_example
Typing =C-c C-c= in the dynamic block recomputes it freshly.
** Push mode :PROPERTIES: :CUSTOM_ID: push-mode :END:
In /push/ mode, the source table drives the creation of derived tables. We specify the wanted results in =#+ORGTBL: SEND= directives (as many as desired):
#+begin_example #+ORGTBL: SEND derived1 orgtbl-to-aggregated-table :cols "vmean(Level) vsum(Quantity)" #+ORGTBL: SEND derived2 orgtbl-to-aggregated-table :cols "Day vmean(Level) vsum(Quantity)" | Day | Color | Level | Quantity | |-----------+-------+-------+----------| | Monday | Red | 30 | 11 | | Monday | Blue | 25 | 3 | | Tuesday | Red | 51 | 12 | | Tuesday | Red | 45 | 15 | | Tuesday | Blue | 33 | 18 | | Wednesday | Red | 27 | 23 | | Wednesday | Blue | 12 | 16 | | Wednesday | Blue | 15 | 15 | | Thursday | Red | 39 | 24 | | Thursday | Red | 41 | 29 | | Thursday | Red | 49 | 30 | | Friday | Blue | 7 | 5 | | Friday | Blue | 6 | 8 | | Friday | Blue | 11 | 9 | #+end_example
We must create the receiving blocks somewhere else in the same file:
#+begin_example #+BEGIN RECEIVE ORGTBL derived1 #+END RECEIVE ORGTBL derived1 #+end_example
#+begin_example #+BEGIN RECEIVE ORGTBL derived2 #+END RECEIVE ORGTBL derived2 #+end_example
Then we come back to the source table and type =C-c C-c= with the cursor on the 1st pipe of the table, to refresh the derived tables:
#+begin_example #+BEGIN RECEIVE ORGTBL derived1 | vmean(Level) | vsum(Quantity) | |---------------+----------------| | 27.9285714286 | 218 | #+END RECEIVE ORGTBL derived1 #+end_example
#+begin_example #+BEGIN RECEIVE ORGTBL derived2 | Day | vmean(Level) | vsum(Quantity) | |-----------+--------------+----------------| | Monday | 27.5 | 14 | | Tuesday | 43 | 45 | | Wednesday | 18 | 54 | | Thursday | 43 | 83 | | Friday | 8 | 22 | #+END RECEIVE ORGTBL derived2 #+end_example
** Pull or push ? :PROPERTIES: :CUSTOM_ID: pull-or-push- :END:
Pull & push modes use the same engine in the background. Thus, using either is just a matter of convenience.
Pull mode is the most straightforward. Also the wizard operates on the pull mode only. Almost all examples in this documentation are in pull mode. If you cannot decide, just use the pull mode.
Glitch: in push mode you may see strange output like =_{}=. This is an escape generated by Org Mode (nothing to do with OrgAggregate). It happens for the following characters: =&%#_^= To disable that, in the =#+ORGTBL: SEND= line, add this parameter: =:no-escape true=
- Debugging :PROPERTIES: :CUSTOM_ID: debugging :END: The work of OrgAggregate is to hand out pieces of the input table to Calc (the Emacs calculator).
Is some intricate cases, it may be useful to see what is going on. The debugging formatters come handy.
** Seeing the $ forms :PROPERTIES: :CUSTOM_ID: seeing-the--forms :END:
Here is an example input table:
#+begin_example #+name: inputdebug | nn | aa | |------+----| | 1.23 | a | | 7.65 | b | | 8.46 | c | |------+----| | 2.44 | d | | 6.81 | e | #+end_example
And here is an aggregation to debug:
#+begin_example #+BEGIN: aggregate :table "inputdebug" :cols "hline vsum(nn10) vsum(aa+7)" | hline | vsum(nn10) | vsum(aa+7) | |-------+-------------+----------------| | 0 | 173.4 | a + b + c + 21 | | 1 | 92.5 | d + e + 14 | #+END: #+end_example
So far so good. But we would like to know what Calc did. To do so let us add the =c= formatter.
#+begin_example #+BEGIN: aggregate :table "inputdebug" :cols "hline vsum(nn10);c vsum(aa+7);c" | hline | vsum(nn10) | vsum(aa+7) | |-------+-------------+------------| | 0 | vsum($110) | vsum($2+7) | | 1 | vsum($110) | vsum($2+7) | #+END: #+end_example
Each output cell now contains the formula, with column names replaced by dollar equivalent forms. But without any further processing.
** Seeing Calc formulas before evaluation :PROPERTIES: :CUSTOM_ID: seeing-calc-formulas-before-evaluation :END:
Let us go one step further with the =C= formatter:
#+begin_example #+BEGIN: aggregate :table "inputdebug" :cols "hline vsum(nn10);C vsum(aa+7);C" | hline | vsum(nn10) | vsum(aa+7) | |-------+-----------------------------+---------------------| | 0 | vsum([1.23, 7.65, 8.46] 10) | vsum([a, b, c] + 7) | | 1 | vsum([2.44, 6.81] 10) | vsum([d, e] + 7) | #+END: #+end_example
The dollar forms were replaced by Calc vectors made of input cells. No foldings or simplifications went on. The vectors are slices of columns, selected by OrgAggregate in response of the =hline= aggregation.
We see that multiplying by =10= or adding =7= is done on a Calc vector. It happens that Calc knows how to multiply or add something to a vector. OrgAggregate does not perform those operations, it delegates them to Calc.
** Seeing Lisp internal form of Calc formulas :PROPERTIES: :CUSTOM_ID: seeing-lisp-internal-form-of-calc-formulas :END:
We can also view the same results, formatted as Lisp forms (rather than Calc forms) with the =Q= formatter:
#+begin_example #+BEGIN: aggregate :table "inputdebug" :cols "hline vsum(nn10);Q vsum(aa+7);Q" | hline | vsum(nn10) | vsum(aa+7) | |-------+---------------------------------------------------------------------------+-----------------------------------------------------------------------| | 0 | (calcFunc-vsum (* (vec (float 123 -2) (float 765 -2) (float 846 -2)) 10)) | (calcFunc-vsum (+ (vec (var a var-a) (var b var-b) (var c var-c)) 7)) | | 1 | (calcFunc-vsum (* (vec (float 244 -2) (float 681 -2)) 10)) | (calcFunc-vsum (+ (vec (var d var-d) (var e var-e)) 7)) | #+END: #+end_example
This is the internal, Lisp representation of Calc formulas.
** Example of debugging vsum(nn^2) :PROPERTIES: :CUSTOM_ID: example-of-debugging-vsumnn2 :END:
Beware of a formula like =vsum(nn^2)=. This gives the expected result, although not in the obvious way. Let us see what happens, thanks to the =C= debugging formatter:
#+begin_example #+BEGIN: aggregate :table "inputdebug" :cols "hline vsum(nn^2);C" | hline | vsum(nn^2) | |-------+----------------------------| | 0 | vsum([1.23, 7.65, 8.46]^2) | | 1 | vsum([2.44, 6.81]^2) | #+END: #+end_example
We are not summing squares. We are squaring Calc vectors. Calc being a mathematical tool, it interprets the product of two vectors as the sum of the products element-wise, as a mathematician would do. Then =vsum= is applied on a single resulting value. So =vsum= is useless in this case. That can be confirmed:
#+begin_example #+BEGIN: aggregate :table "inputdebug" :cols "hline vsum(nn^2) nn^2 vprod(nn^2)" | hline | vsum(nn^2) | nn^2 | vprod(nn^2) | |-------+------------+---------+-------------| | 0 | 131.607 | 131.607 | 131.607 | | 1 | 52.3297 | 52.3297 | 52.3297 | #+END: #+end_example
Therefore, changing =vsum= to =vprod= does not change the result. This can be unexpected.
** Summary of debugging formatters :PROPERTIES: :CUSTOM_ID: summary-of-debugging-formatters :END:
To summarize the debugging settings:
- =c=: output Calc formula
- =C=: output Calc formula with dollar forms substituted by actual input data
- =q=: output Lisp formula
- =Q=: output Lisp formula with column forms substituted by actual input data
- Tricks :PROPERTIES: :CUSTOM_ID: tricks :END: This chapter collects some tricks that may be useful.
** Sorting :PROPERTIES: :CUSTOM_ID: sorting-0 :END: #+begin_example #+name: trick_table_1
| column |
|---|
| 677 |
| 713 |
| 459 |
| 537 |
| 881 |
| #+end_example |
When several cells of a column need to be sorted, the Calc =calc-sort()= function is handy:
#+begin_example #+BEGIN: aggregate :table "trick_table_1" :cols "(column) sort(column)" | (column) | sort(column) | |---------------------------+---------------------------| | [677, 713, 459, 537, 881] | [459, 537, 677, 713, 881] | #+END: #+end_example
- =(column)= gives the list of values to aggregate, without aggregating them.
- =sort(column)= gives the same list sorted in ascending order.
** A few lowest or highest values :PROPERTIES: :CUSTOM_ID: a-few-lowest-or-highest-values :END:
Used with =subvec()=, =sort()= can retrieve the two lowest or the two highest values:
#+begin_example #+BEGIN: aggregate :table "trick_table_1" :cols "subvec(sort(column),1,3) subvec(sort(column),count()-1)" | subvec(sort(column),1,3) | subvec(sort(column),count()-1) | |--------------------------+--------------------------------| | [459, 537] | [713, 881] | #+END: #+end_example
- =subvec(...,1,3)= extracts the two first values: from =1= to =3= excluded.
- =subvec(...,count()-1)= extracts the two last values, numbered =count()-1= and =count()=
And of course we may retrieve the average of the two first and the two last values:
#+begin_example #+BEGIN: aggregate :table "trick_table_1" :cols "vmean(subvec(sort(column),1,3)) vmean(subvec(sort(column),count()-1))" | vmean(subvec(sort(column),1,3)) | vmean(subvec(sort(column),count()-1)) | |---------------------------------+---------------------------------------| | 498 | 797 | #+END: #+end_example
** Span of values :PROPERTIES: :CUSTOM_ID: span-of-values :END:
=vmin()= and =vmax()= can compute the span of aggregated values:
#+begin_example #+BEGIN: aggregate :table "trick_table_1" :cols "vmin(column) vmax(column) vmax(column)-vmin(column)" | vmin(column) | vmax(column) | vmax(column)-vmin(column) | |--------------+--------------+---------------------------| | 459 | 881 | 422 | #+END: #+end_example
** No aggregation :PROPERTIES: :CUSTOM_ID: no-aggregation :END: Why would one want to use OrgAggregate while not aggregating? To benefit from the other features of OrgAggregate:
- column rearrangement
- sorting
- formatting
- =#+TBLFM= survival
- row filtering
- preprocess
- postprocess
To do so, mention the virtual column =@#= in =:cols= and make it invisible with =;<>=. As =@#= is different for each row, the aggregation will consider each row as a separate group. Therefore, no aggregation on another column will do anything more.
For example, here we:
- put =Color= as the first column (it is the second in the input),
- ignore the =Day= column,
- sort by =Level=,
- compute =Quantity/7=,
- format it with 2 digits after dot.
#+begin_example #+BEGIN: aggregate :table "original" :cols "@#;<> Color Level;^n vmax(Quantity/7);'Q10';f2" | Color | Level | Q10 | |-------+-------+------| | Blue | 6 | 1.14 | | Blue | 7 | 0.71 | | Blue | 11 | 1.29 | | Blue | 12 | 2.29 | | Blue | 15 | 2.14 | | Blue | 25 | 0.43 | | Red | 27 | 3.29 | | Red | 30 | 1.57 | | Blue | 33 | 2.57 | | Red | 39 | 3.43 | | Red | 41 | 4.14 | | Red | 45 | 2.14 | | Red | 49 | 4.29 | | Red | 51 | 1.71 | #+END: #+end_example
We used the =vmax()= aggregating function on =Quantity/7=, because otherwise we would get a vector with a single value. As there is a single value, any aggregating function will do the trick: =vmin()=, =head()=, =rtail()=, =vsum()=, =vprod()=, =vmean()=, =vgmean()=, =vhmean()=, =vspan()=, =vmedian()=.
- Installation :PROPERTIES: :CUSTOM_ID: installation :END:
Emacs package on Melpa: add the following lines to your =.emacs= file, and reload it.
#+begin_example (add-to-list 'package-archives '("melpa" . "http://melpa.org/packages/") t) (package-initialize) #+end_example
You may also customize this variable: #+begin_example M-x customize-variable package-archives #+end_example
Then browse the list of available packages and install =orgtbl-aggregate= #+begin_example M-x package-list-packages #+end_example
Alternatively, you can download the lisp file, and load it:
#+begin_example (load-file "orgtbl-aggregate.el") #+end_example
- Authors, contributors :PROPERTIES: :CUSTOM_ID: authors-contributors :END:
Authors
- Thierry Banel, tbanelwebmin at free dot fr, inception & implementation.
- Michael Brand, Calc unleashed, =#+TBLFM= survival, empty input cells, formatters.
Contributors
- Eric Abrahamsen, non-ASCII column names
- Alejandro Erickson, quoting non alphanumeric column names
- Uwe Brauer, simpler example in documentation, take org-calc-default-modes preferences into account
- Peking Duck, fixed obsolete letf function
- Bill Hunker, discovered =_{}= escape
- Dirk Schmitt, surviving =#.NAME:= line
- Dale Sedivec, case insensitive =#+NAME:= tags
- falloutphil, underscore in column names
- Baudilio Tejerina, t, T, U formatters
- Marco Pas, bug comparing empty string
- wuqui, sorting output table, filtering only
- Nicolas Viviani, output hlines
- Nils Lehmann, support old versions of the rx library
- Shankar Rao, =:post= post-processing
- Misohena (https://misohena.jp/blog/author/misohena), double width Japanese characters (string-width vs. length)
- Kevin Brubeck Unhammer, ignore formatting cookies
- Tilmann Singer, more flexibility in duration format
- Piotr Panasiuk, =#+CAPTION:= and any tags survive
- Luis Miguel Hernanz, fix regex bug
- Jason Hemann, output column names no longer have quotes
- Tilmann Singer, computed aggregating bins, ="month(Date)"= in his use case
- Changes :PROPERTIES: :CUSTOM_ID: changes :END: Top: earliest change. Bottom: latest change.
- Wizard now correctly asks for columns with =$1, $2...= names when table header is missing
- Handle tables beginning with hlines
- Handle non-ASCII column names
- =:formula= parameter and =#+TBLFM= survival
- Empty cells are ignored.
- Empty output upon too small input set
- Fix ordering of output values
- Aggregations formulas may now be arbitrary expressions
- Table headers (and the lack of) are better handled
- Modifiers and formatters can now be specified as in the spreadsheet
- Aggregation function names can optionally have a leading =v=, like =sum= & =vsum=
- Increased performance on large data sets
- Tables can be named with =#+NAME:= besides =#+TBLNAME:=
- Document Melpa installation
- Support quoting of column names, like "a.b" or 'c/d'
- Disable =_{}= escape
- =#+NAME:= inside =#+BEGIN:= survives
- Missing input cells handled as empty ones
- Back-port Org Mode =9.4= speed up
- Increase performance when inserting result into the buffer
- Aligned output in push mode
- Added a hash-table to speedup aggregation
- Back-port org-table-to-lisp which is now much faster
- =vlist(X)= now yields input cells verbatim were =(X)= yields Calc processed input cells
- Document dates handling and the =date()= function
- Implement =HH:MM:SS= durations and =T=, =t=, =U= formatters
- Sort output
- Create hlines in the output
- Missing :cond parameter means all columns
- Remove =C-c C-x i=, use standard =C-c C-x x= instead
- Avoid name collision between Calc functions and columns
- More readable & faster code
- Support for old versions of the rx library
- =:post= post-processing
- Propagate multiple rows source header to the aggregated header
- Ignore data rows containing formatting cookies
- Follow Org Mode way of handling Calc settings in Lisp code
- Hours in durations are no longer restricted to 2 digits
- 3x speedup =org-table-to-lisp= and avoid Emacs 27 to 30 incompatibilities
- =#+CAPTION:= and any other tag survive inside =#+BEGIN:=
- Output column names are now stripped from quotes, better reflecting input names.
- Table-of-contents in README.org (thanks org-make-toc)
- Add formatters =c= =C= =q= =Q= (useful for debugging or understanding OrgAggregate)
- Formulas involving =hline= like =vmean(hline*10)= are now taken into account
- Documentation is now integrated right into Emacs in the =info= format. Type =M-: (info "orgtbl-aggregate")=
- Input table may now be the result of a Babel script (virtual table).
- Better handling of user errors in the =:post= directive.
- Speedup of resulting table recalculation when there are formulas in =#+tblfm:= or in =:formula=. The overall aggregation may be up to x6 faster and ÷5 less memory hungry.
- Circumvent an Org Mode bug in case there are a column-formula along with a cell-formula, the cell-one not being calculated. (Bonus: 15% speedup).
- Fix issue #24: bug in date parsing.
- Virtual pre-computed input columns.
- Better explanation of the input table reference syntax, including distant tables and virtual table produced by Babel blocks.
- Support for CSV and JSON formatted input tables.
- New =@#= virtual column giving the number of each row, pretty much like the Org table spreadsheet =@#= virtual column.
- Header and column names can be specified for CSV input tables, as well as horizontal separators (=hline=).
- Aggregation of the titles of this README.
- New free-form wizard.
- Illustrate README with Uniline graphics.
- JSON and CSV input tables can now live inside Org Mode blocks.
- Example for computing a refreshable grand total.
- Document intervals and error-forms handling.
- Special columns =@#= and =hline= are handled by transpose.
- GPL 3 License :PROPERTIES: :CUSTOM_ID: gpl-3-license :END: Copyright (C) 2013-2026 Thierry Banel
orgtbl-aggregate is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
orgtbl-aggregate is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.