Pyspread – The Spreadsheet with a Python connection

Paul Tessier, 123RF

Paul Tessier, 123RF

Scales & Cells

You can find plenty of spreadsheets offering a multitude of features. The Python-based Pyspread demonstrates that a sophisticated spreadsheet application can also be newcomer-friendly.

Since the early days of electronic data processing, the spreadsheet application has been a standard tool for everyday office life. The focus of this article, Pyspread [1], is related directly to some of the larger, better-known representatives, such as Gnumeric, KSpread/Calligra Sheets, Calc from Libre- and OpenOffice, and PlanMaker from the commercial SoftMaker Office package.

Pyspread, however, has a unique selling point among this gaggle of competitors: It is uncluttered and manageable for beginners (Figure 1). It includes all the basic mechanisms required for table-based work and performing calculations. It also impresses with the fact that it uses an existing scripting language for formulas and evaluations within the data sheet instead of its own macro language. Pyspread thus follows a similar approach to Calc; however, it is based on the popular Python instead of Scheme.

Figure 1: A classic application: travel expenses with a grand total.

Pyspread, which itself is written in Python, groups a wider choice of language constructs and notations with special importance to spreadsheets. This means that the interpreter can fully process calculation formulas within a reasonable time, and it visualizes the results as values or graphics.

You must explicitly enable the scripting content and cryptographically safeguard it with a GnuPG key. The latter eliminates the risk of unintended execution of Python-based malicious code that may be disguised as a harmless formula in a data sheet.

Publishing and Development Goal

Pyspread is licensed under the GPLv3 and is available in the software archives of all major distributions. If you are not happy with the current stable version, 0.3.3 (Debian  8, Ubuntu 15.10), you can download version 1.0 from the project website or opt for the developer version [2]. Version 0.3 is sufficient for the first steps, as is version 0.4. Version 1.0 only has a couple more buttons, such as for direct PDF export.

Pyspread is quite newcomer-friendly, because of its graphical user interface based on Gnome/GTK. But, when you use it, you will notice that the developers are also catering to the preferences of engineers and the way they think. The team led by Martin Manns is committed to the goal of creating the "most pythonic spreadsheet."

Each cell returns a single Python object that can be referenced by cells. These objects can include Python expressions, strings, integers, floating-point numbers, lists, and matrices. You can dock the NumPy library [3] for vectors and matrixes as well as Matplotlib [4] for graphical representations. And, if you have the skills, you can integrate your own acquired knowledge of Python directly.

Program Structure

The arrangement of the components in Pyspread follows the classic layout for spreadsheets. At the center is the data sheet with individual rows and column. Above that, you'll find a combined menu and toolbar along with an input line. You change the cell contents directly or via the input line.

The individual buttons allow direct access to file operations, text and table cell formatting (font, color, size, background, outline), and rotation and alignment of cell contents. Additionally, you can sort the cells and search and replace, in a non-case-sensitive manner, using regular expressions.

When it comes to exchanging data, Pyspread can export data as comma-separated values (CSV) and – as of version 1.x – SVG and PDF files. It can also import and export the older Microsoft Excel format thanks to the xlrd /xlwt library [5]. Matplotlib is used for displaying charts and for data visualization; image data integration comes courtesy of the wxPython framework library [6].

Figure 1 shows an excerpt from an electronic travel expense book with five columns: Trip for the number of times the distance was traveled, From for the start location, To for the destination, as well as the individual and total distances. Underneath, you can see the grand total of the accumulated kilometers. The bar chart to the right shows the relationships of the distances. If you modify the values in the underlying table cells, Pyspread recomputes the formulas and updates the view in the results cells and the graphics.

Formulas and References to Individual Cells

A three-dimensional reference vector S serves as the basis of all calculations in Pyspread. It consists of the values for each line and column and the number of the data sheet. The comma is used as a separator between the values, and all indexes start counting from zero.

Both absolute and relative references to cells can be made. For relative references, the available variables are X , Y , and Z . They include the current row (X ) and column (Y ), as well as the number of the data sheet (Z ). You can address the cell in the next row of the same data sheet, with the notation S[X+1, Y, Z] .

The travel expense example uses absolute references. For example, to determine the number of kilometers for the first trip, row 1, column 4 has the formula S[1,0,0] * S[1,3,0] . This computes the kilometers for the distance of a single trip with absolute references to a cell. The sum total of all distances is then the sum of several cells in a column. Valid formulas for this include:

  • Variant 1: sum([S[r,4,0] for r in range(1,6)])
  • Variant 2: sum(S[1:6,4,0])

Variant 1 follows a functional programming approach and runs the sum function against column 4 in combination with a for loop, which iterates against the variable r in a value range from 1 to 6. This reads the contents of the cells in rows 1 to 6. The significantly more compact variant 2 uses the typical Python notation for a range: Rows 1 to 6 are entered as colon-separated numbers in the sum function. The results computed are identical to variant 1 and is often easier for non-programmers to read.

However, sum() is not the only feature from the Python universe that you can use. Table 1 provides a brief overview of the possibilities. You can embed features from other Python modules via an assignment as follows:

Variable = _ _import_ _("Module")

After this assignment, the appropriate function from Module is available for all the cells in your data sheet.

Table 1

Pyspread Standard Functions (Selection)

Function Result
abs() Absolute value
all() true , if no element of a list is empty
any() true , if at least one element of a list is empty
max() Highest value in a list
min() Lowest value in a list
pow() Power of a value
round() Rounded numeric value
sum() Sum of a list

Creating & Embedding Charts

Charts visualize the relationships between numbers. Pyspread hides its chart features behind an unassuming button with the two colored curves. To create graphics, Pyspread relies on the Matplotlib function library, as mentioned earlier.

Pressing the button opens a dialog box that lets you design the chart (Figure 2) – on the left are the settings for labels and scales, in the middle are those for chart data (values), and on the right is a preview of the resulting graph. However, this dialog box was designed for very large screens. Although you can enlarge the box and dynamically adjust the column widths to the size of the window, you will still see hardly anything on a smaller screen. This is particularly true of the middle column in which you (would want to) enter the data range for the chart. First, you need to access the fields, and I only managed to do so in testing by massively enlarging the window.

Figure 2: The dialog box for creating graphs.

The conclusion for the Charts function is thus "somewhat inconvenient." I finally achieved some results, but only after some stoic exploration, because the documentation doesn't help here – there is simply no information on this feature, and the contextual help for many input fields is unusable.

Aside from the idiosyncratic column widths, it is hard to tell what to enter in the input fields for the chart – single values or lists – and once you know what to enter, you still need to find out where it belongs. Additionally, the format of the entries is problematic. The range notation mentioned earlier does not work here; what does work is a bunch of individual values with a notation of (<value1>, <value2>) , etc. I managed to reference a single cell using the reference vector S , as in S[1,2,3] for the second row, third column in the fourth data sheet.

Saving Documents

When you reopen a previously saved Pyspread document, the application does not automatically interpret the formulas that the document contains. The reasoning behind this is a security mechanism that is designed to prevent the running of potentially malicious code – any Python snippet could in principle do harm to the system.

Pyspread cannot itself identify the side effects of the program code in the data sheet – particularly not for external modules. If you are sure that the Python snippets included do no harm, you can select File | Approve .

To automate this step in the future, Pyspread asks you for your GnuPG key [7], which it uses to sign Pyspread files. As a user, you can now use the GnuPG key to check whether the document is from a trusted source.

Listing 1 shows the thumbprint of my own GnuPG key [8]. Pyspread stores this fingerprint in its settings (Figure 3). The corresponding dialog is found in the menu below File | Preferences ; the entry for the fingerprint is the bottom input box.

Listing 1

Fingerprint

$ gpg --fingerprint D431AC07
pub   4096R/D431AC07 2014-09-05
  Key fingerprint = 35F8 DF9C 884E 36AB 9744  60AF CFA7 2978 D431 AC07
uid                  Frank Hofmann (Hofmann EDV) <frank.hofmann@efho.de>
sub   4096R/3B074F29 2014-09-05
Figure 3: Dialog box with the stored GnuPG key fingerprint.

Conclusions

The idea of elegantly combining a spreadsheet with an existing, mature programming language is smart. The same applies to the use of GnuPG signatures to boost confidence in the submitted documents. This is a combination that the major league players currently do not offer.

Pyspread has mature basic functions that make working with the program convenient. The major construction sites are the charts and documentation – there is ample scope for improvement here. But, there is good reason to keep an eye on Pyspread's development in the future.

Acknowledgements

I thank Martin Manns, the creator of Pyspread, for his help while writing this article.

Infos

  1. Pyspread: http://manns.github.io/pyspread/
  2. Ubuntu Pyspread package: http://packages.ubuntu.com/xenial/pyspread
  3. NumPy: http://www.numpy.org
  4. Matplotlib: http://matplotlib.org
  5. Working with Excel Files in Python: http://www.python-excel.org
  6. wxPython library: http://www.wxpython.org
  7. GnuPG: https://gnupg.org
  8. "Encrypting Email with GnuPG" by Peter Kreussel, Ubuntu User , Issue 19, 2013: http://www.ubuntu-user.com/Magazine/Archive/2013/19/Encrypting-email-with-GnuPG/%28language%29/eng-GB