🚀 ProtoStar.Space is all about digital product development: design, data science and coding. Made with 💙 by Krisztina Szerovay & Gergely Szerovay.

Design handoff best practices

In an ideal case, there is no big design up front and no big hand-off: designers and developers work on the same parts of a digital product or service at the same time. This lets you go through each problematic detail together and work out detailed UI animations more easily, and it also reduces waste during the product development process. It also means that the definition of done for designers is not sending the final files to the developers; instead, the work is done when a certain release ships (of course, “done” means different things for different teams and projects).

However, in many cases, the product development process chosen by your organization won’t let you do this. Your organization might also hire a third-party development vendor or partner.

Summary sketch of the Design handoff best practices

So let’s assume that there is some kind of hand-off: after creating the UI design, the designer hands it off to the developers. How can they specify all the little details?

A checklist of some best practices

Here are some best practices:

  1. Naming the files: use relevant, descriptive, consistent naming (not final_really_final – by the way, you should use a version control system)
  2. Provide SVGs: icons, logo, favicon, custom graphics, illustrations – all of these should be in SVG format
  3. Imagery: this is the art direction aspect (read more about it in my article); it is about preparing images to be displayed on different viewport sizes
  4. Typography: provide all fonts, and make sure that you have a proper license that fits the project’s purposes
  5. Copy and UX copy should also be provided (I believe in content-first design: please never design with lorem ipsum); most probably you won’t be able to provide all the texts as part of the screens (e.g. you’ll also have error and confirmation messages), so these should be provided in a table. There might also be translations (this is the aspect of internationalization and localization)
  6. Layout
    • Clearly and consistently specify all the spacing values
    • Specify and apply a consistent grid system
    • Determine breakpoints, and provide all the different layouts for each size
    • Specify how things should be aligned (see my card-based layout example in this article)
  7. Don’t forget to include specifications for the UI animations (e.g. the object’s position at the beginning and at the end, the duration of the transition, easing, transformation type) – tools like Lottie can make this easier
  8. If you use a certain design system, clearly specify which elements you’ve included, what should be used and where
  9. Specify hex codes for colors and gradients (see the sketch after this list for one way to collect these values)
  10. Provide all the assets in a consistent file structure; everything should be easily findable
  11. All the different states of components should be provided (e.g. in case of a button: default, hover, focus, pressed, disabled; checkbox: on and off)
  12. Pay attention to platform-specific guidelines and principles, and communicate these to the developers
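
To make items 6, 7 and 9 more concrete, here is a minimal sketch of how such values could be collected in one machine-readable place. It is only an illustration: the token names, colors and the animation entry below are invented placeholders, not values from any real project (the breakpoint values are Bootstrap 4’s defaults).

```python
# Hypothetical design-spec snippet for a hand-off document.
# All names and values are illustrative placeholders.
design_spec = {
    "colors": {
        "primary": "#1A73E8",                     # hex code for a solid color
        "gradient-hero": ["#1A73E8", "#6C3EF5"],  # gradient stops
    },
    "spacing": {  # consistent spacing scale, in px
        "xs": 4, "sm": 8, "md": 16, "lg": 24, "xl": 40,
    },
    "breakpoints": {  # viewport widths (px) where the layout changes
        "sm": 576, "md": 768, "lg": 992, "xl": 1200,
    },
    "animations": {
        "card-enter": {
            "property": "opacity",  # what is animated
            "from": 0, "to": 1,     # start and end values
            "duration_ms": 200,     # transition duration
            "easing": "ease-out",   # easing curve
        },
    },
}
```

Whatever format you choose, the point is that these values live in one agreed place instead of being scattered across screens.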

Final advice: use a checklist, so that nothing gets missed.

How to specify all these

You can use annotated sketches, wireframes, prototypes, and create design handoff documents (UI documentation). 

Besides these manual methods, there is an automated way: design hand-off and inspection tools (e.g. Zeplin, InVision Inspect or Avocode). In this case you need to use a consistent layer structure in your design tool of choice. These tools let developers inspect the designs and get measurements and all the assets, and they usually include collaborative features as well, e.g. you and the developers can add annotations and comments.

So to sum up, the most important thing is that you need to specify all of these so that developers can effectively implement your designs. I’d like to highlight that heavy documentation and highly detailed specifications can generate a lot of waste in the process, but as I mentioned, in many cases you can’t avoid creating heavy hand-offs and detailed deliverables.

Designing for Different Screen Sizes & Devices – Part 2

Some other important aspects of designing for different viewport sizes & devices, e.g. responsive images, art direction, pixel density and so on.

Now that you understand the main design considerations for different screen sizes and devices, let’s take a look at some other important aspects. (You can find Part 1 here.)

Summary sketch of Designing for Different Screen Sizes & Devices Part 2

Responsivity & Images

First let’s talk about images.

One safe solution for including images in your layouts is using a fixed aspect ratio. For example, for many years Instagram only allowed users to upload square images with a 1:1 aspect ratio (and the photo grid screen still contains square images; it crops anything that is not square).

Now, what happens if you are not working with a 1:1 aspect ratio? In case of a fluid container, the aspect ratio might change as the viewport size changes. And if you have a fixed height and a dynamically changing, fluid width, you might end up with a distorted image.

In the following example, both images have a fixed and identical height.

Upon changing the viewport size, the image above gets distorted. In contrast, the image below doesn’t get distorted: the biggest possible portion of the image is displayed without any distortion.
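
Here is a rough sketch of the arithmetic behind the two behaviors, assuming a hypothetical 1600×900 source image placed in a slot with a fixed 300 px height and a fluid width: stretching changes the aspect ratio, while a “cover”-style crop keeps it and shows the biggest possible undistorted portion.

```python
def stretched_aspect(slot_w, slot_h):
    """Aspect ratio the viewer sees if the image is simply stretched to fill the slot."""
    return slot_w / slot_h

def cover_crop(src_w, src_h, slot_w, slot_h):
    """Size (w, h) of the source region shown when the image is scaled to
    cover the slot and the overflow is cropped, so nothing is distorted."""
    scale = max(slot_w / src_w, slot_h / src_h)  # scale so both dimensions cover the slot
    return slot_w / scale, slot_h / scale        # visible region, in source pixels

src_w, src_h = 1600, 900          # hypothetical source image (16:9)
for slot_w in (320, 640, 960):    # fluid width, fixed 300 px slot height
    print(slot_w,
          round(stretched_aspect(slot_w, 300), 2),  # distorted ratio if stretched
          cover_crop(src_w, src_h, slot_w, 300))    # undistorted visible region
```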

The key takeaway here is that you, as a designer, should specify what should happen to the images when the viewport size changes, and how these elements are displayed on different viewport sizes.

And how can we specify what part of the image will be displayed and what part will be cropped out?

Art direction

Usually this is handled by dynamic cropping – but what happens if the most important part of the image is off-center? For example, if the rule of thirds was applied, the subject of the image might sit at an intersection point nearer to the left or right side of the picture, and dynamic cropping might crop out the main subject. A possible solution is to identify focal points, and there are tools that include, for instance, face detection. So you can either address this issue by manually setting focal points, or use a solution for automatic art direction (this is the specific term for this activity).
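
As a rough illustration of the focal-point idea (a generic sketch, not any specific tool’s API), the crop can be centered on the focal point and then clamped to the image bounds:

```python
def focal_crop(src_w, src_h, target_ratio, focal_x, focal_y):
    """Return (left, top, width, height) of a crop with the given width/height
    ratio, centered on the focal point as far as the image bounds allow."""
    # Largest crop with the target ratio that still fits inside the image
    if src_w / src_h > target_ratio:
        crop_h, crop_w = src_h, src_h * target_ratio
    else:
        crop_w, crop_h = src_w, src_w / target_ratio
    # Center the crop on the focal point, then clamp it to the image edges
    left = min(max(focal_x - crop_w / 2, 0), src_w - crop_w)
    top = min(max(focal_y - crop_h / 2, 0), src_h - crop_h)
    return left, top, crop_w, crop_h

# Example: a 3000x2000 landscape photo with the subject near the right third,
# cropped to 3:4 portrait for a small viewport
print(focal_crop(3000, 2000, 3 / 4, focal_x=2000, focal_y=900))
```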

By the way, it is not only about keeping the important part. Let’s say you have a landscape photo with a group of people in the middle (e.g. a family or a group of friends). On larger viewport sizes it might be easy to tell who is in the picture, since you can see their faces very well. However, on smaller viewport sizes it might become hard to recognize anyone in such a picture (at least without zooming in).

So another solution is to define different pictures for the different viewport sizes (e.g. in Bootstrap’s system, one for small, one for medium and so on), where these images show different parts of the original picture. For instance, you can effectively zoom in (crop out the unnecessary parts) and show only the faces on a smaller viewport size. That way, you can assign a suitably sized image to each size: the smaller picture will be good enough for a smaller screen and will load faster, which also helps performance, so it’s great from a UX perspective. At the same time, on a large screen, a big enough picture will be displayed.

Let’s take a look at an art direction example. As you’ll see, on the landscape version of the photo the main subject is off-center:

On smaller viewport sizes, the image is displayed in portrait orientation – the person in the image is the main subject, so he is the relevant part to keep:

So up to this point I have talked about:

  • Automatically and manually defined focal points
  • And using different parts, zoom-levels or orientations of the same image.

There is one more option: you can attach completely different images to the different viewport sizes (so not only different parts of the same image). For instance, on a national park website you can include a detailed drone photo of the park for bigger viewport sizes, and a picture of a flower for smaller sizes.

Screen resolution and pixel density

An aspect I’d like to mention in connection with pixels is that there are high pixel density screens.

A screen resolution refers to the number of pixels: how many pixels are displayed horizontally and vertically. Now, pixel density [Pixels per inch (PPI)] tells you how many pixels are displayed within a given area of the screen.

For example, this is one pixel on the left, and there are 4 pixels on the right, and the size of the area is the same:

So what this means from a UX perspective is that on a cheaper, low-end device it’s not a good idea to include a high resolution image, since it can’t be displayed in its original form; it must be downscaled first (and the bigger file size might also cause performance problems).

At the same time you should keep in mind that you need high resolution images to provide sharp, good-looking pictures on high-end devices. So this is another aspect you need to consider.
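
For reference, pixel density can be computed from the resolution and the physical screen diagonal. A quick sketch (the two device sizes below are illustrative examples, not specific products):

```python
import math

def ppi(width_px, height_px, diagonal_inches):
    """Pixels per inch: the diagonal resolution divided by the physical diagonal."""
    return math.hypot(width_px, height_px) / diagonal_inches

print(round(ppi(1366, 768, 14.0)))  # ~112 PPI: a typical low-density laptop panel
print(round(ppi(2436, 1125, 5.8)))  # ~463 PPI: a high-density phone screen
# Doubling the pixel density packs 4x as many pixels into the same physical area.
```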

Raster and vector images

Up to this point I mainly talked about raster (bitmap) images. You surely know the difference between raster and vector images; to put it really simply, vector images are based on mathematical calculations, so they are infinitely scalable. On the other hand, raster images are made of pixels and are not infinitely scalable. Also, a vector file usually has a smaller file size, so it’s better for performance. A commonly used format is SVG, which stands for Scalable Vector Graphics. So for a logo, icons and illustrations you should use SVGs; these are great for all screen resolutions and pixel densities.

And of course you can define different SVG files for different screen sizes, e.g. a detailed infographic is not always the best solution for a smaller viewport size.

Textual content and videos

And what about textual content? It is a good practice to specify a maximum length for each textual element. Other aspects include typographic choices, like fonts, font sizes and spacing. And in many cases the textual content itself is also different for different viewport sizes: for example, you might want to display only some short paragraphs on a mobile device, while on larger viewport sizes you might want to include longer stories.
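
One low-tech way to make such limits explicit in the hand-off is to attach them to each piece of copy and check them automatically. A hypothetical sketch (the element names and limits below are made up):

```python
# Hypothetical copy spec: each text element with its agreed maximum length.
copy_spec = {
    "card_title":    {"text": "Our consulting services", "max_chars": 40},
    "error_timeout": {"text": "The request timed out. Please try again.", "max_chars": 80},
}

for name, item in copy_spec.items():
    if len(item["text"]) > item["max_chars"]:
        print(f'{name}: copy is too long ({len(item["text"])} > {item["max_chars"]} chars)')
```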

You should also discuss with the developers how to add videos – e.g. a background video. Another aspect is that you should specify which parts of a layout should be printable (for instance, you don’t want to print the menu bar or the ads displayed next to the actual content).

Skeletons, lazy loading and infinite scrolling

Finally, I’d like to mention three additional techniques.

Applying skeleton content or skeleton screens means that a low-fidelity mock-up or placeholder is displayed before the actual content arrives. It makes the perceived loading time shorter, so it increases the perceived performance.

The second technique is so-called lazy loading. In this case the content is loaded gradually, and its loading is initiated by scrolling down. So the content outside of the viewport is only loaded when the user scrolls to it.

Infinite scrolling is a similar technique. It’s used, for example, by Facebook, Twitter and Instagram. There is no pagination; the content is continuously loaded as the user scrolls down.
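
The load-on-demand idea behind lazy loading and infinite scrolling is language-agnostic. As a minimal conceptual sketch (not a real front-end implementation), a generator only produces the next page of items when something – such as a scroll event – asks for it:

```python
def item_pages(items, page_size=10):
    """Yield one page of items at a time; later pages are not prepared in advance."""
    for start in range(0, len(items), page_size):
        yield items[start:start + page_size]

feed = item_pages(list(range(100)), page_size=10)
print(next(feed))  # first page: loaded immediately (what is in the viewport)
print(next(feed))  # next page: loaded only when "scrolling" requests it
```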

When you create layouts for websites or applications, keep these techniques in mind and consider using them. Of course, there are many more ways and techniques for performance optimization; discuss the possibilities with the developers.

Besides the aspects already mentioned, here are some more things to keep in mind.

Some more aspects

Different devices or platforms might involve…

  • Using different design guidelines (e.g. Material Design, Human Interface Guidelines)
  • Designing for different ways of user input, e.g. on handheld devices you’ll design with the different gestures in mind; while on desktop, users have a cursor (e.g. you can use hover states)
  • Using different UI components, UI design patterns, animations

The most important thing is that you should discuss the available options with the developers. For instance, you should discuss which libraries, reusable components and existing solutions should be used. Creating custom solutions, for example custom UI libraries, requires more development effort. Of course, most existing libraries let you customize things to a certain extent; talk these options through with the developers.

Let me know if you have any more tips & tricks for designing for different viewport sizes & devices!

Designing for Different Screen Sizes & Devices – Part 1

The basics of designing for different screen sizes and devices. Fixed layout, fluidity, adaptivity, responsivity – this article explains them all!

Several years ago “responsive design” was a buzzword; nowadays it is a must, a norm. A huge percentage of traffic comes from mobile devices, and Google also takes mobile-friendliness into account when ranking your website. The term refers to responding to the user’s context and behavior by taking into account the different viewport sizes and devices. In this article, I explain what responsivity means, and I also show you some important design considerations.

Summary sketch of Designing for Different Screen Sizes & Devices Part 1

The basics

So let’s start with some terms:

A fixed layout does not change based on the screen size or viewport size. Except for specific cases (e.g. a special medical device or a kiosk that requires designing for one exact screen size, or designing for an Apple Watch, which currently comes in only a small number of screen sizes), it’s better to design a layout that dynamically changes based on the screen size.

A layout is fluid when the size of the elements inside the layout is defined in percentages, so it changes proportionally based on the viewport size. As you can imagine, using only this as a solution results in suboptimal experiences either on small or on large viewport sizes.

Fluid content on a small viewport (320px wide)
Fluid content on a larger viewport (1366px wide)

The adaptive approach means that you design a series of fixed layouts, and the one nearest in size to the given viewport is displayed. This is good for performance, and you can design for each and every size intentionally.

Responsivity

Responsivity means that your digital product or service is displayed in a way that it responds to the properties (e.g. viewport size, orientation) of the device it’s being viewed on.

It’s a combination of the fluid and adaptive approaches: you predefine breakpoints, which let you design different layouts for the sizes above and below each breakpoint – this is the adaptive nature – and at the same time the layout responds by stretching or shrinking – this is the fluidity.

As an example, the Bootstrap component library contains a responsive grid system, and defines 4 breakpoints for 5 different viewport size intervals.

Bootstrap 4 breakpoints

There is extra small for viewports narrower than 576 pixels, there is an extra large category for viewports 1200 pixels wide or more, and there are 3 more sizes in between.
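
In code, classifying a viewport against these breakpoints is just a set of threshold checks. Here is a small sketch using the Bootstrap 4 values described above (for illustration only; real grid systems handle this with CSS media queries rather than a function like this):

```python
BOOTSTRAP4_MIN_WIDTH = [  # lower bound of each size category, in px
    ("xs", 0), ("sm", 576), ("md", 768), ("lg", 992), ("xl", 1200),
]

def size_category(viewport_width):
    """Return the Bootstrap 4 size label for a viewport width given in px."""
    label = "xs"
    for name, min_width in BOOTSTRAP4_MIN_WIDTH:
        if viewport_width >= min_width:
            label = name  # keep upgrading while the lower bound is reached
    return label

print(size_category(575), size_category(576), size_category(1200))  # xs sm xl
```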

You might notice that I used two terms for the sizes: viewport size and screen size. The difference is the following: the viewport size might be smaller than the screen size, since for instance you can change the size of a window. So the viewport is the visible area of a website or an application from the user’s perspective.

When you split your screen into two windows, for instance, the website or application is displayed on half of the screen. Others might use these terms differently; the important thing is that you need to think about cases when the user might want to resize the window, so an application is not always displayed at the full screen size. On smaller mobile devices the screen size and the viewport size are usually the same; however, on larger tablets there might be split-screen use cases.

There are basically two types of containers: fixed width and fluid width containers.

A container contains elements of a website or an application (in our following examples, the cards). It defines the size of the horizontal area where these elements will be displayed. There might be multiple containers, and you can combine the container types as well, so for example you can place images inside fluid width containers, and text inside a fixed width container.

Let’s take a look at an example based on Bootstrap’s breakpoints.

Example: 576 px viewport size
Example: fixed and fluid width containers

Let’s say we have a 576 px wide viewport. That is a small size in this system. The second row of the table contains the width of the fixed width container: it is 540 px. What exactly does this mean? It means that from 576 px to 767 px viewport width (so for a small viewport size) the width of the container is always 540 px. So for instance if you have a 3-column or card-based layout containing 3 cards in a row, the width of one column or card is always 180 px. The only thing that can change is the spacing on the left and right side of the container! Here is a demonstration for you.

Currently the viewport size is 800 pixels:

Now I resize this window; its new width is around 900 px. In Bootstrap’s system both of these are medium-size, so I haven’t reached any breakpoint yet. As you can see, I have 2 cards in a row, and the width of these cards hasn’t changed, since I apply a fixed width container:

Now let’s go all the way up to around 1000 px. There was a breakpoint at 992 px width, so a 1000 px width means that now we’re looking at a large viewport size. As you can see, the layout has changed due to passing the breakpoint; now there are 3 cards in a row:

If I increase the width to 1100 px, you can see that I use a fixed width container, since the width of the cards has remained the same (the only thing that has increased is the spacing on both sides of our container):

You might ask why there is no fixed width container for extra small viewport sizes. The reason is that this viewport size is so small that it makes more sense to use all the available width.

In case of a fluid container, the container uses the entire width of the viewport, which is why it’s also called a full-width container. In this case the width of the cards changes as I increase or decrease the viewport width, since their width is specified in percentages.

Here are examples showing this behavior. Below 768px (this is the extra small and small size in Bootstrap’s grid system) I have only one column:

On medium sized screens there are two columns:

On large screens I defined three columns:

And on the extra large screens there are six columns:

Since I use fluid containers in this example, the cards always take up the entire available viewport width. This is why the third row (fluid width container) of Bootstrap’s breakpoints table states that in case of a fluid width container, the width of the container is the same as the width of the viewport.
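
To make the fixed vs. fluid behavior concrete, here is a sketch that computes the approximate card width both ways. It uses Bootstrap 4’s documented fixed container widths (540/720/960/1140 px) and, like the examples above, ignores gutters; treat it as an illustration rather than an exact reproduction of Bootstrap’s CSS.

```python
FIXED_CONTAINER_WIDTH = {"sm": 540, "md": 720, "lg": 960, "xl": 1140}  # px; xs has no fixed container
MIN_WIDTH = {"sm": 576, "md": 768, "lg": 992, "xl": 1200}              # lower bound of each category

def card_width(viewport_width, cards_per_row, fluid=False):
    """Approximate card width in px (gutters ignored) for a given viewport width."""
    # Find the widest category whose lower bound the viewport reaches ("xs" if none).
    category = "xs"
    for name, min_w in MIN_WIDTH.items():
        if viewport_width >= min_w:
            category = name
    if fluid or category == "xs":
        container = viewport_width                   # fluid: container width equals viewport width
    else:
        container = FIXED_CONTAINER_WIDTH[category]  # fixed: constant within the whole category
    return container / cards_per_row

print(card_width(576, 3))        # 180.0  (540 px small container, 3 cards per row)
print(card_width(700, 3))        # 180.0  (still within the small range)
print(card_width(700, 3, True))  # ~233.3 (fluid container follows the viewport)
```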

Here is a good practice for designing a responsive layout: always check how the application or website looks on both sides of a breakpoint, so for instance test it at 575 and 576 px width (the extra small and the small sizes). Currently the smallest common mobile screen width is 320 px, so you should check that out, too. And it’s also worth taking a look at the design on a larger viewport whose width is more than 1920 px.

I’d like to mention that using Bootstrap’s 4-breakpoint grid system is only one solution; you can define your own system. And of course you could design a different layout for each possible viewport size, e.g. one for 600, one for 601, one for 602 pixel width and so on, but that wouldn’t be a rational thing to do. 🙂

When you design, for instance, a card-based layout, it’s also important to specify for the developer which elements should always be aligned horizontally. As you will see in the following example, at 1200 px width a long card title breaks the layout: the texts and buttons are no longer aligned horizontally. So here everything looks fine:

Large viewport size, 1199px wide

When I change the viewport size to 1200 px, the layout breaks:

Extra large viewport size, 1200px wide

After fixing this issue, I increase the viewport size to 1200 px again, and now it looks good:

Extra large viewport size, 1200px wide

And by the way, this is why it’s important to specify a maximum length for each textual element displayed in a layout. The other interesting aspect here is translations: what happens to the layout if you translate the textual content into a language that contains longer words than English? Finally, one more thing I would like to mention is that this is also why you should always design with real content. If possible, avoid using lorem ipsum.

You might ask what happens if you have an odd number of elements, e.g. a consultancy provides 5 different types of services. How can you display 5 items on a smaller viewport size when you can no longer display them all next to each other? It’s entirely possible that there will be rows that don’t contain the maximum number of elements, e.g. you’ll have 2 rows, one with 3 and one with 2 elements. It’s a design decision; the important thing is that your developer team members should know about it.

In the 2nd part of this article series, I explain how you can include images in your design and some more aspects of designing for different screen sizes and devices.

Written & illustrated by Krisztina Szerovay / www.sketchingforux.com

Presenting our new open-source project, AICells, at the Machine Learning Prague 2020 Conference!

Machine Learning Prague 2020 is a highly practical and insightful conference about ML, AI and Deep Learning applications that’ll take place in Prague at the end of March (March 20-22, 2020).

We at AI Energizer are media partners of the Machine Learning Prague Conference, and thanks to this cooperation they have kindly offered a free ticket to our followers! If you want to win a free ticket, follow the steps at the end of this LinkedIn article before March 4.

AICells, the MS Excel tool that helps you use Python

We have been working on several Python-based data science projects for both SMEs and enterprise-level businesses. In most cases, we deliver our machine learning and statistical tools as Jupyter Notebooks and Python code. What we have found is that delivering our results in a meaningful way is challenging due to the lack of the necessary skills inside the companies: the in-house team is often not capable of using the delivered solution to its full potential.

We have been developing an open-source tool, called AICells, that eliminates the steep learning curve imposed by the complexity of Jupyter Notebooks and Python code. This tool aims to make Python-based machine learning tools operable through Excel, so someone capable of using MS Excel’s basic functionality will be able to fully exploit the benefits of machine learning-based tools!

Two experts of the AI Energizer team, Gergely Szerovay and Laszlo Siller, will present the AICells poster at the Machine Learning Prague 2020 Conference.

4 talks we are really excited about

45 world-class experts will present solutions that you can learn about through practical talks and hands-on workshops. You can get an overview of the complete program on the Machine Learning Prague 2020 conference website.

Here are some of the speakers & talks that we at AI Energizer are really excited about:

François Chollet (Software Engineer, Google): He is the author of Keras, one of the leading Python frameworks for neural networks. He will give a presentation about how we could define and measure intelligence.

Filip Dousek (Senior Director, Augmented Analytics at Workday): In his talk “Building Augmented Analytics for 50% of Fortune 100” he will talk about the concepts behind their ML-driven augmented analytics solutions and why these are called the next generation of BI & analytics.

Ashish Kapoor (Sr. Principal Researcher, Microsoft): In his talk “Building Safety Mechanisms in Autonomous Systems”, he will explore a framework that aims to preserve safety invariants despite the uncertainties in the environment arising due to incomplete information. Ashish will describe various methods to reason about safety plans and control strategies despite perceiving the world through noisy sensors and ML systems.

Tomas Mikolov (Research Scientist at Facebook AI Research): He is known as one of the inventors of the famous word2vec word embedding method. In his talk, he will describe a project where they attempted to define an ML system, based on cellular automata, which can evolve for an indefinitely long time, possibly reach arbitrary complexity, and use no supervision.

How to win a free ticket

Thanks to being a media partner, we can give away one FREE conference ticket, and we also provide a 20% discount voucher created specially for us: purchase a conference ticket with this coupon code: aienergizer20 (it must be entered in lowercase).

To participate in the contest for the free ticket, follow the steps at the end of this LinkedIn article before March 4.

Good luck and see you soon in Prague!

Conference website: Machine Learning Prague 2020

The best of Python: a collection of my favorite articles from 2017 and 2018 (so far)


My intention with publishing this collection

Last year I only used Medium for consuming content, and I checked out a ton of Python-related articles. Recently I’ve started to use the community features of the platform, for instance following fellow developers. I also developed the practice of clapping and highlighting the most interesting parts of their articles. My goal is to be an active member of the developer community gathered on Medium.

I also realized that I would like to give back to the community after reading so many great resources. This was one of my main motivations for writing my first article “Why you need Python environments and how to manage them with Conda”.

In this article, I’d like to share with you the articles I found most interesting and insightful (inspiring) last year and this year (so far). My other goal was to create a comprehensive list of the most valuable pieces for my Python students.

How to navigate this article

I bookmarked so many great resources that it was not easy to select the best ones. So I divided the article into 10 categories — this, by the way, resonates well with the versatile and multipurpose nature of Python.

The categories are:

1. General Python programming

2. Python performance optimization

3. Python development environments and DevOps

4. Machine learning

5. Image and video processing

6. Chatbots and Natural language processing (NLP)

7. Blockchain

8. Web and backend development

9. Web scraping

10. Data visualization

Just one more thing before you dive in: how should you use this article? It doesn’t need to be read all at once. Bookmark it, and use it as a starting point or a reference point. With the category list above, you can navigate to the sections that you are interested in most.

And please let me know in a comment if you feel that I left out an awesome resource so I can update my collection. Thanks in advance!

1. General Python programming

1.1 Learning Python: From Zero to Hero, by TK

A comprehensive introduction to Python, this is a must-read if you are new to this world. It explains the basics: variables, control flow, looping and iteration, collections, arrays, structures and dictionaries. It covers the foundations of object oriented programming, too. So if you’ve just started your Python developer journey, this is a great starting point.

Learning Python: From Zero to Hero

1.2 Understanding the underscore( _ ) of Python, by mingrammer

Did you know that the underscore (_) in Python has special meanings? It has five different use cases. Check them out in this article!

Understanding the underscore( _ ) of Python

1.3 A brief tour of Python 3.7 data classes, by Anthony Shaw

Data classes are a brand new feature of Python 3.7. They reduce the boilerplate when creating a class with typed data fields. The article provides an easy-to-follow explanation and several examples of this feature.

A brief tour of Python 3.7 data classes

1.4 How to Use Static Type Checking in Python 3.6, by Adam Geitgey

As of Python 3.6, there is a syntax for declaring types. However, you need an external tool like mypy or PyCharm to enforce type checking. This article is a good starting point to learn how to implement static types in your code.

How to Use Static Type Checking in Python 3.6

1.5 How — and why — you should use Python Generators, by Radu Raicea

This tutorial showcases examples of an iterator class and different types of generator functions.

“Generator functions allow you to declare a function that behaves like an iterator. They allow programmers to make an iterator in a fast, easy, and clean way.”
“An iterator is an object that can be iterated (looped) upon. It is used to abstract a container of data to make it behave like an iterable object. You probably already use a few iterable objects every day: strings, lists, and dictionaries to name a few.”

How — and why — you should use Python Generators

1.6 Intro to Threads and Processes in Python, by Brendan Fortuner

This and the following article (1.7) are about threading and parallel processing in Python. This piece is an introduction to Python’s parallel processing features with processes and threads, and the second article covers more advanced stuff.

Intro to Threads and Processes in Python

1.7 Let’s Synchronize Threads in Python, by Saurabh Chaturvedi

A great overview of multi threading and its most challenging aspect: thread synchronization and communication.

Let’s Synchronize Threads in Python

1.8 How to write a production-level code in Data Science? by Venkatesh Pappakrishnan, Ph.D.

This article contains suggestions on creating production ready code for data science purposes. It helps you organize and optimize your code, covers the topics of logging, instrumentation and testing, describes the basics of version controlling, and gives directives on code readability. Great advice and best practices!

How to write a production-level code in Data Science?

1.9 How to rewrite your SQL queries in Pandas, and more, by Irina Truong

If you are new to Pandas and DataFrames, and you have a good understanding of SQL, I highly recommend reading this article. It contains a valuable phrasebook with examples. It helps you to translate your SQL query ideas to Pandas’s syntax and to learn this new syntax.

How to rewrite your SQL queries in Pandas, and more

2. Python performance optimization

2.1 Yes, Python is Slow, and I Don’t Care, by Nick Humrich

Python developers achieve high productivity, but we’ve all heard the myth before: Python is slow. I find this article important since it explains Python’s performance optimization features.

“Run time is no longer your most expensive resource. A company’s most expensive resource is now its employee’s time.”

Yes, Python is Slow, and I Don’t Care

2.2 A Beginner’s Guide to Optimizing Pandas Code for Speed, by Sofia Heisler

In case of larger volumes of data processed by Pandas, you should use carefully-chosen coding solutions in order to improve the performance.

This and the following article review several methodologies for applying a function to a Pandas DataFrame and compare their running speeds.

A Beginner’s Guide to Optimizing Pandas Code for Speed

2.3 Data Pre-Processing in Python: How I learned to love parallelized applies with Dask and Numba, by Ernest Kim

Some more methodologies for applying a function to a Pandas DataFrame. These methods use parallelization to gain more speed.

Data Pre-Processing in Python: How I learned to love parallelized applies with Dask and Numba

2.4 Memory efficiency of parallel IO operations in Python, by Jakub Wolf

This article compares the memory efficiency of three methods for parallelizing IO-bound operations. The newest method is the asyncio module, which has been part of the standard Python library since Python 3.5. It is a good choice if your goal is high performance with a low memory footprint.

Memory efficiency of parallel IO operations in Python

2.5 Regex was taking 5 days to run. So I built a tool that did it in 15 minutes, by Vikash Singh

This piece explains how to search and replace keywords in high volumes of data using the Aho-Corasick algorithm and the Trie Data Structure approach. I was amazed by this clever optimization.

Regex was taking 5 days to run. So I built a tool that did it in 15 minutes.

2.6 Dismissing Python Garbage Collection at Instagram, by Instagram Engineering

This article shows an advanced optimization technique for multi-process Python applications.

Dismissing Python Garbage Collection at Instagram

2.7 A million requests per second with Python, by Paweł Piotr Przeradowski

Introductory article to an amazingly fast, new framework for micro-services: Japronto.

A million requests per second with Python

3. Python development environments and DevOps

3.1 Install PyCharm and Anaconda (Windows /Mac/Ubuntu), by Michael Galarnyk

This tutorial is a great starting point for beginners. It summarizes the installation process and also contains a ten-minute-long video, that walks you through each step.

Install PyCharm and Anaconda (Windows /Mac/Ubuntu)

3.2 Docker Tutorial — Getting Started with Python, Redis, and Nginx, by Roman Gaponov

This article explains how you can benefit from Docker in your software development process. Great piece for beginners!

Docker is an open source tool that automates the deployment of applications inside software containers.

Docker Tutorial — Getting Started with Python, Redis, and Nginx.

3.3 How to write Dockerfiles for Python Web Apps, by Praveen Durairaj

An excellent guide with sample Dockerfiles — check it out if you are building Python web apps using Docker!

How to Write Dockerfiles for Python Web Apps

3.4 Get Started with PySpark and Jupyter Notebook in 3 Minutes, by Charles Bochet

Apache Spark is a big data processing engine that can be used from Python with the PySpark library. This library is great for creating prototypes in the big data and machine learning field. This guide kickstarts you on the path of using Spark from Python.

Get Started with PySpark and Jupyter Notebook in 3 Minutes

3.5 JupyterLab first impressions, by Brian Ray

Jupyter is such an essential tool for Python programmers. Here is a nice article about its newest version.

JupyterLab first impressions

4. Machine learning

4.1 The Hitchhiker’s Guide to Machine Learning in Python, by Conor Dewey

This article showcases sample codes and explanatory videos for eight machine learning algorithms: Linear Regression, Logistic Regression, Decision Trees, Support Vector Machines, K-Nearest Neighbors, Random Forests, K-Means Clustering, and Principal Components Analysis. One of my favorite articles, a must-read for beginners.

The Hitchhiker’s Guide to Machine Learning in Python

4.2 Learning AI if You Suck at Math series, by Daniel Jeffries

An excellent series of 7 articles (thanks Daniel Jeffries for your effort!). The title is quite self-evident: not being great at math does not mean that you can’t understand how artificial intelligence works! Give it a go — it’s really worth it!

You suck at math.

That’s all right! I share your dirty little secret and I have some books and websites that will really help you get rolling fast.

Learning AI if You Suck at Math — Part 1
This article guides you through the essential books to read if you were never a math fan but you’re learning it as an adult.

Learning AI if You Suck at Math — Part 2 — Practical Projects
This article guides you through getting started with your first projects.

Learning AI if You Suck at Math — Part 3 — Building an AI Dream Machine or Budget Friendly Special
This article guides you through getting a powerful deep learning machine setup and installed with all the latest and greatest frameworks.

Learning AI if You Suck at Math — Part 4 — Tensors Illustrated (with Cats!)
This one answers the ancient mystery: What the hell is a tensor?

Learning AI if You Suck at Math — Part 5— Deep Learning and Convolutional Neural Nets in Plain…
Here we create our first Python program and explore the inner workings of neural networks!

Learning AI If You Suck at Math — Part 6 — Math Notation Made Easy!
Still struggling to understand those funny little symbols? Let’s change that now!

Learning AI if You Suck at Math — Part 7 — The Magic of Natural Language Processing
Understand how Google and Siri understand what you’re mumbling.

4.3 Machine Learning Zero-to-Hero: Everything you need in order to compete on Kaggle for the first time, step-by-step! by Oren Dar

Participating in Kaggle competitions is a great way to improve your machine learning skills. This guide shows you how to solve one of Kaggle’s machine learning problems and submit your results to the competition.

Machine Learning Zero-to-Hero: Everything you need in order to compete on Kaggle for the first…

4.4 Higher-Level APIs in TensorFlow, by Peter Roelants

TensorFlow 1.3 introduces three new higher-level frameworks: Estimator, Experiment, and Dataset. This post explains them and shows an example usage.

Higher-Level APIs in TensorFlow

4.5 Deploy TensorFlow models, by Francesco Zuppichini

This tutorial shows you how to make your TensorFlow models accessible from the web with Flask and TensorFlow serving.

Okay, you have a model and you want to make it accessible from the web. There are several ways you can do that, but the fastest and most robust is TensorFlow Serving.

Deploy TensorFlow models

4.6 Simple and Multiple Linear Regression in Python, by Adi Bronshtein

Linear regression is an approach for modelling the linear relationship between a dependent variable and one or more independent variables. It is one of the most widely used algorithms of machine learning.

This article explains the mathematical basics of linear regression, then provides examples using Python’s Statsmodels and Scikit-Learn libraries. If you consider yourself a beginner, I highly recommend reading this article.

Simple and Multiple Linear Regression in Python

4.7 Reducing Dimensionality from Dimensionality Reduction Techniques, by Elior Cohen

In this post, Elior demystifies three dimensionality reduction techniques: PCA, t-SNE, and Auto Encoders.

“The need to reduce dimensionality is often associated with visualizations (reducing to 2–3 dimensions so we can plot it) but that is not always the case. Sometimes we might value performance over precision so we could reduce 1,000 dimensional data to 10 dimensions so we can manipulate it faster (eg. calculate distances).”

Reducing Dimensionality from Dimensionality Reduction Techniques

4.8 Random Forest in Python, by William Koehrsen

This step by step guide shows how to solve a machine learning problem with random forests. It is very detailed, explains how to prepare and clean the data, then creates and improves a model, finally visualizes the results. Awesome stuff.

Random Forest in Python

4.9 Building A Logistic Regression in Python, Step by Step, by Susan Li

There are many concepts in the field of machine learning to master. One of these is logistic regression. This article provides a great starting point, and I’ve recommended it several times to my students.

“Logistic Regression is a Machine Learning classification algorithm that is used to predict the probability of a categorical dependent variable. In logistic regression, the dependent variable is a binary variable that contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.).”

Building A Logistic Regression in Python, Step by Step

4.10 Understanding Feature Engineering, by Dipanjan Sarkar

This series introduces you to feature engineering, one of the most important tasks of a data scientist. What is the essence of feature engineering? Here is a great quote:

“Coming up with features is difficult, time-consuming, requires expert knowledge. ‘Applied machine learning’ is basically feature engineering.”

— Prof. Andrew Ng.

Understanding Feature Engineering (Part 1) — Continuous Numeric Data
Strategies for working with continuous, numerical data

Understanding Feature Engineering (Part 2) — Categorical Data
Strategies for working with discrete, categorical data

Understanding Feature Engineering (Part 3) — Traditional Methods for Text Data
Traditional strategies for taming unstructured, textual data

Understanding Feature Engineering (Part 4) — Deep Learning Methods for Text Data
Newer, advanced strategies for taming unstructured, textual data

4.11 Ten Machine Learning Algorithms You Should Know to Become a Data Scientist, by Shashank Gupta

This article showcases ten useful machine learning algorithms, explains the basic concepts of them, and suggests Python libraries and introductory tutorials to start with.

Ten Machine Learning Algorithms You Should Know to Become a Data Scientist

4.12 Open Machine Learning Course. Topic 1. Exploratory Data Analysis with Pandas, by Yury Kashnitskiy

I believe in knowledge sharing, and The Machine Learning Course by OpenDataScience is an awesome resource. If you are interested in the series, you can find the topics at the beginning of this article.

Open Machine Learning Course. Topic 1. Exploratory Data Analysis with Pandas

4.13 Time Series Analysis in Python: An Introduction, by William Koehrsen

This article is a good introduction to time series analysis. It contains an example evaluation of the Tesla and GM stock prices, and builds a forecast model with the Facebook Prophet package.

Time Series Analysis in Python: An Introduction

5. Image and video processing

5.1 How I implemented iPhone X’s FaceID using Deep Learning in Python, by Norman Di Palo

The algorithm behind Apple’s FaceID is proprietary. This article analyzes how this feature might work and shows a proof-of-concept implementation of FaceID using siamese convolutional networks.

How I implemented iPhone X’s FaceID using Deep Learning in Python.

5.2 Tracking the Millennium Falcon with TensorFlow, by Nick Bourdakos

By following this tutorial, you’ll be able to learn how to build your own TensorFlow-based custom object detector, building on the COCO dataset.

Tracking the Millennium Falcon with TensorFlow

5.3 Using Deep Learning to improve FIFA 18 graphics, by Chintan Trivedi

The concept of face-swapping done by deep neural networks has been developed recently. One of the most well-known algorithms is Deepfakes, and its results have even been featured in the mainstream media.

This article shows how it could be used in the game industry, and explains the basics of the algorithm. I believe that the key takeaway is that deep learning can be applied in almost any industry / in any field.

Using Deep Learning to improve FIFA 18 graphics

5.4 Deep Learning with Keras on Google Compute Engine, by Cole Murray

I’m fascinated by the idea of image recognition. This article takes the learning-by-doing approach, since it explains setting up

  • a Flask-based web application connected to a Keras-based image recognition model, and
  • an image store using Google Cloud Storage

in a step-by-step manner.

Deep Learning with Keras on Google Compute Engine

5.5 How to use transfer learning and fine-tuning in Keras and Tensorflow to build an image recognition system and classify (almost) any object, by Greg Chu

By using a pre-trained network built for a similar task, you can make your convolutional neural network’s training speed faster.

“It’s well known that convolutional networks require significant amounts of data and resources to train. For example, the ImageNet ILSVRC model was trained on 1.2 million images over the period of 2–3 weeks across multiple GPUs.”

How to use transfer learning and fine-tuning in Keras and Tensorflow to build an image recognition…

6. Chatbots and Natural language processing (NLP)

6.1 Machine Learning, NLP: Text Classification using scikit-learn, python and NLTK, by Javed Shaikh

Text classification is one of the basic concepts of natural language processing. The tutorial follows these steps:

1. Prerequisite and setting up the environment.
2. Loading the data set in jupyter.
3. Extracting features from text files.
4. Running ML algorithms.
5. Grid Search for parameter tuning.
6. Useful tips and a touch of NLTK.

Machine Learning, NLP: Text Classification using scikit-learn, python and NLTK.

6.2 Text Classification using Neural Networks, by gk_

This article helps you understand how text classification works and demonstrates it using a two layer neural network.

Text Classification using Neural Networks

6.3 Contextual Chatbots with Tensorflow, by gk_

This post shows how to transform conversational intent definitions to a TensorFlow model and how to build a chatbot framework around it.

Contextual Chatbots with Tensorflow

6.4 How to Create and Deploy a Telegram Bot? by Roman Gaponov

By following this tutorial, you’ll be able to build a simple Telegram-based chatbot and deploy it on Heroku.

How to Create and Deploy a Telegram Bot?

6.5 I built a serverless Telegram bot over the weekend. Here’s what I learned, by Moses Soh

Another Telegram-based chatbot example, it shows how to deploy the bot on AWS Lambda with the help of a tool called Zappa. Building something over a weekend sounds fun, doesn’t it? 🙂

I built a serverless Telegram bot over the weekend. Here’s what I learned.

7. Blockchain

7.1 Let’s Build the Tiniest Blockchain, by Gerald Nash ⚡️

This series and the following article (7.2) help you understand how a blockchain works by building one! I always prefer learning by doing, and that’s why I like these articles so much.

Let’s Build the Tiniest Blockchain

Second part of the article:

Let’s Make the Tiniest Blockchain Bigger

7.2 Learn Blockchains by Building One, by Daniel van Flymen

Learn Blockchains by Building One

8. Web and backend development

8.1 How to use Python and Flask to build a web app — an in-depth tutorial, by Abhinav Suri

A great tutorial for creating a full-stack Python application based on Flask, a Python microframework.

How to use Python and Flask to build a web app — an in-depth tutorial

8.2 Building Microservices with Python, by Sergio Sola

This piece is divided into 3 parts. The author is a Software Engineer building microservices for a personal project. Sidenote: having personal projects is a great way of developing your skills! This is what the tutorial is about:

1. Building the skeleton of the microservice:

Building Microservices with Python , Part I

2. Creating the microservice’s infrastructure in Docker:

Building Microservices with Python, Part 2

3. Last, but not least, building the microservice’s business logic:

Building Microservices with Python, Part 3

8.3 ElasticSearch with Django the easy way, by Adam Wattis

ElasticSearch is a great tool for implementing free text search. By following this tutorial, you’ll set up an ElasticSearch server, load data into it, and connect it with a Django-based application.

ElasticSearch with Django the easy way

9. Web scraping

9.1 How to scrape websites with Python and BeautifulSoup, by Justin Yek

BeautifulSoup is a helpful utility for extracting data from an HTML page. This article shows how it works.

How to scrape websites with Python and BeautifulSoup

9.2 Using Scrapy to Build your Own Dataset, by Michael Galarnyk

By utilizing Scrapy, you can download websites and extract data from the HTML pages using CSS selectors. It is a fully-fledged solution for web scraping. This article demonstrates how it works by showing an example of scraping data from a real website.

Using Scrapy to Build your Own Dataset

9.3 30-minute Python Web Scraper, by Angelos Chalaris

This post explains how to use Selenium webdriver with Geckodriver to open a browser window and control it from Python.

30-minute Python Web Scraper

9.4 How I used Python to find interesting people to follow on Medium, by Radu Raicea

This article provides a great example for using APIs from Python. I believe that the author addresses an important issue here: it is hard to select the most relevant and useful content from the overwhelming information tsunami — this is, by the way, one of the reasons why I created this article collection.

Medium has a large amount of content, a large number of users, and an almost overwhelming number of posts. When you try to find interesting users to interact with, you’re flooded with visual noise.

How I used Python to find interesting people to follow on Medium

10. Data visualization

10.1 – 5 Quick and Easy Data Visualizations in Python with Code, by George Seif

This article contains a great chart that helps you select the proper data visualization technique for a given situation. Then it takes a closer look at six data visualization types and shows examples of using Python’s Matplotlib for creating these assets: two kinds of scatter plots, line plots, histograms, bar plots and box plots.

5 Quick and Easy Data Visualizations in Python with Code

10.2 Data Visualization with Bokeh in Python, by William Koehrsen

The series answers the following question: “How can we add more interactivity to our visualizations?” This tutorial walks you through fully-interactive examples using Python, the Bokeh interactive visualization library and publicly available datasets.

Data Visualization with Bokeh in Python, Part I: Getting Started

Data Visualization with Bokeh in Python, Part II: Interactions

Data Visualization with Bokeh in Python, Part III: Making a Complete Dashboard

So that is my collection. I’m really grateful for these excellent resources, and many of them have already helped my Python students. I’ve also learned a lot of new tricks along the way.

Clap 👏 — please show if you like this guide, so others can find it more easily!
Respond 💬 — please let me know in the response section if you have any suggestions or questions!

Thanks for reading! 🙏

And thanks to my wife, Krisztina Szerovay, for creating the cover!

Why you need Python environments and how to manage them with Conda

How to manage different Python and package versions with Conda by setting up virtual environments

This is what a perfect Python environment looks like 😉

I have over two decades of professional experience as a developer, I know a wide variety of frameworks and programming languages, and one of my favorites is Python. I’ve been teaching it for quite some time now, and in my experience, setting up Python environments is a challenging topic.

Thus, my main motivation for writing this article was to help current and potential Python users to have a better understanding of how to manage such environments.

If you’ve opened this article, chances are that you already know what Python is, why it is a great tool, and you even have Python installed on your computer.

So why exactly do you need Python environments? You might ask: shouldn’t I just install the latest Python version?

Win a free ticket for the Machine Learning Prague Conference (March 20–22), the biggest European conference on machine learning, AI, and deep learning applications. To participate in the contest for the free ticket, follow the steps at the end of this LinkedIn article before March 4.

Why you need multiple Python environments

When you start learning Python, it is a good starting point to install the newest Python version with the latest versions of the packages you need or want to play around with. Then, most likely, you immerse yourself in this world and download Python applications from GitHub, Kaggle or other sources. These applications may need other versions of Python or of the packages than the ones you are currently using.

In this case, you need to set up different so-called environments.

Aside from this situation, there are more use cases when having additional environments might come in handy:

  • You have an application (developed by yourself or by someone else) that once worked beautifully. But now you’ve tried to run it, and it is not working. Perhaps one of the packages is no longer compatible with the other parts of your program (due to so-called breaking changes). A possible solution is to set up a new environment for your application that contains the Python version and the packages that are completely compatible with your application.
  • You are collaborating with someone else, and you want to make sure that your application is working on your team member’s computer, and vice versa, so you can also set up an environment for your co-worker’s application(s).
  • You are delivering an application to your client, and again, you want to make sure that it is working smoothly on your client’s computer.

An environment consists of a certain Python version and some packages. Consequently, if you want to develop or use applications with different Python or package version requirements, you need to set up different environments.
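To make this concrete, here is a minimal sketch (the environment names legacy-app and new-app are just examples, and the conda create command is covered in detail later in this article) of how two applications with conflicting requirements can live side by side:

conda create --name legacy-app python=2.7

conda create --name new-app python=3.6

Each environment gets its own Python and its own set of packages, so updating one never breaks the other.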

Now that we’ve discussed why environments are useful, let’s dive in and talk about some of the most important aspects of managing them.

Package and environment managers

The two most popular tools for setting up environments are:

  • PIP (a Python package manager; funnily enough, it stands for “Pip Installs Packages”) with virtualenv (a tool for creating isolated environments)
  • Conda (a package and environment manager)

In this article, I cover how to use Conda. I prefer it because:

  1. Clear Structure: It is easy to understand its directory structure
  2. Transparent File Management: It doesn’t install files outside its directory
  3. Flexibility: It contains a lot of packages (PIP packages are also installable into Conda environments)
  4. Multipurpose: It is not only for managing Python environments and packages – you can also use it for R (a programming language for statistical computing)

At the time of writing this article, I use the 4.3.x versions of Conda, but the new 4.4.x versions are also available.

In the case of Conda 4.4, there have been recent changes affecting Linux/Mac OS X users. They are described in this changelog entry.

How to choose an appropriate Conda download option

Installing your Conda system is a bit more complicated than downloading a nice picture from Unsplash or buying a new ebook. Why is that?

1. Installer

Currently, there are 3 different installers: the free Anaconda and Miniconda, and a commercial one.

Let’s take a closer look at the free tools, Anaconda and Miniconda. Now, what are the main differences between these two?

What do they have in common? They both set up on your computer

  • the Conda (the package & environment management system) and
  • the so-called “root environment” (more on that a bit later).

As for the main differences, Miniconda requires about 400MB disk space, and it contains only a few basic packages.

The Anaconda installer requires about 3GB disk space, and it installs over 150 scientific packages (for example, packages for statistics and machine learning). It also sets up the Anaconda Navigator, a GUI tool that helps you manage Conda environments and packages.

I prefer Miniconda, since I’ve never used most of the packages that are included in Anaconda by default. Another reason is that applying Miniconda allows for a smoother duplication of the environment (for example, if I want to use it on a different computer as well), since I only install the packages required by my app(s) on both computers.

From now on, I’m going to describe how Miniconda works (in the case of using Anaconda, the process is almost the same).

2–3. Platform (operating system and bit-count)

In addition to these 3 different installers, there are also subtypes based on bit-count: 32- and 64-bit installers. And of course these also have subtypes for the different operating systems: Windows, Linux, and Mac OS X (except that the Mac OS X version is 64-bit only).

In this article, I focus on the Windows version (the Linux and Mac OS X versions are only slightly different; for instance, the path of the installation folders and some command line commands differ).

So 32-bit or 64-bit?

If you have a 64-bit operating system (OS) with 4GB RAM or more, you should install the 64-bit version. Additionally, you might need a 64-bit installer if the packages you are planning to apply require the 64-bit versions of Python. For instance, if you want to use TensorFlow — more precisely, the official so-called binaries — you need a 64-bit OS and Python version.

If you have a 32-bit OS, or you are planning to utilize packages that only have 32-bit versions, the 32-bit version is the right option for you.
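If you are unsure whether your current Python installation is 32- or 64-bit, a quick way to check is this one-liner (it prints 64 on a 64-bit Python and 32 on a 32-bit one):

python -c "import struct; print(struct.calcsize('P') * 8)"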

4. Python version (for the root environment)

If these 3 dimensions aren’t enough (installers, 32/64-bit, and operating systems), there is a 4th one based on the different Python versions (included in the installer — and consequently, in the root environment)!

So let’s talk a bit about the different available Python versions.

Currently, your options are version 2.7 or version 3.x (at the time of writing this article, it’s 3.6) for the Python that is inside the root environment. For the additional environments, you can choose any version — ultimately, this is why you create environments in the first place: to easily switch between the different environments and versions.

So 2.7 or 3.x version Python for my root environment?

Let me help you decide really quickly:

Since 3.x is the newer one, it should be your default choice. (The 2.7 version is a legacy version: it was released in 2010, and there won’t be new major 2.7 releases, only fixes.)

However, if

  • you have mostly 2.7 code (you made or utilize applications using the 2.7 versions) or
  • you need to use packages that don’t have Python 3.x versions,

you should install a Python 2.7-based root environment.

You might ask: why don’t I just create two environments based on these two versions, 2.7 and 3.x? I’m glad you asked. The reason is that your root environment is the one that is created during the installation process, and it’s activated by default.

I’ll explain in one of the following sections how you can activate an environment, but basically it means that the root environment is the more easily accessible one, so carefully selecting your root environment will make your workflow more efficient.

Throughout the installation process, Miniconda will let you change some options set by default (for example you can check/uncheck some checkboxes). When you install Conda for the first time, I recommend that you leave these options intact (except for the path of the installation directory).

Choosing an appropriate installer for Conda

I’d like to mention one more thing here. While you can have multiple environments that contain different versions of Python at the same time on the same computer, you can’t set up 32- and 64-bit environments using the same Conda management system. It is possible to mix them somehow, but it is not that easy, so I’m going to devote a separate article to this topic.

Python environments: root and additional

So now you’ve picked an appropriate installer for yourself, well done! Now let’s take a look at the different types of environments and how they are created.

Miniconda sets up two things for you: Conda and the root environment.

The process looks like this: the installer installs Conda first, which is – as I already mentioned – the package and environment management tool. Then, Conda creates a root environment that contains two things:

  • a certain version of Python and
  • some basic packages.

Next to the root environment, you can create as many additional environments as you want. And the whole point is that these additional environments can contain different versions of Python and other packages. So it means that, for example, if your precious little application is not working anymore in the newest, state-of-the-art environment you’ve just set up, you can always go “back” and use other version(s) of some packages (including Python – Python itself is a package, more on that later).

As I already summarized at the beginning of the article, the main use cases of applying an additional environment are these:

  • You develop applications with different Python or package version requirements
  • You use applications with different Python or package version requirements
  • You collaborate with other developers
  • You create Python applications for clients
Root and additional environments

Before diving into the basics of environment management, let’s take a look at your Conda system’s directory structure.

Directory structure

As I mentioned above, the Conda system is installed into a single directory. In my example this directory is: D:\Miniconda3-64\. It contains the root environment and two important directories (the other directories are irrelevant for now):

  • \pkgs (it contains the cached packages in compressed and uncompressed formats)
  • \envs (it contains the environments — except for the root environment — in separate subdirectories)

The most significant executable files and directories inside a Conda environment (placed in the \envs\environmentname directory) are:

  • \python.exe — the Python executable for command line applications. So for instance, if you are in the directory of the Example App, you can execute it by: python.exe exampleapp.py
  • \pythonw.exe — the Python executable for GUI applications, or completely UI-less applications
  • \Scripts — executables that are parts of the installed packages. Upon activation of an environment, this directory is added to the system path, so the executables become available without their full path
  • \Scripts\activate.exe — activates the environment

And if you’ve installed Jupyter, this is also an important file:

  • \Scripts\jupyter-notebook.exe — Jupyter notebook launcher (part of the jupyter package). In short, Jupyter Notebook creates so-called notebook documents that contain executable parts (for example Python) and human-readable parts as well. It’d take another article to get into it in more detail (but see the quick example below).
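For a quick example (assuming the jupyter package is installed in an environment named mynewenv, and using the installation directory from this article), you can start the notebook launcher either by its full path, or simply by name once the environment is activated (activation is covered a bit later):

D:\Miniconda3-64\envs\mynewenv\Scripts\jupyter-notebook.exe

activate mynewenv
jupyter-notebook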

So now you should have at least one Python environment successfully installed on your computer. But how can you start utilizing it? Let’s take a closer look.

GUI vs. Command line (Terminal)

As I mentioned above, the Anaconda installer also installs a graphical user interface (GUI) tool called Anaconda Navigator. I also pointed out that I prefer using Miniconda, and that it does not install a GUI for you, so you need to use text-based interfaces (for example command line tools or the Terminal).

In this article, I focus on the command line tools (Windows). And while I concentrate on the Windows version, these examples can be applied to Linux and Mac OS X as well; only the path of the installation folders and some command line commands differ.

To open the command line, select “Anaconda 32-bit” or “Anaconda 64-bit” (depending on your installation) in the Windows Start menu, then choose “Anaconda Prompt”.

I recommend reading through the official Conda cheat sheet (pdf), as it contains the command differences between Windows and Mac OS X/Linux, too.

In the following sections, I’m going to give you some examples of the basic commands, indicating their results as well. Hopefully these will help you better manage your new environment.

Managing environments

Adding a new environment

To create a new environment named, for instance, mynewenv (you can name it whatever you like) that includes, let’s say, Python version 3.4, run:

conda create --name mynewenv python=3.4
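If you already know which packages you’ll need, you can list them in the same command and Conda will install them right away (numpy and pandas here are just example packages):

conda create --name mynewenv python=3.4 numpy pandas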

You can change an environment’s Python version by using the package management commands I describe in the next section.

Activating and leaving (deactivating) an environment

Inside a new Conda installation, the root environment is activated by default, so you can use it without activation.

In other cases, if you want to use an environment (for instance manage packages, or run Python scripts inside it) you need to first activate it.

Here is a step-by-step guide to the activation process:

First, open the command line (or the Terminal on Linux/Mac OS X). To activate the mynewenv environment, use the following commands depending on the operating system you have:

  • on Windows:
activate mynewenv
  • On Linux or Mac OS X:
source activate mynewenv

The command prompt changes upon the environment’s activation. It becomes, for example, (mynewenv) C:\> or (root) D:\>, so as a result of the activation, it now contains the active environment’s name.

The directories of the active environment’s executable files are added to the system path (this means that you can now access them more easily). You can leave an environment with this command:

deactivate

On Linux or Mac OS X, use this one:

source deactivate

According to the official Conda documentation, on Windows it is a good practice to deactivate an environment before activating another.

It needs to be mentioned that upon deactivating an environment, the root environment becomes active automatically.

To list out the available environments in a Conda installation, run:

conda env list 

Example result:

# conda environments:
#
mynewenv D:\Miniconda\envs\mynewenv
tensorflow-cpu D:\Miniconda\envs\tensorflow-cpu
root * D:\Miniconda

Thanks to this command, you can list out all your environments (the root and all the additional ones). The active environment is marked with an asterisk (at any given moment, there can be only one active environment).

How do you find out the version of your Conda?

It can be useful to check which version of Conda you are using, and also what the other parameters of your environment are. I’m going to show you below how to easily list out this information.

To get the Conda version of the currently active environment, run this command:

conda --version

Example result:

conda 4.3.33

To get a detailed list of information about the environment, for instance:

  • Conda version,
  • platform (operating system and bit count — 32- or 64-bit),
  • Python version,
  • environment directories,

run this command:

conda info

Example result:

Current conda install:
platform : win-64
conda version : 4.3.33
conda is private : False
conda-env version : 4.3.33
conda-build version : not installed
python version : 3.6.3.final.0
requests version : 2.18.4
root environment : D:\Miniconda (writable)
default environment : D:\Miniconda\envs\tensorflow-cpu
envs directories : D:\Miniconda\envs
C:\Users\sg\AppData\Local\conda\conda\envs
C:\Users\sg\.conda\envs
package cache : D:\Miniconda\pkgs
C:\Users\sg\AppData\Local\conda\conda\pkgs
channel URLs : https://repo.continuum.io/pkgs/main/win-64
https://repo.continuum.io/pkgs/main/noarch
https://repo.continuum.io/pkgs/free/win-64
https://repo.continuum.io/pkgs/free/noarch
https://repo.continuum.io/pkgs/r/win-64
https://repo.continuum.io/pkgs/r/noarch
https://repo.continuum.io/pkgs/pro/win-64
https://repo.continuum.io/pkgs/pro/noarch
config file : C:\Users\sg\.condarc
netrc file : None
offline mode : False
user-agent : conda/4.3.33 requests/2.18.4 CPython/3.6.3 Windows/10 Windows/10.0.15063
administrator : False

Now you know some basic commands for managing your environment. Let’s take a look at managing the packages inside the environment.

Managing packages

Depending on the installer you chose, you’re going to end up with a few basic packages (in the case of Miniconda) or a lot of packages (in the case of Anaconda) to start with. But what happens if you need

  • a new package or
  • another version of an already installed package?

Conda — your environment and package management tool — will come to the rescue. Let’s look at this in more detail.

Package channels

Channels are the locations of the repositories (on the illustration I call them storages) where Conda looks for packages. Upon Conda’s installation, Continuum’s (Conda’s developer) channels are set by default, so without any further modification, these are the locations where your Conda will start searching for packages.

Channels exist in a hierarchical order. The channel with the highest priority is the first one that Conda checks, looking for the package you asked for. You can change this order, and also add channels to it (and set their priority as well).

It is a good practice to add a channel to the channel list as the lowest priority item. That way, you can include “special” packages that are not part of the ones that are set by default (~Continuum’s channels). As a result, you’ll end up with all the default packages — without the risk of overwriting them with a lower priority channel — AND that “special” one you need.

This is how channels work

To install a certain package that cannot be found inside these default channels, you can search for that “special” package on this website. Not all packages are available on all platforms (= operating system & bit count, for example 64-bit Windows); however, you can narrow down your search to a specific platform. If you find a channel that contains the package you’re looking for, you can append it to your channel list.

To add a channel (named for instance newchannel) with the lowest priority, run:

conda config --append channels newchannel

To add a channel (named newchannel) with the highest priority, run:

conda config --prepend channels newchannel

It needs to be mentioned that in practice you’ll most likely set channels with the lowest priority. For a beginner, adding a channel with the highest priority is an edge case.

To list out the active channels and their priorities, use the following command:

conda config --get channels

Example result:

--add channels 'conda-forge'   # lowest priority
--add channels 'rdonnelly'
--add channels 'defaults' # highest priority
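If you added a channel by mistake, or you no longer need it, you can also remove it from the list (newchannel is the same example name as above):

conda config --remove channels newchannel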

There is one more aspect that I’d like to summarize here. If multiple channels contain a package, and one channel contains a newer version than the other one, the channels’ hierarchical order determines which one of these two versions is going to be installed, even if the higher priority channel contains the older version.

The version inside the higher priority channel is going to be installed

Searching, installing and removing packages

To list out all the installed packages in the currently active environment, run:

conda list

The command results in a list of the matching package names, versions, and channels:

# packages in environment at D:\Miniconda:
#
asn1crypto 0.22.0 py36h8e79faa_1
bleach 1.5.0 <pip>
ca-certificates 2017.08.26 h94faf87_0
...
wheel                     0.29.0           py36h6ce6cde_1  
win_inet_pton 1.0.1 py36he67d7fd_1
wincertstore 0.2 py36h7fe50ca_0
yaml 0.1.7 vc14hb31d195_1 [vc14]

To search for all the available versions of a certain package, you can use the search command. For instance, to list out all the versions of the seaborn package (it is a tool for data visualization), run:

conda search -f seaborn

Similarly to the conda list command, this one results in a list of the matching package names, versions, and channels:

Fetching package metadata .................
seaborn 0.7.1 py27_0 conda-forge
0.7.1 py34_0 conda-forge
0.7.1 py35_0 conda-forge
...
0.8.1 py27hab56d54_0 defaults
0.8.1 py35hc73483e_0 defaults
0.8.1 py36h9b69545_0 defaults

To install a package (for instance seaborn) that is inside a channel that is on your channel list, run this command (if you don’t specify which version you want, it’ll automatically install the latest available version from the highest priority channel):

conda install seaborn

You can also specify the package’s version:

conda install seaborn=0.7.0
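Conda also understands version ranges, not just exact versions. For instance, to get the newest seaborn release from the 0.7.x branch (the quotes are needed so that the command line doesn’t interpret the > and < characters), run:

conda install "seaborn>=0.7.0,<0.8.0"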

To install a package (for example yaml — that is, btw. a YAML parser and emitter) from a channel (for instance the channel named conda-forge) that is not on your channel list, run:

conda install -c conda-forge yaml

To update all the installed packages (this only affects the active environment), use this command:

conda update --all

To update one specific package, for example the seaborn package, run:

conda update seaborn

To remove the seaborn package, run:

conda remove seaborn

There is one more aspect of managing packages that I’d like to cover in this article. If you don’t want to deal with compatibility issues (breaking changes) caused by a new version of one of the packages you use, you can prevent that package from updating. As I mentioned above, if you run the conda update --all command, all of your installed packages are going to be updated, so basically this is about creating an “exception list”. So how can you do this?

Prevent packages from updating (pinning)

Create a file named pinned in the environment’s conda-meta directory. Add the list of the packages that you don’t want to be updated to the file. So for example, to force the seaborn package to the 0.7.x branch and lock the yaml package to the 0.1.7 version, add the following lines to the file named pinned:

seaborn 0.7.*
yaml ==0.1.7
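If you later need to ignore these pins for a single operation (for example to test whether a newer version would work), conda install and conda update accept a --no-pin flag; as far as I know, it simply skips the pinned file for that one command:

conda update seaborn --no-pin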

Changing an environment’s Python version

And how can you change the Python version of an environment?

Python is also a package. Why is that relevant for you? Because you’re going to use the same command for replacing the currently installed version of Python with another version that you use when you replace any other package with another version of that same package.

First, you should list out the available Python versions:

conda search -f python

Example result (the list contains the available versions and channels):

Fetching package metadata .................
python 2.7.12 0 conda-forge
2.7.12 1 conda-forge
2.7.12 2 conda-forge
...
3.6.3 h3b118a2_4 defaults
3.6.4 h6538335_0 defaults
3.6.4 h6538335_1 defaults

To replace the current Python version with, for example, 3.4.2, run:

conda install python=3.4.2

To update the Python version to the latest version of its branch (for instance updating the 3.4.2 to the 3.4.5 from the 3.4 branch), run:

conda update python
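To double-check that the switch worked, you can simply ask the interpreter of the active environment for its version:

python --version

Example result:

Python 3.4.5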

Adding PIP packages

Towards the beginning of this article, I recommended using Conda as your package and environment manager (and not PIP). And as I mentioned above, PIP packages are also installable into Conda environments.

Therefore, if a package is unavailable through the Conda channels, you can try to install it from the PyPI package index. You can do this by using the pip command (this command is made available by the Conda installer by default, so you can apply it in any active environment). For instance, if you want to install the lightgbm package (it is a gradient boosting framework), run:

pip install lightgbm
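One more command pair worth knowing, since it ties back to the collaboration and delivery use cases from the beginning of the article: Conda can export the active environment (including the PIP-installed packages) into a file, and recreate the same environment from that file on another computer:

conda env export > environment.yml

conda env create -f environment.yml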

Summary

So let’s wrap this up. I know that it seems quite complicated — and it is, in fact, complicated. However, utilizing environments will save you a lot of trouble.

In this article, I’ve summarized how you can:

  • choose an appropriate Conda installer for yourself
  • create additional environments (next to the root environment)
  • add or replace packages (and I also explain how channels work)
  • manage your Python version(s)

There are many more aspects in the area of Python environment management, so please let me know what aspects you find most challenging. Also let me know if you have some good practices that I don’t mention here. I’m curious about your workflow, so please feel free to share in the response section below if you have any suggestions!

You can comment on this article on Medium.

Recommended Articles

If you’re interested in this topic, I encourage you to check out these articles as well. Thanks for these great resources Michael Galarnyk, Jason Brownlee and Jake Vanderplas.

Python Environment Management with Conda (Python 2 + 3, Using Multiple Versions of Python)

How to Setup a Python Environment for Machine Learning and Deep Learning with Anaconda

Conda: Myths and Misconceptions

Using Docker

A little side note based on a question from one of my readers (thanks for bringing this up, Vikram Durai!):

If your application

  • uses a server (for example a database server with preloaded data), AND
  • you want to distribute this server and its data together with your application and its Python environment to others (for instance to a fellow developer or to a client),

you can “containerize” the whole thing with Docker (see the sketch after the list below).

In this case, all these components will be encapsulated in a Docker container:

  • The application itself,
  • The Conda environment that can run your application (so a compatible Python version and packages),
  • The local server or service (for example: a database server and a web server) required to run the application
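Putting these pieces together, here is a minimal Dockerfile sketch (it assumes an environment.yml file exported with conda env export, an environment named myenv defined in that file, and an entry point called exampleapp.py; all of these names are placeholders):

# base image with Conda preinstalled
FROM continuumio/miniconda3

# recreate the Conda environment from the exported file
COPY environment.yml .
RUN conda env create -f environment.yml

# copy the application itself
COPY . /app
WORKDIR /app

# activate the environment and start the app
CMD ["/bin/bash", "-c", "source activate myenv && python exampleapp.py"]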

Some more articles about Docker containers (by Preethi Kasireddy and Alexander Ryabtsev):

A Beginner-Friendly Introduction to Containers, VMs and Docker

What is Docker and How to Use it With Python (Tutorial)

Thanks for reading! 🙏

And thanks to my wife Krisztina Szerovay, who helped me make this article more comprehensible and created the illustrations. If you’re interested in UX design (if you are a developer, you should be :)), check out her UX Knowledge Base Sketches here:

UX Knowledge Base Sketch

The post Why you need Python environments and how to manage them with Conda appeared first on protostar.space.

]]>