Back in December 2013 I posted about using the field calculator in ArcMap to calc a field to the third word from another field. Since then, the comments section has been pretty active with questions and answers. I thought it would be a great time to expand on that with other nifty field calculator tricks that I have used over the years.
Split Up Data
This one I use a lot. I have a field that contains parcel number data. In LA County, parcel numbers are 10 digit numbers that represent the assessor book (first 4 numbers), page (next 3 numbers), and lot number (last 3 numbers). I want to split them out into their own fields. Here is how I do it.
Below you can see the sample parcel numbers I will work with. I am using ArcGIS Desktop version 10.4.1.
Note that my parcel number field AIN is a text field. Visually you can confirm this because the values are pushed to the left (left justified). If you look at the OBJECTID field, note the values are pushed to the right (right justified), which is an indicator of a numeric or integer field. Of course I could have looked at the table properties too to verify the field data types!
Anyway, since my parcel number field is a text field, so will be my Book, Page, and Lot fields. This is actually important with respect to the python code I will use and also keeping the number value from changing, for example I do not want the value 001 changing to 1 because that is what would happen with integer fields.
So let’s calc the Book field. Here I want the first 4 characters from the AIN field. I open the table in ArcMap, right click on the Book field, and select Field Calculator.
When the Field Calculator window appears, the first thing I do is change the Parser to Python. The next thing I do is build my python expression to extract the first 4 characters from the AIN field.
Note that in the expression the AIN field is inside the string function str() to keep the value a string (text) then [:4] tells it to “return the first 4 characters from the beginning”.
Click the OK button and it is done!
Now right click on the Page field and bring up the Field Calculator again. This time we want to extract the next 3 characters for the page values.
Here we are slicing out the characters starting at index 4 and before index 7. Confused? This might help:
index: 0123456789 parcel: BBBBPPPLLL
Book = BBBB Page = PPP Lot = LLL
Python uses index positions for each character, starting at zero. Our parcel numbers are 10 characters with index positions 0 to 9. To slice out the page characters (PPP), they fall in index positions 4 to 6, so we use [4:7] to say “extract the characters from index 4 up to but not including index 7”. That’s just the way the “slicing” syntax works. Got it? Click the OK button and it is done.
For Lot, we setup the Field Calculator expression below:
Here we are extracting the last three characters. Basically [-3:] says “return the last 3 characters from the end”. Clicking the OK button gives us the lot numbers.
Removing Extra Spaces
When you have free form data input, it is inevitable that you will get extra spaces entered by mistake. They could happen between words, before the words, or even after all the words. Have no fear, we can remove the extra spaces using Python’s join() and split() functions.
Below is an example table with extra spaces in some data in the With_Spaces field.
The first record has two, three, and four extra spaces between words, while record 2 has spaces before the words and record 3 has spaces after the words. Just so you can see the spaces in record 3, I will highlight the contents in the With_Spaces field:
Ok, let’s get rid of those spaces. Right click on the Fixed field and bring up the Field Calculator to enter the following expression:
How will this work? The split() function, without any parameters here, will take the contents of the With_Spaces field and using any “white space” between words as the delimiter for each word, splits out the words. Any leading or trailing “white space” is ignored. Next, with the words all split out, the join() function puts all the words together, separated by a single space ” “. Still confused? Maybe showing this in Python will help.
You can bring up the Python window by clicking the Python button at the top of ArcMap.
Here is an example of what happens when you use split():
A variable var is set to a string with a bunch of words with many spaces. Pretend the var variable is our field name. It then is used in the split() function. See how each word is split out? It actually created a Python list. It is this list that the join() function will use to put all the words together. For example:
Here the single space ” ” is sent to the join() function to use as a separator, thus a single space between each word, which is what we want.
Now that you know how this works, clicking the OK button in the Field Calculator will do the job we need.
There are times when data is entered in all uppercase, all lowercase, or even a mix. In this example I want to take the names stored in a field and change them to proper case, basically the first character in each word is capitalized and the rest are lowercase.
Usually I would have a second field to calc new values to just in case of a mistake, but this time I will live dangerously and replace the values in the same field. Bringing up the Field Calculator, I want to use the Python functions split(), capitalize(), and join() to change all values to proper case.
There is a Python “for loop” here! What is going on now? Like before we will be using the split() function to create a list of words, the names from the Name field, then using the for loop to step through each word in the list, assign the variable i to the word and run it through the capitalize() function which will make the first character a capital letter and the rest lowercase. Then after the word list is processed by the for loop, they are passed to the join() function and just like before takes all the words in the list and adds a space ” ” between them for the new value that will be calced back to the Name field. Clear as mud? Press that OK button to see the results.
Pretty cool. You are now a Python expression ninja! I hope these tricks help you clean up your data too. -mike