Julia - Dictionaries and Sets



Many of the functions we have seen so far work on arrays and tuples. Arrays are just one type of collection, and Julia has other kinds of collections too. One such collection is the dictionary (Dict), which associates keys with values. That is why it is called an 'associative collection'.

To understand it better, we can compare it with a simple look-up table: a way of organizing many types of data in which a single piece of information, such as a number, string or symbol, called the key, gives us the corresponding data value.

Creating Dictionaries

The syntax for creating a simple dictionary is as follows −

Dict("key1" => value1, "key2" => value2, …, "keyn" => valuen)

In the above syntax, key1, key2…keyn are the keys and value1, value2,…valuen are the corresponding values. The operator => builds a Pair object, so "key1" => value1 is shorthand for Pair("key1", value1). Keys are always unique in a dictionary, so two entries cannot share the same key.
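
As a minimal sketch of the two points above (the names p and dup are only illustrative), the following shows that => builds a Pair and that a repeated key does not produce a second entry; the later value replaces the earlier one:

```julia
# `=>` constructs a Pair; its halves are available as `first` and `second`.
p = "X" => 100
println(p isa Pair)              # true
println(p.first, " ", p.second)  # X 100

# Passing the same key twice does not create two entries:
# the later value overwrites the earlier one.
dup = Dict("X" => 100, "X" => 999)
println(length(dup))             # 1
println(dup["X"])                # 999
```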

Example

julia> first_dict = Dict("X" => 100, "Y" => 110, "Z" => 220)
Dict{String,Int64} with 3 entries:
 "Y" => 110
 "Z" => 220
 "X" => 100

We can also create dictionaries with the help of comprehension syntax. The example is given below −

Example

julia> first_dict = Dict(string(x) => sind(x) for x = 0:5:360)
Dict{String,Float64} with 73 entries:
 "320" => -0.642788
 "65" => 0.906308
 "155" => 0.422618
 "335" => -0.422618
 "75" => 0.965926
 "50" => 0.766044
 ⋮ => ⋮

Keys

As discussed earlier, dictionaries have unique keys. It means that if we assign a value to a key that already exists, we will not be creating a new entry but replacing the value stored under the existing key. Following are some operations on dictionaries regarding keys −

Searching for a key

We can use the haskey() function to check whether the dictionary contains a key or not −

julia> first_dict = Dict("X" => 100, "Y" => 110, "Z" => 220)
Dict{String,Int64} with 3 entries:
 "Y" => 110
 "Z" => 220
 "X" => 100
 
julia> haskey(first_dict, "Z")
true

julia> haskey(first_dict, "A")
false
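
Checking with haskey() before every lookup is often unnecessary. The sketch below (the dictionary name scores is only illustrative) uses get() to supply a default for a missing key, shows that plain indexing throws a KeyError instead, and uses get!() to store the default when the key is absent:

```julia
scores = Dict("X" => 100, "Y" => 110, "Z" => 220)

# get returns the stored value, or the supplied default if the key is absent.
println(get(scores, "Z", 0))     # 220
println(get(scores, "A", 0))     # 0

# Plain indexing with a missing key throws a KeyError.
threw_keyerror = try
    scores["A"]
    false
catch err
    err isa KeyError
end
println(threw_keyerror)          # true

# get! behaves like get, but also stores the default under the missing key.
get!(scores, "A", 0)
println(haskey(scores, "A"))     # true
```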

Searching for a key/value pair

We can use the in() function to check whether the dictionary contains a particular key/value pair or not −

julia> in(("X" => 100), first_dict)
true

julia> in(("X" => 220), first_dict)
false
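
Because a dictionary is a collection of pairs, in() tests for a whole pair. To test for a key or a value on its own, one option is to apply in() to keys() or values(), as in this small sketch (the name d is only illustrative):

```julia
d = Dict("X" => 100, "Y" => 110)

pair_found  = in("X" => 100, d)      # the exact pair is present
pair_wrong  = in("X" => 110, d)      # key exists, but with a different value
key_found   = "X" in keys(d)         # membership test on the keys only
value_found = 110 in values(d)       # membership test on the values only

println((pair_found, pair_wrong, key_found, value_found))
# (true, false, true, true)
```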

Adding a new key-value pair

We can add a new key-value pair to an existing dictionary as follows −

julia> first_dict["R"] = 400
400

julia> first_dict
Dict{String,Int64} with 4 entries:
 "Y" => 110
 "Z" => 220
 "X" => 100
 "R" => 400

Delete a key

We can use the delete!() function to delete a key, together with its value, from an existing dictionary −

julia> delete!(first_dict, "R")
Dict{String,Int64} with 3 entries:
 "Y" => 110
 "Z" => 220
 "X" => 100

Getting all the keys

We can use the keys() function to get all the keys from an existing dictionary −

julia> keys(first_dict)
Base.KeySet for a Dict{String,Int64} with 3 entries. Keys:
 "Y"
 "Z"
 "X"

Values

Every key in a dictionary has a corresponding value. Following are some operations on dictionaries regarding values −

Retrieving all the values

We can use the values() function to get all the values from an existing dictionary −

julia> values(first_dict)
Base.ValueIterator for a Dict{String,Int64} with 3 entries. Values:
 110
 220
 100
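
keys() and values() return lazy views rather than arrays. As a small sketch (variable names are illustrative), collect() turns them into ordinary arrays, and the two views iterate in the same order, so they can be zipped back together:

```julia
d = Dict("X" => 100, "Y" => 110, "Z" => 220)

ks = collect(keys(d))      # an ordinary array of the keys
vs = collect(values(d))    # an ordinary array of the values

# Position i in ks corresponds to position i in vs.
aligned = all(d[k] == v for (k, v) in zip(ks, vs))
println(aligned)           # true

# The views can be passed directly to functions that accept any iterable.
println(sum(values(d)))    # 430
```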

Dictionaries as iterable objects

Dictionaries are iterable objects, so we can loop over them and process each key/value pair in turn −

julia> for kv in first_dict
          println(kv)
       end
 "Y" => 110
 "Z" => 220
 "X" => 100

Here kv is a Pair that holds one key/value pair: kv.first (or first(kv)) is the key and kv.second (or last(kv)) is the value.
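
A common alternative is to destructure each entry into its key and value directly in the loop header. The helper below is only a sketch; the function name describe and the output format are assumptions made for illustration:

```julia
# Build one descriptive line per entry, unpacking key and value in the loop header.
function describe(d)
    lines = String[]
    for (key, value) in d
        push!(lines, "$key has value $value")
    end
    return lines
end

d = Dict("X" => 100, "Y" => 110, "Z" => 220)
foreach(println, describe(d))   # three lines, in no particular order
```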

Sorting a dictionary

Dictionaries do not store their keys in any particular order, so iterating over a dictionary does not yield sorted output. To obtain the items in order, we can sort the keys and look up each value −

Example

julia> first_dict = Dict("R" => 100, "S" => 220, "T" => 350, "U" => 400, "V" => 575, "W" => 670)
Dict{String,Int64} with 6 entries:
 "S" => 220
 "U" => 400
 "T" => 350
 "W" => 670
 "V" => 575
 "R" => 100
julia> for key in sort(collect(keys(first_dict)))
         println("$key => $(first_dict[key])")
         end
R => 100
S => 220
T => 350
U => 400
V => 575
W => 670
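
Another option, sketched here with illustrative names, is to collect the dictionary into an array of pairs and sort that array, using first to sort by key or last to sort by value:

```julia
d = Dict("R" => 100, "S" => 220, "T" => 350)

# collect(d) gives an array of Pairs; `by` selects what to compare.
by_key        = sort(collect(d), by = first)
by_value_desc = sort(collect(d), by = last, rev = true)

println(by_key)          # pairs ordered R, S, T
println(by_value_desc)   # pairs ordered by value: 350, 220, 100
```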

We can also use the SortedDict data type from the DataStructures.jl Julia package to make sure that the dictionary remains sorted at all times. You can check the example below −

Example

julia> import DataStructures
julia> first_dict = DataStructures.SortedDict("S" => 220, "T" => 350, "U" => 400, "V" => 575, "W" => 670)
DataStructures.SortedDict{String,Int64,Base.Order.ForwardOrdering} with 5 entries:
 "S" => 220
 "T" => 350
 "U" => 400
 "V" => 575
 "W" => 670
julia> first_dict["R"] = 100
100
julia> first_dict
DataStructures.SortedDict{String,Int64,Base.Order.ForwardOrdering} with 6 entries:
 "R" => 100
 "S" => 220
 "T" => 350
 "U" => 400
 "V" => 575
 "W" => 670

Word Counting Example

One of the simple applications of dictionaries is to count how many times each word appears in a text. The idea is that each distinct word is a key, and the value stored under that key is the number of times that word appears in the text.

In the following example, we will be counting the words in a file named NLP.txt (saved on the desktop) −

julia> f = open("C://Users//Leekha//Desktop//NLP.txt")
IOStream()

julia> wordlist = String[]
String[]

julia> for line in eachline(f)
            words = split(line, r"\W")
            map(w -> push!(wordlist, lowercase(w)), words)
         end

julia> filter!(!isempty, wordlist)
984-element Array{String,1}:
 "natural"
 "language"
 "processing"
 "semantic"
 "analysis"
 "introduction"
 "to"
 "semantic"
 "analysis"
 "the"
 "purpose"
   ……………………
   ……………………
julia> close(f)

We can see from the above output that wordlist is now an array of 984 elements.

We can create a dictionary to store the words and word count −

julia> wordcounts = Dict{String,Int64}()
Dict{String,Int64}()

julia> for word in wordlist
            wordcounts[word]=get(wordcounts, word, 0) + 1
         end

To find out how many times the words appear, we can look up the words in the dictionary as follows −

julia> wordcounts["natural"]
1

julia> wordcounts["processing"]
1

julia> wordcounts["and"]
14
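
The same counting pattern can be tried without a file. The following is a minimal sketch that works on an in-memory string; the function name count_words and the sample sentence are assumptions made for illustration, not part of the example above:

```julia
# Count how often each lowercase word occurs in a string.
function count_words(text::AbstractString)
    counts = Dict{String,Int}()
    # Split on runs of non-word characters and drop empty pieces.
    for piece in split(lowercase(text), r"\W+"; keepempty = false)
        word = String(piece)
        # get supplies 0 the first time a word is seen.
        counts[word] = get(counts, word, 0) + 1
    end
    return counts
end

sample = "The cat and the hat and the bat"
counts = count_words(sample)
println(counts["the"])   # 3
println(counts["and"])   # 2
```

Using keepempty = false during the split plays the same role as the filter!(!isempty, wordlist) step in the file-based example.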

We can also list the counts in alphabetical order by sorting the keys of the dictionary as follows −

julia> for i in sort(collect(keys(wordcounts)))
         println("$i, $(wordcounts[i])")
      end
1, 2
2, 2
3, 2
4, 2
5, 1
a, 28
about, 3
above, 2
act, 1
affixes, 3
all, 2
also, 5
an, 5
analysis, 15
analyze, 1
analyzed, 1
analyzer, 2
and, 14
answer, 5
antonymies, 1
antonymy, 1
application, 3
are, 11
…
…
…
…

To find the most common words, we can use collect() to convert the dictionary to an array of key/value pairs and then sort the array by the count (the last element of each pair) in descending order as follows −

julia> sort(collect(wordcounts), by = tuple -> last(tuple), rev=true)
276-element Array{Pair{String,Int64},1}:
            "the" => 76
             "of" => 47
             "is" => 39
              "a" => 28
          "words" => 23
        "meaning" => 23
       "semantic" => 22
        "lexical" => 21
       "analysis" => 15
            "and" => 14
             "in" => 14
             "be" => 13
             "it" => 13
        "example" => 13
             "or" => 12
           "word" => 12
            "for" => 11
            "are" => 11
        "between" => 11
             "as" => 11
                  ⋮
            "each" => 1
           "river" => 1
         "homonym" => 1
  "classification" => 1
         "analyze" => 1
       "nocturnal" => 1
            "axis" => 1
         "concept" => 1
           "deals" => 1
          "larger" => 1
         "destiny" => 1
            "what" => 1
     "reservation" => 1
"characterization" => 1
          "second" => 1
       "certitude" => 1
            "into" => 1
        "compound" => 1
    "introduction" => 1

We can check the first 10 words as follows −

julia> sort(collect(wordcounts), by = tuple -> last(tuple), rev=true)[1:10]
10-element Array{Pair{String,Int64},1}:
      "the" => 76
       "of" => 47
       "is" => 39
        "a" => 28
    "words" => 23
  "meaning" => 23
 "semantic" => 22
  "lexical" => 21
 "analysis" => 15
      "and" => 14

We can use the filter() function to find all the words that start with a particular letter (say 'n') and appear fewer than four times −

julia> filter(tuple -> startswith(first(tuple), "n") && last(tuple) < 4, collect(wordcounts))
6-element Array{Pair{String,Int64},1}:
      "none" => 2
       "not" => 3
    "namely" => 1
      "name" => 1
   "natural" => 1
 "nocturnal" => 1

Sets

Like an array or a dictionary, a set is a collection of elements, but every element in a set is unique. Following are the differences between sets and other kinds of collections −

  • In a set, we can have only one of each element.

  • The order of elements is not important in a set.

Creating a Set

With the help of the Set constructor, we can create an empty set as follows −

julia> var_color = Set()
Set{Any}()

We can also specify the element type of the set as follows −

julia> num_primes = Set{Int64}()
Set{Int64}()

We can also create and fill the set as follows −

julia> var_color = Set{String}(["red","green","blue"])
Set{String} with 3 elements:
 "blue"
 "green"
 "red"

Alternatively, we can use the push!() function, just as with arrays, to add elements to a set as follows −

julia> push!(var_color, "black")
Set{String} with 4 elements:
 "blue"
 "green"
 "black"
 "red"

We can use the in() function to check whether an element is in the set −

julia> in("red", var_color)
true

julia> in("yellow", var_color)
false
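
The uniqueness rule is easy to observe. In this small sketch (the name letters is only illustrative), duplicates in the input collapse into a single element, and pushing an element that is already present leaves the set unchanged:

```julia
# Five items go in, but only three distinct elements are kept.
letters = Set(["a", "b", "a", "c", "b"])
println(length(letters))      # 3

# Adding an element that already exists has no effect on the size.
push!(letters, "a")
println(length(letters))      # 3

# `in` can also be written as an infix operator.
println("a" in letters)       # true
println("z" in letters)       # false
```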

Standard operations

Union, intersection, and difference are some standard operations we can do with sets. The corresponding functions for these operations are union(), intersect() and setdiff().
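
As a compact sketch of these three functions on two small integer sets (the names evens and small are illustrative), note in particular that setdiff() depends on the order of its arguments:

```julia
evens = Set([2, 4, 6, 8])
small = Set([1, 2, 3, 4])

either     = union(evens, small)       # elements found in at least one set
both       = intersect(evens, small)   # elements found in both sets
only_evens = setdiff(evens, small)     # elements of evens that are not in small
only_small = setdiff(small, evens)     # swapping the arguments changes the result

println(either == Set([1, 2, 3, 4, 6, 8]))   # true
println(both == Set([2, 4]))                 # true
println(only_evens == Set([6, 8]))           # true
println(only_small == Set([1, 3]))           # true

# issubset checks whether every element of the first set is in the second.
println(issubset(both, evens))               # true
```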

Union

In general, the union operation returns a set containing every element that appears in at least one of the given sets.

Example

julia> color_rainbow = Set(["red","orange","yellow","green","blue","indigo","violet"])
Set{String} with 7 elements:
 "indigo"
 "yellow"
 "orange"
 "blue"
 "violet"
 "green"
 "red"
 
julia> union(var_color, color_rainbow)
Set{String} with 8 elements:
 "indigo"
 "yellow"
 "orange"
 "blue"
 "violet"
 "green"
 "black"
 "red"

Intersection

In general, the intersection operation takes two or more sets as input and returns only the elements that are common to all of them.

Example

julia> intersect(var_color, color_rainbow)
Set{String} with 3 elements:
 "blue"
 "green"
 "red"

Difference

In general, the difference operation takes two or more sets as input. It returns the elements of the first set that do not appear in any of the other sets.

Example

julia> setdiff(var_color, color_rainbow)
Set{String} with 1 element:
 "black"

Some Functions on Dictionary

In the example below, you will see that functions that work on arrays and sets also work on other collections such as dictionaries −

julia> dict1 = Dict(100=>"X", 220 => "Y")
Dict{Int64,String} with 2 entries:
 100 => "X"
 220 => "Y"
 
julia> dict2 = Dict(220 => "Y", 300 => "Z", 450 => "W")
Dict{Int64,String} with 3 entries:
 450 => "W"
 220 => "Y"
 300 => "Z"

Union

julia> union(dict1, dict2)
4-element Array{Pair{Int64,String},1}:
 100 => "X"
 220 => "Y"
 450 => "W"
 300 => "Z"

Intersect

julia> intersect(dict1, dict2)
1-element Array{Pair{Int64,String},1}:
 220 => "Y"

Difference

julia> setdiff(dict1, dict2)
1-element Array{Pair{Int64,String},1}:
 100 => "X"

Merging two dictionaries

julia> merge(dict1, dict2)
Dict{Int64,String} with 4 entries:
 100 => "X"
 450 => "W"
 220 => "Y"
 300 => "Z"
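
In the example above, the shared key 220 has the same value in both dictionaries, so the merge rule for conflicting keys is not visible. The sketch below (names a and b are illustrative) shows that the value from the later dictionary wins by default, and that a combining function such as + can be passed as the first argument to merge the values instead:

```julia
a = Dict("x" => 1, "y" => 2)
b = Dict("y" => 10, "z" => 20)

# Default rule: for a shared key, the value from the last dictionary is kept.
merged = merge(a, b)
println(merged["y"])     # 10

# With a combining function, values under a shared key are combined.
summed = merge(+, a, b)
println(summed["y"])     # 12

# merge returns a new dictionary; the inputs are left untouched.
println(a["y"])          # 2
```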

Finding the smallest element

julia> dict1
Dict{Int64,String} with 2 entries:
 100 => "X"
 220 => "Y"
 
 
julia> findmin(dict1)
("X", 100)
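
Note that findmin() on a dictionary compares the values, and returns the smallest value together with its key, which is why the result above is ("X", 100). A small sketch with illustrative data makes this clearer and shows one way to get the smallest key instead:

```julia
prices = Dict("apple" => 3, "pear" => 1, "plum" => 2)

# findmin returns (smallest value, key under which it is stored).
lowest, item = findmin(prices)
println(lowest, " ", item)         # 1 pear

# To find the smallest key, take the minimum over keys().
println(minimum(keys(prices)))     # apple
```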