
Thursday, October 11, 2012

do a grep on a column

Consider the following data
$ cat data.txt
1,fruit,apple red,spherical
2,fruit,apple green,spherical
3,vegetable,peppers green,irregular
4,vegetable,peppers yellow,irregular
5,vegetable,peppers red,irregular
6,vegetable,broccoli,irregular and green
7,plant,green spinach,leaves
8,plant,very green spinach,leaves
9,plant,verygreenspinach,leaves
10,seed,green pea,spherical
11,unknown,green,undefined
The problem is to filter the lines where the third field ends in the word green. So the output should be
2,fruit,apple green,spherical
3,vegetable,peppers green,irregular
Short answer:- use awk with regular expression support
$ awk -F"," '{if ($3 ~ /\sgreen$/) print $0}' data.txt
2,fruit,apple green,spherical
3,vegetable,peppers green,irregular
Long answer:-
Naive use of grep gives a lot of false positives.
$ grep green data.txt
2,fruit,apple green,spherical
3,vegetable,peppers green,irregular
6,vegetable,broccoli,irregular and green
7,plant,green spinach,leaves
8,plant,very green spinach,leaves
9,plant,verygreenspinach,leaves
10,seed,green pea,spherical
11,unknown,green,undefined
Line 6 should not be printed, as the word green appears in the 4th field and not the 3rd.

Lines 7, 8 and 10 have the word green in the 3rd field, but they should not be printed since green does not appear at the end of the field.

Line 9: the letters green are present in the third field but are not preceded by a space, so it should not be printed.

Line 11 is most likely a data error. The third field is the word green on its own, not associated with any fruit, vegetable, plant, etc.
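
For comparison, grep can be restricted to one column by spelling out the preceding fields in the pattern. The following is a rough sketch, assuming that no field contains an embedded comma and that a 4th field always follows the 3rd. The function name third_field_ends_in_green is made up for illustration.

```shell
# Hypothetical helper: keep lines whose 3rd comma-separated field ends in " green".
#   ^[^,]*,[^,]*,   skips the first two fields
#   [^,]* green,    matches a 3rd field ending in " green", plus the comma that closes it
third_field_ends_in_green() {
    grep -E '^[^,]*,[^,]*,[^,]* green,' "$@"
}

# Demo on a few of the sample lines; only line 2 is printed
third_field_ends_in_green <<'EOF'
2,fruit,apple green,spherical
6,vegetable,broccoli,irregular and green
8,plant,very green spinach,leaves
9,plant,verygreenspinach,leaves
11,unknown,green,undefined
EOF
```

Called as third_field_ends_in_green data.txt, it selects lines 2 and 3. The pattern gets harder to read as the column number grows, which is one reason to prefer awk for this job.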


Further, to print all the spherical objects, one can use
$ awk -F"," '{if ($4=="spherical") print $0}' data.txt
1,fruit,apple red,spherical
2,fruit,apple green,spherical
10,seed,green pea,spherical
Here a full match on the 4th field is performed. However, this trick cannot be extended to the present problem, as only a partial match on the third field is desired.

How the solution works:-
$ awk -F"," '{if ($3 ~ /\sgreen$/) print $0}' data.txt 
~        tests for a match
/ ... /  delimiters of the regular expression
\s       matches a whitespace character (a GNU awk extension)
$        anchors the match at the end of the string, here the third field
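
For awk implementations that do not understand \s, the same filter can be written with a bracket expression. This is a minimal sketch, not the command that was tested above. The function name green_at_end_of_third_field is hypothetical, and [ \t] covers only space and tab, which is narrower than \s.

```shell
# Hypothetical helper: the same filter written without \s.
# [ \t] matches a single space or tab.
# A pattern with no action prints the matching lines by default.
green_at_end_of_third_field() {
    awk -F, '$3 ~ /[ \t]green$/' "$@"
}

# Demo on three of the sample lines; only line 3 is printed
green_at_end_of_third_field <<'EOF'
3,vegetable,peppers green,irregular
7,plant,green spinach,leaves
9,plant,verygreenspinach,leaves
EOF
```

With no arguments the function reads standard input; with a file name, for example green_at_end_of_third_field data.txt, it reads that file.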

Tested on Debian Wheezy using
$ awk --version
GNU Awk 4.0.1
Copyright (C) 1989, 1991-2012 Free Software Foundation.

Thursday, August 23, 2012

kill all the open figures in matlab

The following function can be used to kill all the open figures in Matlab

function kill_figures
  % close all open figures
  delete(findall(0,'Type','figure'));
end
This function comes in handy when doing parametric studies. Say a big chunk of code generates 10 figures per run. Studying the effect of a parameter over, say, 5 cases then produces 50 (= 5 x 10) plots. Once the study is done, a single call to this function closes all those figure windows.

Related tips:

To get the handles of all the figures, do
figHandles = findall(0, 'Type', 'figure');


Sunday, June 24, 2012

specify rsync location

If rsync fails with the following error,
$ /usr/bin/rsync -prltvzD user@machine.com:~/file.csv . -n
rsync: Command not found.
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(601) [Receiver=3.0.7]
the problem could be that rsync is installed in a non-standard location on the remote machine, so the remote shell cannot find it. In this case, rsync was installed at /usr/local/bin/rsync on the remote machine. This location can be specified with the --rsync-path option. The following command succeeds.
$ /usr/bin/rsync -prltvzD --rsync-path=/usr/local/bin/rsync user@machine.com:~/file.csv . -n
receiving file list ... done

sent 20 bytes  received 58 bytes  3.47 bytes/sec
total size is 11473183  speedup is 147092.09 (DRY RUN)
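
If the location of rsync on the remote machine is not known, one way to look for it is to list a few likely paths over ssh. This is only a sketch: the function name find_remote_rsync and the candidate paths are assumptions, and it presumes that ssh login to the remote machine works.

```shell
# Hypothetical helper: show which of a few candidate rsync paths exist remotely.
# $1 is the remote login, e.g. user@machine.com
# The candidate paths are guesses; extend the list as needed.
find_remote_rsync() {
    # ls prints the paths that exist; complaints about missing ones are discarded locally
    ssh "$1" 'ls /usr/bin/rsync /usr/local/bin/rsync /opt/local/bin/rsync' 2>/dev/null
}

# Usage (needs network access, so it is left commented out):
#   find_remote_rsync user@machine.com
#   rsync -prltvzD --rsync-path=/usr/local/bin/rsync user@machine.com:~/file.csv . -n
```

Whatever path the helper prints can then be passed to --rsync-path.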

Sunday, June 10, 2012

eBay should make the winning bid stick for at least an hour

Dear eBay[1],

In the current bidding process, the auction ends at a particular time which gives unnecessary importance to it. This can be evidenced by a lot of inactivity at the beginning and a frenzy of activity at the very end.

Eliminating this "auction-ending-time arbitrage" (AETA from now on) is very simple and easy. Just make each "winning bid" stick for at least an hour. If necessary, extend the "auction ending time" further.

You can be even more creative by providing this chance only to the previous "winning bidders". I will let your strategy team work out the finer details. But the underlying idea is very simple. You need to provide a better way for others to re-participate in the auction.

If you think I can be of further assistance, ping me at kamaraju at gmail dot com.


Why is this such a big deal?

To be honest, the current system sucks. It is very easy to be outbid at the very last minute by a tiny amount. There is not enough time left to re-enter the bidding when that happens. This can be very frustrating for the current winning bidder, especially if he/she has been leading the race for the last couple of days/hours. These winning bidders deserve a second chance to rebid and re-participate in the auction.

This is not a big issue for items with a lot of bidders (high liquidity). But for items with very few bidders (illiquid, niche items), the suggested improvement provides a good price discovery mechanism.

When you have a good price discovery platform, the sellers will be more inclined to use eBay resulting in more revenue. So it is a win-win for both eBay and its customers.

References:
  1. www.ebay.com
PS:- I have arbitrarily chosen the "sticky period" as one hour. But in reality, it can be any reasonable choice.

PPS:- If anyone else agrees with me, go to http://pages.ebay.com/help/account/suggestions.html , click on the "Buying and searching" link, fill in the form and make this suggestion to eBay.

Friday, May 25, 2012

iceweasel amazon prime flash error

When trying to watch a video on Amazon Prime, I got the following error
Sorry we were unable to stream this video. This is
likely because your flash player needs to be updated.
To fix the error, make sure the following packages are installed
$ dpkg -l \*flash\* \*iceweasel\* \*hal\* | grep ^ii
ii  flashplugin-nonfree                  1:2.8.3                              Adobe Flash Player - browser plugin
ii  flashplugin-nonfree-extrasound       0.0.svn2431-3                        Adobe Flash Player platform support library for Esound and OSS
ii  hal                                  0.5.14-8                             Hardware Abstraction Layer
ii  hal-info                             20091130-1                           Hardware Abstraction Layer - fdi files
ii  iceweasel                            10.0.4esr-2                          Web browser based on Firefox
ii  iceweasel-torbutton                  1.4.5.1-1                            transitional dummy package
ii  libhal-storage1                      0.5.14-8                             Hardware Abstraction Layer - shared library for storage devices
ii  libhal1                              0.5.14-8                             Hardware Abstraction Layer - shared library
ii  libkephal4                           4:4.4.5-9                            API for easier handling of multihead systems
ii  libkephal4abi1                       4:4.7.4-2                            API for easier handling of multihead systems
Note:- Some of these packages might be unnecessary, but together they are sufficient.

Make sure that the hal daemon is running.
$ ps aux | grep hald
116       5715  0.0  0.4  17108  4584 ?        Ssl  20:15   0:00 /usr/sbin/hald
root      5716  0.0  0.1  13980  1588 ?        Sl   20:15   0:00 hald-runner
root      5751  0.0  0.1   5864  1508 ?        S    20:15   0:00 hald-addon-input: Listening on /dev/input/event0 /dev/input/event2 /dev/input/event1 /dev/input/event6 /dev/input/event5 /dev/input/event3 /dev/input/event4 /dev/input/event10
root      5755  0.0  0.1   5860  1244 ?        S    20:15   0:00 /usr/lib/hal/hald-addon-rfkill-killswitch
root      5759  0.0  0.1   5856  1240 ?        S    20:15   0:00 /usr/lib/hal/hald-addon-generic-backlight
116       5777  0.0  0.1   3816  1176 ?        S    20:15   0:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
root      5778  0.0  0.1   5864  1508 ?        S    20:15   0:00 hald-addon-storage: polling /dev/sr0 (every 2 sec)
1000      5826  0.0  0.0   4048   764 pts/2    S+   20:25   0:00 grep hald

Update the flashplugin
$ sudo update-flashplugin-nonfree --install --verbose

Close the iceweasel window. Move the ~/.adobe and ~/.macromedia directories out of the way (the renamed copies can be deleted later)
mv ~/.adobe ~/.adobe_old_deleteafter
mv ~/.macromedia ~/.macromedia_old_deleteafter

Restart iceweasel. The video (and audio) should be working now.

Wednesday, May 23, 2012

awk remove quotes in a column

Consider the input file
$ cat input.txt
"k",1.1
"ka",2.2
"kam",3.3
"kama",4.4
"kamar",5.5
"kamara",6.6
"kamaraj",7.7
"kamaraju",8.8
Convert this to
$ cat output.txt
k|1.1
ka|2.2
kam|3.3
kama|4.4
kamar|5.5
kamara|6.6
kamaraj|7.7
kamaraju|8.8
i.e. remove the quotes in the first column and change the delimiter to '|'.

This can be achieved in awk by defining both '"' and ',' as delimiters. Each line then splits into four fields, of which the 1st and 3rd are empty and the 2nd and 4th hold the data.
$ cat cmd.sh
awk -F '[",]' -v OFS='|' '{print $2,$4}' input.txt
Execute the above script as
$ ./cmd.sh > output.txt
Tested on Debian Wheezy using GNU awk version 4.0.1.
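
An alternative sketch, assuming the first field never contains a comma: split on commas only and strip the quotes from the first field with gsub. The function name strip_quotes_first_column is hypothetical.

```shell
# Hypothetical helper: split on commas, remove every double quote from field 1,
# and print the two fields joined by '|'.
strip_quotes_first_column() {
    awk -F, -v OFS='|' '{ gsub(/"/, "", $1); print $1, $2 }' "$@"
}

# Demo on two of the sample lines; prints k|1.1 and kamaraju|8.8
strip_quotes_first_column <<'EOF'
"k",1.1
"kamaraju",8.8
EOF
```

Unlike the two-delimiter approach, the field numbers here do not shift when a line happens to have no quotes around the first column.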

keywords: awk remove quotes from string, get rid of quotes, replace the delimiter character

Sunday, April 08, 2012

who writes Linux

This is a great article that my friend Sasikanth shared with me on Facebook: who-writes-linux-2012. It provides statistics on who actually develops the Linux kernel, the rate of development, the companies supporting the developers, and a lot of other interesting tidbits.

It is good to see that most developers are getting paid for doing what they love. That makes it a more sustainable model.

The quote "As the Linux kernel grows, the rate of change is growing with it." just sums it up. This, IMHO, is very hard to achieve with a conventional business model.

Sunday, April 01, 2012

take a screenshot on windows 7

To take a snapshot (screenshot) of the active window, press alt+PRINT (the Print Screen key) and then paste it into another program with ctrl-v.

If you press just PRINT, you will get a screenshot of the whole screen.

To save the snapshot as a jpg picture
  • start the paint program: Start -> run -> mspaint
  • select the window of interest -> alt+PRINT -> go to the mspaint window -> use ctrl-v to paste -> save as .jpg.

Sunday, March 11, 2012

matlab to excel

Q) What is the excel equivalent of matlab's ismember function?
A) vlookup

The syntax is vlookup(lookup_value, table_array, col_index_num, [range_lookup])

It can be used to check if a value exists in a column, compare two columns, etc.

The col_index_num starts from 1.
I usually give the 4th argument as false, so that I get an exact match. See the help in excel for more details.

ISNA(), loosely similar to matlab's isnan function, is another function that comes in handy with vlookup. It tells whether a value is #N/A, which is what vlookup returns when no exact match is found. For example, =ISNA(VLOOKUP(A1, B:B, 1, FALSE)) is TRUE when the value in A1 does not appear in column B.

Monday, February 27, 2012

computable document format

If you've got 20 minutes to spare, this is a great presentation to watch: http://www.wolfram.com/broadcast/screencasts/announcing_cdf/ . Here Conrad Wolfram explains a new file format that he calls "computable document format". IMHO, it is an excellent idea, one that could change the way we think about e-documents like pdf files, presentations, research publications, etc.

Monday, February 06, 2012

cygwin mintty vim output is not to a terminal

Inside the cygwin terminal, if you run

$ uname -r
1.7.10(0.259/5/3)

$ mintty.exe

the mintty terminal pops up. However, if you run vim in the mintty terminal, there are warnings such as

$ vim junk1.txt
Vim: Warning: Output is not to a terminal
Vim: Warning: Input is not from a terminal

To get rid of these warnings, start mintty with "mintty.exe -". The .exe is optional. It is the '-' that matters: a single dash tells mintty to start the shell as a login shell.

Why bother with the mintty terminal in the first place? Because it has a lot more functionality than the default cygwin terminal. For example, the mintty terminal can be resized in both directions, maximized and minimized, and copying and pasting text from/to the terminal is a breeze.
