I am learning R

Ever since I got my new job last year as a data analyst (mostly social media data), I have been learning about data manipulation rapidly. I had a very rough handover from the previous analyst: she managed her spreadsheets very inefficiently and created a data silo (copy > paste > flat table, ugh). Only after 4 months did I finally learn Power Query and get the process semi-automated, and my data has been very easy to manage since then.

However, as I read job vacancies, I saw that most data analyst (and data science) jobs require knowledge of R/Python and SQL. I recently bought an R course on Udemy, hoping that at some point I could use it exclusively to manipulate my data instead of using Excel. But heck, however cool R is, there are just some things that are faster in Excel, say renaming columns. I can just open a file, press F2, and rename the column that I want to rename. In R, I must write a bunch of code, copy-paste the original column name, rinse and repeat, and hope I don't forget anything. I guess it's kinda true that R should be used for something repetitive and Excel can be used for quick one-time data manipulation. I am hoping that in the next 3 months I can use R to replace all my Excel data manipulation needs, and use Excel minimally, while learning SQL.

After that? I will probably learn Python too. I am not really sure what the point of learning both R and Python is, but since they seem to be requirements for data-related jobs, I guess it doesn't hurt to learn them. I actually want to learn Tableau as well; unfortunately it's expensive, so I am sticking with Power BI for now.

Any resources you may want to share?

Comments

R is more of a stats-focused language, while Python is used across computing*. If you have to interface with someone else, then knowing Python often helps there.

*Python can do everything, just badly. It is also Python that holds computing together.

"say renaming columns"
It is at this point you should realise that you should probably have been using a database several steps back. It is not that renaming a column or table is particularly quicker or easier than in R or whatever (it is: https://www.thoughtco.com/change-column-name-in-mysql-2693874 -- seconds in phpMyAdmin if you wanted) but that it is such a limiting way of approaching it all -- https://www.w3schools.com/sql/ (w3schools is not great, better than it used to be but still not great, but that will hopefully give you a gentle intro to it all).
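
For what it's worth, here is a minimal sketch of how short a column rename is once the data lives in a database. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, made so it runs without a server), and the table name, column names, and helper functions are made up for illustration. Note that `RENAME COLUMN` needs SQLite 3.25 or newer.

```python
import sqlite3


def rename_column(conn, table, old, new):
    """Rename one column in place (identifiers are quoted, not parameterised)."""
    conn.execute(f'ALTER TABLE "{table}" RENAME COLUMN "{old}" TO "{new}"')


def column_names(conn, table):
    """Return the table's column names in order, without fetching any rows."""
    cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT 0')
    return [description[0] for description in cursor.description]


# Hypothetical table with a badly named column ("lks").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (post_id INTEGER, lks INTEGER, shares INTEGER)")
conn.execute("INSERT INTO posts VALUES (1, 120, 15)")

rename_column(conn, "posts", "lks", "likes")
print(column_names(conn, "posts"))
```

The data itself is untouched by the rename; only the column's name changes.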

Resources-wise: if your predecessor is as... competent as you make it appear, then I would also question how good their other computer peeps are. You need not become a full DB admin or, worse, sysadmin, but as long as you obey all the data handling rules for your work (basically don't stick it on a USB drive and wander home with it if you are not allowed), maybe keep your own backups -- if you wander in one morning and your IT people are ghostly pale then it is quite amusing to be able to walk up like your testicles are big enough to need their own wheelbarrow and drop a USB drive with last night's data on it straight on the table (or samples of the last few months if you are doing auditing). You probably also want to make sure they have a test server ( https://xkcd.com/327/ ) for this too.

If you wanted to get really out there, then you could possibly consider also learning MATLAB. I absolutely detest it myself, but I do recognise its power.

"and then learning SQL"
...
It is not the worst thing to not be learning it at the same time as R (not like, say, learning modern HTML and learning CSS after the fact), but I would not be surprised to hear you come back and say "I kind of wish I had started it a bit earlier".

Finally, if you are anything like me, then while theory is nice, some practical applications of it all are also good. https://towardsdatascience.com/predicting-hit-video-games-with-ml-1341bd9b86b0 and that site's category on games have been fascinating for me since I found them a few weeks back.
 
If how you do it is your choice (and it sounds like it is), I really recommend learning Python instead of R.
Sure, it won't hurt to learn both if you have time to kill, but Python is so much more versatile.
R is only for data stuff, Python is for everything.

The modules you'll use are pandas, numpy, openpyxl/xlrd, matplotlib (+ seaborn), and you can do some really cool things!
I forget the name of the SQL module I use, but there are a few of them out there, I just use one.
It's really nice to write an SQL query, get that data into pandas, and do all of your processing there.
Loading Excel files etc. is a piece of cake.
Excel gets so slow and can't really cope when there are too many rows, but Python handles it like a champ, and the large datasets I've been working with (admittedly not 'big data') haven't been a problem.
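
As a rough sketch of that query-then-process workflow, the example below uses only the standard library: sqlite3 stands in for whatever database and SQL module you actually use, and plain dictionaries stand in for a pandas DataFrame. The table, columns, and numbers are invented for illustration. With pandas installed, `pandas.read_sql_query(query, conn)` would load the same result into a DataFrame instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read result columns by name

# Hypothetical social media data.
conn.execute("CREATE TABLE posts (platform TEXT, likes INTEGER, shares INTEGER)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [("twitter", 120, 15), ("twitter", 80, 5), ("facebook", 200, 40)],
)

# Let the database do the grouping and summing...
query = """
    SELECT platform,
           COUNT(*)    AS n_posts,
           SUM(likes)  AS likes,
           SUM(shares) AS shares
    FROM posts
    GROUP BY platform
    ORDER BY platform
"""
rows = [dict(row) for row in conn.execute(query)]

# ...then do the rest of the processing in Python.
for row in rows:
    row["shares_per_like"] = round(row["shares"] / row["likes"], 3)

for row in rows:
    print(row)
```

Pushing the aggregation into the query keeps the amount of data pulled into Python small, which is part of why this scales better than a spreadsheet.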

Renaming columns is easy! :)
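
To give a rough idea, here is a sketch of renaming several columns in one go by writing the old-to-new mapping once. It uses only the standard library's csv module so that it runs anywhere; the column names and the `rename_columns` helper are made up for illustration. In pandas the equivalent is `df.rename(columns=mapping)`.

```python
import csv
import io


def rename_columns(rows, mapping):
    """Return new row dicts with keys renamed per mapping; unmapped keys are kept."""
    return [
        {mapping.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


# Stand-in for a file on disk, so the example is self-contained.
raw = io.StringIO("Post ID,Lks,Shrs\n1,120,15\n2,80,5\n")
rows = list(csv.DictReader(raw))

mapping = {"Post ID": "post_id", "Lks": "likes", "Shrs": "shares"}
renamed = rename_columns(rows, mapping)

# Write the result back out as CSV text with the new header.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(renamed[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(renamed)
print(out.getvalue())
```

Once the mapping is written down, the same rename can be re-run on every new export, which is the repetitive case where a script beats pressing F2.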

If you get stuck, just google it; loads of stuff is on Stack Overflow, and most things have been asked before.
There are also loads of personal sites guys have made with information; this is a good one, for example: https://chrisalbon.com/#python
If you need any help or advice just let me know, I do some of this stuff as my job.
 
Thank you for the tips, folks! Once I am comfortable with both R and SQL, I will learn Python! I am currently learning via Udemy, and I am glad some people made the introduction quite easy.
 
If you're learning Python, I _highly_ recommend looking into SQLAlchemy as well. It's basically the best ORM that Python has. It makes mapping models to tables really easy.
 

Blog entry information

Author
eriols